When investigating something for T214998, I used referer LIKE '%google%' to crudely approximate Google referrals (covering google.com, www.google.com, and country TLDs such as google.es and google.nl). I then noticed that many of these were categorised not as referer_class='external (search engine)' but as referer_class='unknown'.
krinkle@stat1011$ hive
SELECT COUNT(*) AS cnt, referer_class, referer
FROM wmf.webrequest
WHERE year = 2025 AND month = 1 AND day = 5
  AND (is_pageview=true OR is_redirect_to_pageview=true)
  AND referer_class != 'external (search engine)'
  AND referer LIKE '%google%'
GROUP BY referer_class, referer
ORDER BY cnt DESC
LIMIT 10;
Taking yesterday (5 Jan 2025) as an example, this 24-hour window contains about 10 million pageviews and redirects-to-pageviews affected by this issue.
cnt      referer_class  referer
9895226  unknown        www.google.com
…
…
6365     unknown        google.com
…
105      external       https://blog.google/
Looking at the W3C Referrer Policy spec, I don't see a way to get this behaviour (a bare hostname with no scheme or path) through any of its policies. Looking at the IETF spec for the Referer HTTP header, such a value does not appear to be strictly valid, but it is clearly common enough to care about.
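To illustrate why a scheme-less value is awkward to classify, here is a minimal Python sketch. It assumes, without my having checked, that a classifier extracts the hostname by parsing the referer as a URL; the helper names `referer_host` and `referer_host_lenient` are hypothetical, and this is not the code that actually computes referer_class. With a standard URL parser, a bare www.google.com is read as a relative path, so no hostname is found.

```python
from urllib.parse import urlsplit


def referer_host(referer):
    """Return the hostname of a Referer value, or None if the parser finds none.

    Hypothetical helper for illustration only; not the real classifier.
    """
    return urlsplit(referer).hostname


def referer_host_lenient(referer):
    """Like referer_host, but retry a scheme-less value as a network-path reference.

    Assumption: treating "www.google.com" as "//www.google.com" is acceptable
    for classification purposes. This is a sketch, not a proposed patch.
    """
    host = urlsplit(referer).hostname
    if host is None and "://" not in referer:
        host = urlsplit("//" + referer).hostname
    return host


# Absolute URI: the parser finds a hostname.
print(referer_host("https://blog.google/"))
# Bare hostname, as in the affected rows: parsed as a path, so no hostname.
print(referer_host("www.google.com"))
# The lenient variant recovers the hostname from the bare value.
print(referer_host_lenient("www.google.com"))
```

If the real classifier works along these lines, the bare values would never reach the search engine matching step, which would be consistent with them ending up as 'unknown'.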
I broke it down by user agent to find a likely source or cause.
SELECT COUNT(*) AS cnt, referer_class, referer, user_agent_map["browser_family"] AS browser
FROM wmf.webrequest
WHERE year = 2025 AND month = 1 AND day = 5 AND hour = 10
  AND (is_pageview=true OR is_redirect_to_pageview=true)
  AND referer_class != "external (search engine)"
  AND referer LIKE '%google%'
GROUP BY referer_class, referer, user_agent_map["browser_family"]
ORDER BY cnt DESC
LIMIT 10;
cnt      referer_class  referer         browser
9888357  unknown        www.google.com  Firefox
6810     unknown        www.google.com  Chrome Mobile WebView
…
1261     unknown        google.com      Chrome Mobile
1236     unknown        google.com      Samsung Internet
1234     unknown        google.com      Mobile Safari
1206     unknown        google.com      Chrome Mobile iOS
617      unknown        google.com      Chrome
579      unknown        google.com      Edge
…
223      unknown        google.com      Firefox
…
55       unknown        www.google.com  Chrome
I don't know how our search engine referral dashboard is built. I take it referer_class is probably derived from referer_data, but in case the dashboard uses referer_data directly instead of referer_class, I spot-checked that on a few rows as well. Alas, it is no different.
SELECT referer_class, referer, referer_data
FROM wmf.webrequest
WHERE year = 2025 AND month = 1 AND day = 5 AND hour = 10
  AND (is_pageview=true OR is_redirect_to_pageview=true)
  AND referer_class != "external (search engine)"
  AND referer='www.google.com'
LIMIT 10;
referer_class referer referer_data
unknown www.google.com {"referer_class":"unknown","referer_name":"none"}
unknown www.google.com {"referer_class":"unknown","referer_name":"none"}
unknown www.google.com {"referer_class":"unknown","referer_name":"none"}
unknown www.google.com {"referer_class":"unknown","referer_name":"none"}
unknown www.google.com {"referer_class":"unknown","referer_name":"none"}
unknown www.google.com {"referer_class":"unknown","referer_name":"none"}
unknown www.google.com {"referer_class":"unknown","referer_name":"none"}
unknown www.google.com {"referer_class":"unknown","referer_name":"none"}
unknown www.google.com {"referer_class":"unknown","referer_name":"none"}
unknown www.google.com {"referer_class":"unknown","referer_name":"none"}
It seems all browsers send some of these, but Firefox disproportionately so.
This might be the effect of a privacy-related browser extension that perhaps Firefox users are more likely to have installed, and that various Firefox-based browsers may even have pre-installed, such as Tor Browser, Waterfox, and LibreWolf.
