In June 2022, Google launched a new Private prefetch proxy feature in Chrome Mobile. We believe that this has led to an increase in automated traffic in Readers data (see T341848#9014932).
We want to identify this traffic (and then decide whether to remove it) by labeling it as such in webrequest and the derived pageview tables, so that we can answer a few questions:
- Are all the private prefetch pageviews correctly labeled as automated traffic?
- We observed similar but slightly smaller increases in user traffic. Is some of the prefetch data being mislabeled as user traffic?
- Are both happening at the same time?
Identifying this traffic is possible in part by checking for the Sec-Purpose: prefetch; anonymous-client-ip request header (T341848#9143300).
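Once requests carry such a label, the three questions above come down to a cross-tabulation of the prefetch flag against the existing traffic classification. The following is a minimal sketch of that idea, not the actual pipeline: the record field names `sec_purpose` and `agent_type`, the sample rows, and both helper functions are assumptions made for illustration.

```python
from collections import Counter


def is_private_prefetch(sec_purpose):
    """True when the value carries both the prefetch and anonymous-client-ip tokens."""
    if not sec_purpose:
        return False
    # Compare case-insensitively and ignore spaces around the ';' separator.
    tokens = {t.strip().lower() for t in sec_purpose.split(";")}
    return {"prefetch", "anonymous-client-ip"} <= tokens


def crosstab(rows):
    """Count rows per (is_private_prefetch, agent_type) pair."""
    counts = Counter()
    for row in rows:
        key = (is_private_prefetch(row.get("sec_purpose")), row.get("agent_type"))
        counts[key] += 1
    return counts


# Hypothetical sample rows; real data would come from webrequest.
sample = [
    {"sec_purpose": "prefetch;anonymous-client-ip", "agent_type": "automated"},
    {"sec_purpose": "Prefetch; anonymous-client-ip", "agent_type": "user"},
    {"sec_purpose": None, "agent_type": "user"},
    {"sec_purpose": "prefetch", "agent_type": "user"},
]
table = crosstab(sample)
```

A non-zero count for `(True, "user")` would suggest that some private prefetch requests are being counted as user traffic, while `(True, "automated")` covers the ones already classified as automated.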
Other browsers send similar headers, although not necessarily in the context of an intervening IP address-shielding proxy service; for measurement purposes on standard web logs, however, they may yield similar behavior. To cover more of the probable headers, patch activity starting in January 2024 will try to catch more of this traffic.
To learn more about how some popular browsers behave, start with the historical article at https://lionralfs.dev/blog/exploring-the-usage-of-prefetch-headers/, then note the following as well:
- The Purpose: prefetch header is used in Safari / WebKit, at least in some contexts. It seems probable that this will be updated eventually, although it may be necessary to file a bug or ask on WebKit's Slack (or somewhere similar) to get clarity about standardization. Note that this is not to be confused with iCloud Private Relay (e.g., the exit IPs mentioned on an iCloud Private Relay developer-focused page); it is possible that private Safari builds take certain prefetch behavior into consideration, but confirming that would require extended observation, and there are diminishing returns in looking further than exit IPs and UA.
- Firefox transitioned in the prior year from X-Moz: prefetch to the standardized header with its single token, Sec-Purpose: prefetch, according to recently merged code and MDN.
- Chrome / Chromium (also the engine for newer Edge) has a couple of apparently canonical values: Sec-Purpose: prefetch and Sec-Purpose: prefetch;anonymous-client-ip, the latter being what spurred this task. Another value, possibly used by the omnibox, is Sec-Purpose: prefetch;prerender (per the code and sparse-to-empty search engine results, but also the much more visible https://developer.chrome.com/docs/web-platform/prerender-pages). The Private prefetch proxy article mentions a header of Sec-Purpose: Prefetch; anonymous-client-ip, so we'll try to handle that in case there's some magic in the way Chrome is configured to talk to Google services like Google Search, but we'll also look expressly for the lowercased value without any spaces.
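The header variants listed above could be folded into a single labeling step. The following is a rough sketch under stated assumptions, not the patch itself: the function name `prefetch_label` and the label strings it returns are hypothetical, and it assumes the request headers are available as a plain name-to-value mapping.

```python
def prefetch_label(headers):
    """Return a coarse label for prefetch-style request headers, or None."""
    # Header names are case-insensitive, so lowercase them before lookup.
    h = {name.lower(): value for name, value in headers.items()}

    sec_purpose = h.get("sec-purpose")
    if sec_purpose:
        # Tolerate "Prefetch; anonymous-client-ip" as well as the lowercased,
        # space-free "prefetch;anonymous-client-ip".
        tokens = [t.strip().lower() for t in sec_purpose.split(";") if t.strip()]
        if "prefetch" in tokens:
            if "anonymous-client-ip" in tokens:
                return "prefetch-private-proxy"
            if "prerender" in tokens:
                return "prefetch-prerender"
            return "prefetch"

    # Older, non-standardized headers (Safari / WebKit and earlier Firefox).
    if h.get("purpose", "").strip().lower() == "prefetch":
        return "prefetch-legacy-purpose"
    if h.get("x-moz", "").strip().lower() == "prefetch":
        return "prefetch-legacy-x-moz"
    return None
```

For example, `prefetch_label({"Sec-Purpose": "Prefetch; anonymous-client-ip"})` and `prefetch_label({"sec-purpose": "prefetch;anonymous-client-ip"})` both yield the private proxy label, which matches the intent of handling the article's spelling alongside the lowercased value without spaces.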