@Ironholds, at a lower priority, could you do a vet of the "portal" data in https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerCountryTrends.htm with the newer data sources we have? Don't need perfect data but would like to have a rough check on 1) global % of non-crawler traffic going to www.* from desktop vs. mobile, 2) share of traffic coming from outlier countries in ErikZ's report (i.e. is India really this high). Can you do this without other analytics support?
India is responsible for a plurality of the traffic; ErikZ's numbers are not off and are still valid. Meanwhile, the vast majority of people turn up without a valid referer, which could just indicate that our lack of standardised SSL is costing us referer forwarding. Most traffic is not spider-driven; the traffic that is comes (weirdly) from Brazil most of the time.
Approximating out to a month of data, the portal gets around 600m pageviews; this is about 3.3% of our global traffic under the existing definition (which we know is overcounting. So, the actual proportion is likely to be much higher). It's hard to approximate a mobile/desktop split because I'm not sure if www.wikipedia.org actually redirects to www.m.* - if it doesn't, this would explain a lot, because only 2,040 of the 1.1m pageviews were to the mobile varnishes.
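For anyone who wants to sanity-check the arithmetic, here is a minimal Python sketch using only the figures quoted above. It assumes the 1.1m pageviews are the sample that the monthly estimate was extrapolated from (that isn't stated explicitly), and the constant and function names are purely illustrative.

```python
# Figures quoted in the comment above.
PORTAL_MONTHLY_PAGEVIEWS = 600_000_000   # "around 600m pageviews"
PORTAL_SHARE_OF_GLOBAL = 0.033           # "about 3.3% of our global traffic"
SAMPLE_PAGEVIEWS = 1_100_000             # assumed to be the underlying sample
SAMPLE_MOBILE_PAGEVIEWS = 2_040          # pageviews served by the mobile varnishes


def implied_global_pageviews(portal_pageviews: float, portal_share: float) -> float:
    """Back out the global monthly total implied by the portal's share."""
    return portal_pageviews / portal_share


def mobile_share(mobile_pageviews: int, total_pageviews: int) -> float:
    """Fraction of sampled portal pageviews that hit the mobile varnishes."""
    return mobile_pageviews / total_pageviews


implied_global = implied_global_pageviews(PORTAL_MONTHLY_PAGEVIEWS, PORTAL_SHARE_OF_GLOBAL)
mobile = mobile_share(SAMPLE_MOBILE_PAGEVIEWS, SAMPLE_PAGEVIEWS)

print(f"implied global monthly pageviews: {implied_global / 1e9:.1f}bn")
print(f"mobile share of portal sample: {mobile:.2%}")
```

On these numbers the implied global total is roughly 18bn pageviews a month, and the mobile share of the portal sample is under 0.2%.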
Does this help? Do you have additional specific questions/comments/thoughts?
What do you mean by apples/apples % of pageviews? As in: can we
compare the traffic to www.wikipedia.org to global traffic to the
We can, but it's hard; the landing page works in an incredibly weird
and totally undocumented way. So, we know that you can search through
it. Do those search requests silently send requests to
*.wikipedia.org? For me, that search defaults to English. Does it
default to English for everyone? Just people in the US? Just people
with my language settings? It's hard to know what bleed-over there is
between www.* and the wikis.
You write: this is about 3.3% of our global traffic under the existing definition (which we know is overcounting. So, the actual proportion is likely to be much higher).
So, I'm assuming your views of www.wikipedia.org count only non-crawler traffic, correct? And when you say it's overcounting, you're including crawler traffic in "global traffic" since that's in the old definition? If so, what I'd like to know is how the www.wikipedia.org non-crawler traffic compares against all non-crawler traffic on the wikipedia.org domain (i.e. including the language subdomains).
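To make the comparison being asked for concrete, here is a rough Python sketch. The `Request` record and its `host` and `is_crawler` fields are hypothetical stand-ins, not the schema of any actual request log, and how a request gets flagged as a crawler is left out entirely.

```python
from typing import Iterable, NamedTuple


class Request(NamedTuple):
    host: str          # hypothetical field: the requested hostname
    is_crawler: bool   # hypothetical field: result of some crawler check


def portal_share(requests: Iterable[Request]) -> float:
    """Share of non-crawler wikipedia.org requests that went to www.wikipedia.org."""
    portal = 0
    total = 0
    for request in requests:
        if request.is_crawler:
            continue  # drop crawler traffic from both numerator and denominator
        host = request.host.lower()
        # Count the bare domain and every subdomain (language wikis, www, ...).
        if host == "wikipedia.org" or host.endswith(".wikipedia.org"):
            total += 1
            if host == "www.wikipedia.org":
                portal += 1
    return portal / total if total else 0.0


# Made-up rows, purely to show the shape of the calculation.
sample = [
    Request("www.wikipedia.org", False),
    Request("en.wikipedia.org", False),
    Request("en.wikipedia.org", True),
    Request("de.wikipedia.org", False),
    Request("www.wikipedia.org", True),
    Request("commons.wikimedia.org", False),
]
print(f"portal share of non-crawler wikipedia.org traffic: {portal_share(sample):.1%}")
```

The point of the sketch is only that crawler traffic is excluded from both sides of the ratio, so the numerator and denominator are filtered the same way.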