
Investigate www.wikipedia.org traffic %
Closed, ResolvedPublic

Description

@Ironholds, at a lower priority, could you vet the "portal" data in https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerCountryTrends.htm against the newer data sources we have? I don't need perfect data, but would like a rough check on 1) the global % of non-crawler traffic going to www.* from desktop vs. mobile, and 2) the share of traffic coming from outlier countries in ErikZ's report (i.e. is India really this high?). Can you do this without other analytics support?

Event Timeline

Eloquence assigned this task to Ironholds.
Eloquence set the priority of this task to Needs Triage.
Eloquence updated the task description.
Eloquence added subscribers: Eloquence, Ironholds.

Given that I wrote the entire pageviews definition without much support (except from Otto and Qchris for the implementation), I've got this :).

Starting work on this now. Will add queries/code/graphics here as I get them.


Okay! So:

India is responsible for the plurality of the traffic, so ErikZ's numbers are not off and still hold. Meanwhile, the vast majority of visitors arrive without a valid referer; this could simply indicate that our lack of standardised SSL is costing us referer forwarding (browsers drop the referer when navigating from an HTTPS page to an HTTP one). Most traffic is not spider-driven; the traffic that is comes (weirdly) mostly from Brazil.
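The breakdowns above (plurality country, share of requests without a referer, crawler share and where it comes from) amount to simple tallies over request records. This is a hypothetical Python sketch, not the query that was actually run: the `Request` record and its field names are assumptions, and treating an empty or `-` referer as "no valid referer" is also an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Request:
    # Hypothetical record shape; field names are illustrative only.
    country: str
    referer: Optional[str]  # assumed None, "" or "-" when no referer was sent
    is_spider: bool


def summarise(requests: Iterable[Request]) -> dict:
    """Tally the per-country, referer and spider breakdowns for portal requests."""
    reqs = list(requests)
    total = len(reqs)
    if total == 0:
        raise ValueError("no requests to summarise")

    # Plurality: the single most common country, not necessarily a majority.
    by_country = Counter(r.country for r in reqs)
    top_country, top_count = by_country.most_common(1)[0]

    # Requests that arrived without a usable referer.
    no_referer = sum(1 for r in reqs if r.referer in (None, "", "-"))

    # Where the spider-flagged requests come from.
    spider_countries = Counter(r.country for r in reqs if r.is_spider)

    return {
        "plurality_country": top_country,
        "plurality_share": top_count / total,
        "no_referer_share": no_referer / total,
        "spider_share": sum(spider_countries.values()) / total,
        "top_spider_country": (
            spider_countries.most_common(1)[0][0] if spider_countries else None
        ),
    }
```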

Extrapolating to a month of data, the portal gets around 600m pageviews; this is about 3.3% of our global traffic under the existing definition, which we know overcounts, so the actual proportion is likely to be much higher. It's hard to approximate a mobile/desktop split because I'm not sure whether www.wikipedia.org actually redirects to www.m.*. If it doesn't, that would explain a lot, because only 2,040 of the 1.1m pageviews went to the mobile varnishes.
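The arithmetic in this comment can be checked quickly. A minimal sketch using only the figures quoted above: if 600m monthly portal pageviews are about 3.3% of global traffic, the implied global total is roughly 18bn a month, and 2,040 of 1.1m is under 0.2%. The helper names are illustrative.

```python
def share(part: float, whole: float) -> float:
    """Fraction of `whole` accounted for by `part`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


def implied_total(part: float, fraction: float) -> float:
    """Total implied when `part` is `fraction` of it."""
    if fraction <= 0:
        raise ValueError("fraction must be positive")
    return part / fraction


# Figures quoted in the comment above.
portal_monthly = 600e6
print(f"implied global monthly total: {implied_total(portal_monthly, 0.033) / 1e9:.1f}bn")  # 18.2bn
print(f"mobile share of portal views: {share(2_040, 1.1e6):.2%}")  # 0.19%

# If the global total overcounts (e.g. it includes crawler views), removing
# those views shrinks the denominator, so the portal's true share can only rise.
```

This also makes the "likely much higher" reasoning concrete: the portal count stays fixed while the overcounted denominator falls.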

Does this help? Do you have additional specific questions/comments/thoughts?

Thanks, Oliver! It doesn't redirect; it's responsive.

Can we estimate an apples-to-apples % of pageviews?

What do you mean by an apples-to-apples % of pageviews? As in: can we compare the traffic to www.wikipedia.org to global traffic to the "actual" sites?

We can, but it's hard; the landing page works in an incredibly weird and totally undocumented way. We know, for example, that you can search through it. Does a search silently send requests to *.wikipedia.org? For me, that search defaults to English. Does it default to English for everyone? Just people in the US? Just people with my language settings? It's hard to know what bleed-over there is between www.* and the wikis.

You write: "this is about 3.3% of our global traffic under the existing definition (which we know is overcounting. So, the actual proportion is likely to be much higher)."

So I'm assuming your counts of www.wikipedia.org views include only non-crawler traffic, correct? And when you say the definition is overcounting, you mean that "global traffic" includes crawler traffic, since the old definition counts it? If so, what I'd like to know is how non-crawler traffic to www.wikipedia.org compares against all non-crawler traffic on the wikipedia.org domain (i.e. including the language subdomains).

Query run: non-crawler traffic to www.(m.)?wikipedia.org is equivalent to 4.24% of non-crawler traffic to *.wikimedia.org domains as a whole (excluding the base landing page).
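The query itself isn't included in the task. As a hypothetical sketch of the same comparison in Python: drop crawler pageviews, classify the rest by host, and divide portal views by everything else. The `(host, is_crawler)` row shape is an assumption, rows are assumed to be pre-filtered to the domains being compared, and the host pattern is written with escaped dots, `www\.(m\.)?wikipedia\.org`, so it matches only www.wikipedia.org and www.m.wikipedia.org.

```python
import re
from typing import Iterable, Tuple

# Portal hosts only: www.wikipedia.org and www.m.wikipedia.org (dots escaped).
PORTAL_HOST = re.compile(r"^www\.(m\.)?wikipedia\.org$")


def portal_share(rows: Iterable[Tuple[str, bool]]) -> float:
    """Non-crawler portal pageviews as a fraction of all other non-crawler pageviews.

    rows: hypothetical (host, is_crawler) pairs. The denominator excludes the
    portal itself, mirroring "excluding the base landing page".
    """
    portal = other = 0
    for host, is_crawler in rows:
        if is_crawler:
            continue  # compare non-crawler traffic only
        if PORTAL_HOST.match(host):
            portal += 1
        else:
            other += 1
    if other == 0:
        raise ValueError("no non-crawler, non-portal traffic to compare against")
    return portal / other
```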

That's great, thank you. Last question for now: what's that in absolute numbers per month?

Months are unevenly sized, of course, and this is just a week, but if we extrapolate out, call it 0.75bn?
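Scaling a one-week count to a month can be sketched as below, assuming the sampled week is representative of the month. Using an average month length of 365.25 / 12 = 30.4375 days smooths over uneven month sizes; passing 28 or 31 instead gives the range. The function names are illustrative.

```python
AVG_DAYS_PER_MONTH = 365.25 / 12  # 30.4375 days on average


def monthly_from_weekly(weekly_views: float, days_in_month: float = AVG_DAYS_PER_MONTH) -> float:
    """Scale a seven-day pageview count to a month, assuming the week is representative."""
    return weekly_views / 7 * days_in_month


def weekly_from_monthly(monthly_views: float, days_in_month: float = AVG_DAYS_PER_MONTH) -> float:
    """Inverse of monthly_from_weekly: the weekly count a monthly figure implies."""
    return monthly_views / days_in_month * 7
```

For a weekly count `w`, `monthly_from_weekly(w, 28)` and `monthly_from_weekly(w, 31)` bound the estimate for the shortest and longest months.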

I'll take it. Thanks!