
Most visited domains (pageviews) across all Wikipedia/Wikimedia
Closed, Resolved, Public

Description

Hi,

I am looking for this data: the top X (ten or fifteen) most visited domains (pageviews) from all devices (desktop/mobile) across all Wikipedia/Wikimedia domains. So that means across all editions of Wikipedia, Wikidata, Wikimedia, Wikisource ...

What I have tried: The closest I could find was https://wikimedia.org/api/rest_v1/#!/Pageviews_data/get_metrics_pageviews_aggregate_project_access_agent_granularity_start_end, with which I can generate this data per project and then combine it across projects to get the results I want. However, it seems to be missing a few domains, like when I query for blog.wikimedia.org (or phabricator.wikimedia.org), it tells me:

The date(s) you used are valid, but we either do not have data for those date(s), or the project you asked for is not loaded yet. Please check https://wikimedia.org/api/rest_v1/?doc for more information.
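For reference, the per-project approach described above could be sketched roughly as follows. This is only an illustration, not a tested client: the URL layout follows the aggregate endpoint's path parameters (project, access, agent, granularity, start, end), the response is assumed to carry an "items" list whose entries have a "views" field, and the helper names (aggregate_url, total_views, top_projects) as well as the example timestamps are made up for this sketch.

```python
from urllib.parse import quote

# Base path of the aggregate pageviews endpoint (taken from the REST API docs).
API_ROOT = "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate"


def aggregate_url(project, start, end,
                  access="all-access", agent="user", granularity="monthly"):
    """Build the request URL for one project's aggregate pageviews.

    start and end are timestamps in YYYYMMDDHH form.
    """
    parts = [project, access, agent, granularity, start, end]
    return API_ROOT + "/" + "/".join(quote(p, safe="") for p in parts)


def total_views(response):
    """Sum the 'views' field over all items of one (assumed) JSON response."""
    return sum(item["views"] for item in response["items"])


def top_projects(views_by_project, n=15):
    """Rank projects by total views, highest first, and keep the top n."""
    ranked = sorted(views_by_project.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]
```

Fetching each URL (with any HTTP client), passing the decoded JSON to total_views, and feeding the resulting project-to-views mapping to top_projects would give the combined ranking. Projects that the API does not cover would still have to be handled separately.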

Note that I am not looking for individual articles but just domains (and subdomains).

Please let me know if more information is required. Thank you for your help.

Event Timeline

ssingh triaged this task as Medium priority. Feb 15 2019, 5:20 PM
ssingh created this task.

However, it seems to be missing a few domains, like when I query for blog.wikimedia.org (or phabricator.wikimedia.org)

As mentioned (admittedly somewhat obliquely) on the documentation page linked in my email, the pageview data is limited to "production sites", which currently does not include blog.wikimedia.org and phabricator.wikimedia.org. There is some traffic data for both domains in other places, but we can be pretty certain already that neither of them is in the top 15 domains by pageviews, so it's probably not worth retrieving numbers for these for this purpose.

Here is a first result: the top 15 by pageviews for January 2019, with known bots/spiders excluded. (To get the domain, combine project and access method - e.g. "it.wikipedia" "mobile web" means it.m.wikipedia.org, "en.wikipedia" "desktop" means en.wikipedia.org.)

project        access_method         views
en.wikipedia   mobile web       4455807733
en.wikipedia   desktop          3616017826
ja.wikipedia   mobile web        732477575
es.wikipedia   mobile web        619856511
de.wikipedia   desktop           540977617
de.wikipedia   mobile web        514878297
ru.wikipedia   desktop           472275693
ru.wikipedia   mobile web        446499662
fr.wikipedia   mobile web        421427313
it.wikipedia   mobile web        407834206
ja.wikipedia   desktop           391076475
fr.wikipedia   desktop           343466499
es.wikipedia   desktop           311016461
pt.wikipedia   mobile web        202435769
it.wikipedia   desktop           201453206
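The rule for turning a project and access method into a domain (the one given just above the table) could be sketched like this. It is a minimal illustration covering only the two access methods that appear in these results; the function name to_domain is made up here, and anything else (for example app traffic, or a project name without a dot) is deliberately rejected rather than guessed at.

```python
def to_domain(project: str, access_method: str) -> str:
    """Map a (project, access_method) pair to a hostname.

    Only the two cases from the results above are covered:
    "desktop" -> <project>.org, "mobile web" -> <prefix>.m.<site>.org
    """
    if access_method == "desktop":
        return f"{project}.org"
    if access_method == "mobile web":
        # Split "it.wikipedia" into "it" and "wikipedia", then insert "m".
        prefix, sep, site = project.rpartition(".")
        if not sep:
            raise ValueError(f"no mobile rule sketched for project {project!r}")
        return f"{prefix}.m.{site}.org"
    raise ValueError(f"no domain rule for access method {access_method!r}")


print(to_domain("it.wikipedia", "mobile web"))
print(to_domain("en.wikipedia", "desktop"))
```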

Data via

SELECT project, access_method, SUM(view_count) AS views
FROM wmf.projectview_hourly
WHERE year = 2019 AND month = 1
AND agent_type = 'user'
GROUP BY project, access_method
ORDER BY views DESC LIMIT 15;

Thank you for this data! I think we can close this ticket. Or should we leave it open for automating this (which will come later)?

Yes, that should be a separate task (and may require involvement from other teams).