For the mobile web version of Wikipedia, based on last-access cookies, possibly split by OS
Description
Related Objects
- Mentioned Here
- D7: Testing: DO not merge
Event Timeline
Didn't yet get to wrap this up today and do a fuller writeup, but the answer for the English Wikipedia (en.m.wikipedia.org) is about 8-10 million per day. For Spanish, 1-2 million. Compare to e.g. the 1.1 million daily active users of the Android app.
(Starting next week or so we should also be able to get a global number - across all language Wikipedia - from the new last-access cookie.)
Below is the same metric (number of mobile devices already seen yesterday) for the 20 most (re)-visited mobile web versions of Wikipedia, averaged over the four weeks from June 3-30:
wikipedia_version avg_visited_yesterday en.m.wikipedia.org 8859519 ja.m.wikipedia.org 1924954 es.m.wikipedia.org 1542810 de.m.wikipedia.org 1093970 ru.m.wikipedia.org 884864 it.m.wikipedia.org 849191 fr.m.wikipedia.org 808798 zh.m.wikipedia.org 391060 pt.m.wikipedia.org 372229 pl.m.wikipedia.org 254385 ar.m.wikipedia.org 245965 fa.m.wikipedia.org 193206 nl.m.wikipedia.org 176551 sv.m.wikipedia.org 144317 id.m.wikipedia.org 123807 fi.m.wikipedia.org 76474 th.m.wikipedia.org 75938 he.m.wikipedia.org 72021 ko.m.wikipedia.org 60220 vi.m.wikipedia.org 51116 20 rows selected (310.205 seconds) SELECT CONCAT(project,'.m.wikipedia.org') AS wikipedia_version, INT(AVG(visited_yesterday)) AS avg_visited_yesterday FROM ( SELECT year, month, day, CONCAT(year,'-',LPAD(month,2,'0'),'-',LPAD(day,2,'0')) AS date, project, SUM(IF((unix_timestamp(CONCAT(year,'-',LPAD(month,2,'0'),'-',LPAD(day,2,'0')), 'yyyy-MM-dd') - unix_timestamp(wmf_last_access, 'dd-MMM-yyyy')) = 86400,1,0) * view_count) AS visited_yesterday FROM tbayer.webrequest_extract_bak WHERE unix_timestamp(wmf_last_access, 'dd-MMM-yyyy') IS NOT NULL AND year = 2017 AND month = 6 AND day >= 3 -- 4x7 days, to account for weekly seasonality AND access_method = 'mobile web' AND project_class = 'wikipedia' GROUP BY year, month, day, project) AS byday GROUP BY project ORDER BY avg_visited_yesterday DESC LIMIT 20;
Per our meeting, for this same presentation I'm also looking for:
- Pageviews by platform (back to begin of 2016)
- D7 Retention by platform (back as far as the data exists)
- Wikidata edits and revert rates (since launch)
Done (with only the overall revert rate, I think we don't need all the available detail)
And here is the same data (daily average of been-here-yesterdays for June 3-30), with iOS and Android clients broken out:
wikipedia_version | all_yesterdays | ios_yesterdays | android_yesterdays |
---|---|---|---|
en.m.wikipedia.org | 8859519 | 4794681 | 3859229 |
ja.m.wikipedia.org | 1924954 | 1107981 | 804330 |
es.m.wikipedia.org | 1542810 | 323953 | 1185449 |
de.m.wikipedia.org | 1093970 | 479643 | 591720 |
ru.m.wikipedia.org | 884864 | 292457 | 566299 |
it.m.wikipedia.org | 849191 | 302864 | 515582 |
fr.m.wikipedia.org | 808798 | 371527 | 411050 |
zh.m.wikipedia.org | 391060 | 185535 | 204657 |
pt.m.wikipedia.org | 372229 | 77633 | 282021 |
pl.m.wikipedia.org | 254385 | 47918 | 194028 |
ar.m.wikipedia.org | 245965 | 92152 | 147949 |
fa.m.wikipedia.org | 193206 | 32721 | 158611 |
nl.m.wikipedia.org | 176551 | 98111 | 76033 |
sv.m.wikipedia.org | 144317 | 90608 | 52857 |
id.m.wikipedia.org | 123807 | 10281 | 97996 |
fi.m.wikipedia.org | 76474 | 23095 | 49188 |
th.m.wikipedia.org | 75938 | 24609 | 50933 |
he.m.wikipedia.org | 72021 | 21672 | 50114 |
ko.m.wikipedia.org | 60220 | 13584 | 46574 |
vi.m.wikipedia.org | 51116 | 17937 | 29189 |
Data source is this (slightly clumsy) query: SELECT CONCAT(all_avgs.project,'.m.wikipedia.org') AS wikipedia_version, all_avgs.avg_visited_yesterday AS all_yesterdays, ios_avgs.avg_visited_yesterday_ios AS ios_yesterdays, android_avgs.avg_visited_yesterday_android AS android_yesterdays FROM ( SELECT project, os_family, INT(AVG(visited_yesterday)) AS avg_visited_yesterday_ios FROM ( SELECT year, month, day, CONCAT(year,'-',LPAD(month,2,'0'),'-',LPAD(day,2,'0')) AS date, project, SUM(IF((unix_timestamp(CONCAT(year,'-',LPAD(month,2,'0'),'-',LPAD(day,2,'0')), 'yyyy-MM-dd') - unix_timestamp(wmf_last_access, 'dd-MMM-yyyy')) = 86400,1,0) * view_count) AS visited_yesterday, user_agent_map['os_family'] AS os_family FROM tbayer.webrequest_extract_bak WHERE unix_timestamp(wmf_last_access, 'dd-MMM-yyyy') IS NOT NULL AND year = 2017 AND month = 6 AND day >= 3 -- 4x7 days, to account for weekly seasonality AND access_method = 'mobile web' AND project_class = 'wikipedia' GROUP BY year, month, day, project, user_agent_map['os_family'] HAVING user_agent_map['os_family'] = 'iOS') AS iosbyday GROUP BY project, os_family ) AS ios_avgs JOIN ( SELECT project, os_family, INT(AVG(visited_yesterday)) AS avg_visited_yesterday_android FROM ( SELECT year, month, day, CONCAT(year,'-',LPAD(month,2,'0'),'-',LPAD(day,2,'0')) AS date, project, SUM(IF((unix_timestamp(CONCAT(year,'-',LPAD(month,2,'0'),'-',LPAD(day,2,'0')), 'yyyy-MM-dd') - unix_timestamp(wmf_last_access, 'dd-MMM-yyyy')) = 86400,1,0) * view_count) AS visited_yesterday, user_agent_map['os_family'] AS os_family FROM tbayer.webrequest_extract_bak WHERE unix_timestamp(wmf_last_access, 'dd-MMM-yyyy') IS NOT NULL AND year = 2017 AND month = 6 AND day >= 3 -- 4x7 days, to account for weekly seasonality AND access_method = 'mobile web' AND project_class = 'wikipedia' GROUP BY year, month, day, project, user_agent_map['os_family'] HAVING user_agent_map['os_family'] = 'Android') AS androidbyday GROUP BY project, os_family ) AS android_avgs ON ios_avgs.project = android_avgs.project JOIN ( SELECT project, INT(AVG(visited_yesterday)) AS avg_visited_yesterday FROM ( SELECT year, month, day, CONCAT(year,'-',LPAD(month,2,'0'),'-',LPAD(day,2,'0')) AS date, project, SUM(IF((unix_timestamp(CONCAT(year,'-',LPAD(month,2,'0'),'-',LPAD(day,2,'0')), 'yyyy-MM-dd') - unix_timestamp(wmf_last_access, 'dd-MMM-yyyy')) = 86400,1,0) * view_count) AS visited_yesterday FROM tbayer.webrequest_extract_bak WHERE unix_timestamp(wmf_last_access, 'dd-MMM-yyyy') IS NOT NULL AND year = 2017 AND month = 6 AND day >= 3 -- 4x7 days, to account for weekly seasonality AND access_method = 'mobile web' AND project_class = 'wikipedia' GROUP BY year, month, day, project) AS allbyday GROUP BY project ) AS all_avgs ON android_avgs.project = all_avgs.project ORDER BY all_yesterdays DESC LIMIT 20;
As discussed, I also started to look into generating a global version based on the new global Last-access cookie, but it seems this needs some extra care to make sure the interpretation is valid (and also, to take a similar average over several weeks, a potentially more time-consuming webrequest query that could easily take a day to run), so this won't be ready today, sorry.
Closing this now, as the main requests were all done back in July for the occasion at which they were needed, I haven't had time to tackle the bonus task (global numbers yet) since; it would be more complicated and probably justify a separate task,