Pull map stats to create a baseline BEFORE rapid growth of usage on Wikipedias
Closed, Resolved (Public)

Description

We need to get updated stats for map usage so we can show the growth that is bound to happen as we release mapframe to 277 Wikipedias. We've already released to English, so we need to do this as soon as possible.

I'm thinking in terms of an update to the stats on this page by @mpopov, maybe with a few additions.

Stats we most need updated

Mapframe

  • How many articles with mapframe per wiki
  • Total mapframes per wiki (because some articles have multiple mapframes)
  • Mapframe prevalence per wiki (% of total articles that have mapframe)
  • Total articles on the wiki.

Maplink

  • [same as above]
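
The per-wiki figures listed above reduce to simple arithmetic once the raw counts are available. A minimal sketch in Python, assuming the counts have already been pulled for each wiki; the `WikiCounts` type, its field names, and the sample numbers are hypothetical and purely illustrative, not real data or the existing implementation:

```python
from dataclasses import dataclass


@dataclass
class WikiCounts:
    """Raw counts for one wiki (hypothetical structure, not a real schema)."""
    wiki: str
    total_articles: int       # total articles on the wiki
    articles_with_maps: int   # articles with at least one mapframe (or maplink)
    total_maps: int           # total mapframes (or maplinks); an article may have several


def prevalence_percent(counts: WikiCounts) -> float:
    """Share of articles that have at least one map, as a percentage."""
    if counts.total_articles == 0:
        return 0.0  # avoid dividing by zero for an empty wiki
    return 100.0 * counts.articles_with_maps / counts.total_articles


# Made-up numbers for illustration only.
example = WikiCounts(wiki="examplewiki", total_articles=2000,
                     articles_with_maps=50, total_maps=80)
print(f"{example.wiki}: {prevalence_percent(example):.2f}% prevalence")
```

The same function would apply to maplink by feeding it maplink counts instead of mapframe counts.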

New stats it would be nice to have

  • Mapframe pageviews: number of views to mapframe pages per [week/day/month?]
  • Maplink clicks: number of times people click on maplink links per [week/day/month?]
  • If possible, break the above down per wiki.
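
Whichever granularity is chosen, the per-wiki breakdown amounts to bucketing events by wiki and time period. A rough sketch in Python, assuming each event is available as a (wiki, date) pair; the function names and the sample events are hypothetical and do not reflect how the data is actually stored:

```python
from collections import Counter
from datetime import date


def period_key(day: date, granularity: str) -> str:
    """Map a date to a day, ISO week, or month bucket label."""
    if granularity == "day":
        return day.isoformat()
    if granularity == "week":
        iso_year, iso_week, _ = day.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if granularity == "month":
        return f"{day.year}-{day.month:02d}"
    raise ValueError(f"unknown granularity: {granularity}")


def count_events(events, granularity="week"):
    """Count events per (wiki, period); `events` is an iterable of (wiki, date) pairs."""
    counts = Counter()
    for wiki, day in events:
        counts[(wiki, period_key(day, granularity))] += 1
    return counts


# Made-up events for illustration only.
sample = [
    ("examplewiki", date(2018, 5, 1)),
    ("examplewiki", date(2018, 5, 3)),
    ("otherwiki", date(2018, 5, 3)),
]
print(count_events(sample, "month"))
```

Keeping the granularity as a parameter means the week/day/month question can be settled later without changing how events are collected.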

Specs

At this point, we really just need the numbers in a spreadsheet, but if there is an easy way to automate this to a dashboard, then please do.

Event Timeline

jmatazzoni triaged this task as High priority. May 3 2018, 12:44 AM
jmatazzoni created this task.
mpopov added a comment. May 3 2018, 9:49 PM

Sooooo…most of this has actually already been done. We have per-wiki daily stats beginning on 2017-09-14 over at:

These are generated by https://github.com/wikimedia/wikimedia-discovery-golden/blob/master/modules/metrics/maps/prevalence.R

(There might be some wikis missing that have had mapframes enabled in the past few months.)

And last year I was working on a way to visualize that data but then a whole bunch of things happened and I've had to put T170022 on a back-back-back-burner: http://discovery-beta.wmflabs.org/maps/#kartographer_prevalence

And http://discovery-beta.wmflabs.org/maps/#kartographer_langproj

(Apologies ahead of time for any bugs and general instability.)

As for client-side interactions with maplinks & mapframes, there is event logging that Julien was working on before he left the org (see T151929). The schema is Kartographer (https://meta.wikimedia.org/wiki/Schema:Kartographer) and the instrumentation for it is over at:

So that data exists in the EL (EventLogging) database and just needs to be extracted.

That looks great! I think I've found one bug: Swedish Wikipedia is shown with 0% prevalence even though there are maps there and there are rows for svwiki in the TSV files with non-zero values. I'll try to figure out why that is, but having this to start from will save us a lot of time!
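
One way to narrow down a discrepancy like this might be to list which wikis have non-zero rows in a TSV and compare that list against what the dashboard displays. A minimal sketch in Python; the column names (`wiki`, `articles_with_maps`) and the sample content are assumptions made for illustration, since the real TSV header may differ:

```python
import csv
import io


def wikis_with_nonzero_counts(tsv_text, wiki_col="wiki", count_col="articles_with_maps"):
    """Return the set of wikis that have at least one row with a non-zero count.

    The column names are assumptions; adjust them to the real TSV header.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[wiki_col] for row in reader if float(row[count_col]) > 0}


# Made-up TSV content for illustration only.
sample_tsv = (
    "date\twiki\tarticles_with_maps\n"
    "2018-05-01\tawiki\t12\n"
    "2018-05-01\tbwiki\t0\n"
)
print(wikis_with_nonzero_counts(sample_tsv))
```

Any wiki in the returned set that still shows 0% on the dashboard would point to a problem downstream of the TSV files rather than in the data collection.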

jmatazzoni added a comment. Edited May 3 2018, 11:11 PM

Thanks @mpopov! As you note, not all current mapframe Wikipedias are accounted for in your first stats page (missing are Arabic, Bulgarian, Czech, Spanish, Kannada, Latvian, Portuguese, English).

The second stats page has a list of some 23 languages, the rationale for which I'm not sure I understand. Also, it looks like some measurements here are missing. E.g., English is shown as having no maplinks, which I'm pretty sure is not right.

But here is the bigger issue: we are about to release mapframe to 277 more Wikipedias, essentially all Wikipedias except nine flagged revision wikis. We need to be able to track usage on these as well. What do you suggest? How should they be added in?

Will your stats page be able to scale up to measure hundreds more? Is the general "Wikipedia" figure already accounting for all Wikipedias programmatically, or does it just add up the 11 you list on the page? What about the spreadsheet: what will happen if we start loading hundreds of wikis? Should we pick some representative wikis we want to measure?

Thanks for your prompt advice and help.

mpopov added a comment. May 4 2018, 7:00 PM

> Thanks @mpopov! As you note, not all current mapframe Wikipedias are accounted for in your first stats page (missing are Arabic, Bulgarian, Czech, Spanish, Kannada, Latvian, Portuguese, English).

> But here is the bigger issue: we are about to release mapframe to 277 more Wikipedias, essentially all Wikipedias except nine flagged revision wikis. We need to be able to track usage on these as well. What do you suggest? How should they be added in?

> Will your stats page be able to scale up to measure hundreds more? Is the general "Wikipedia" figure already accounting for all Wikipedias programmatically, or does it just add up the 11 you list on the page? What about the spreadsheet: what will happen if we start loading hundreds of wikis? Should we pick some representative wikis we want to measure?

I suggest submitting a patch (via Gerrit) that updates prevalence.yaml in wikimedia/discovery/golden (which has lists of database names).

I think it will scale fine. For most wikis, prevalence is very quick to calculate because they have relatively few pages and the calculation is done server-side in MariaDB. Bigger wikis take longer, with enwiki being the slowest.

Change 431050 had a related patch set uploaded (by Catrope; owner: Catrope):
[wikimedia/discovery/golden@master] Update list of Wikipedias with mapframe

https://gerrit.wikimedia.org/r/431050

Change 431051 had a related patch set uploaded (by Catrope; owner: Catrope):
[wikimedia/discovery/golden@master] Update list of maplink Wikipedias

https://gerrit.wikimedia.org/r/431051

Change 431052 had a related patch set uploaded (by Catrope; owner: Catrope):
[wikimedia/discovery/golden@master] Fix treatment of Wikivoyages

https://gerrit.wikimedia.org/r/431052

Change 431050 merged by Jforrester:
[wikimedia/discovery/golden@master] Update list of Wikipedias with mapframe

https://gerrit.wikimedia.org/r/431050

Change 431051 merged by Bearloga:
[wikimedia/discovery/golden@master] Update list of maplink Wikipedias

https://gerrit.wikimedia.org/r/431051

Change 431052 merged by Bearloga:
[wikimedia/discovery/golden@master] Fix treatment of Wikivoyages

https://gerrit.wikimedia.org/r/431052

Change 432319 had a related patch set uploaded (by Catrope; owner: Catrope):
[wikimedia/discovery/golden@master] Get mapframe stats for all Wikipedias that recently got mapframe

https://gerrit.wikimedia.org/r/432319

Change 432319 merged by Bearloga:
[wikimedia/discovery/golden@master] Get mapframe stats for all Wikipedias that recently got mapframe

https://gerrit.wikimedia.org/r/432319

Etonkovidova closed this task as Resolved. May 29 2018, 9:02 PM
Etonkovidova claimed this task.
Etonkovidova added a subscriber: Etonkovidova.
Vvjjkkii renamed this task from Pull map stats to create a baseline BEFORE rapid growth of usage on Wikipedias to xqdaaaaaaa. Jul 1 2018, 1:12 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed Etonkovidova as the assignee of this task.
Vvjjkkii updated the task description.
Vvjjkkii removed subscribers: gerritbot, Aklapper.
CommunityTechBot renamed this task from xqdaaaaaaa to Pull map stats to create a baseline BEFORE rapid growth of usage on Wikipedias. Jul 2 2018, 6:31 AM
CommunityTechBot closed this task as Resolved.
CommunityTechBot assigned this task to Etonkovidova.
CommunityTechBot updated the task description.
CommunityTechBot added subscribers: gerritbot, Aklapper.