Setting up a pipeline to source historical edit data into HDFS, aggregate it, expose it externally, and use it to power the new Wikistats UI
Description
Event Timeline
Code name for the bigger task of data gathering: "data lake" : https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake
Please look at the subtasks to see our backlog for the Wikistats 2.0 replacement.
We are working towards reconstructing edit history without having to depend on the dumps.
@Nuria @Milimetric: I apologize if this is a bad place for this feedback, but I couldn't think of a better one.
I had a meeting with @HaeB, @Tnegrin, and @mpopov yesterday where we discussed our metrics reporting to the Board of Trustees. We felt that Wikistats 2.0 would be a very valuable addition to this process, as a general destination for movement-level metrics, and that it would be even better if it had the ability to display generally-useful data annotations. For example, it would be helpful to users to explain the big pageviews drop that occurred when we switched to the new pageviews infrastructure, or the big drop in new active editors when we turned on anonymous mobile editing, or the big drop in edits when we switched to Wikidata for interlanguage links.
Have you given any thought to supporting such annotations?
Have you given any thought to supporting such annotations?
Yes, Dashiki already supports annotations (it has for a while). See the pageviews graph and look at the bottom axis:
https://analytics.wikimedia.org/dashboards/vital-signs/#projects=eswiki,itwiki,enwiki,jawiki,dewiki,ruwiki,frwiki/metrics=Pageviews
Like any configuration in Dashiki, this info is sourced from meta: https://meta.wikimedia.org/wiki/Dashiki:PageviewsAnnotations
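For illustration only, here is a minimal Python sketch of how a client might read annotations from a wiki page like that. The `action=raw` parameter on `index.php` is standard MediaWiki; everything else is an assumption. In particular, the JSON shape (a list of objects with `date` and `note` keys) is hypothetical and may not match what the Dashiki page actually stores.

```python
import json
from datetime import date
from urllib.parse import urlencode

META_INDEX = "https://meta.wikimedia.org/w/index.php"


def raw_page_url(title: str) -> str:
    """Build the URL that returns a wiki page's raw content (action=raw)."""
    return f"{META_INDEX}?{urlencode({'title': title, 'action': 'raw'})}"


def parse_annotations(raw_json: str) -> list[dict]:
    """Parse annotations from raw JSON text.

    Assumed (hypothetical) schema: [{"date": "YYYY-MM-DD", "note": "..."}, ...].
    Returns the annotations sorted by date.
    """
    parsed = [
        {"date": date.fromisoformat(entry["date"]), "note": entry["note"]}
        for entry in json.loads(raw_json)
    ]
    return sorted(parsed, key=lambda a: a["date"])


# Example with made-up data; a real client would first download
# raw_page_url("Dashiki:PageviewsAnnotations") and pass the body in.
sample = '[{"date": "2015-05-01", "note": "B"}, {"date": "2015-04-01", "note": "A"}]'
annotations = parse_annotations(sample)
```

The download step is left out on purpose so the parsing logic can be tried without network access.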
Keep in mind that Wikistats has a strong community focus. Its existence much predates the Foundation and, really, its main goal is to motivate our editor community (cc @Erik_Zachte).
I think for reporting data to the board, the new incarnation of the reportcard might be a much better venue.
You can see a design prototype for the new Wikistats here (much WIP): https://analytics-prototype.wmflabs.org/#/ . We will be requesting a second round of community feedback on this visual design this coming month.
Ooh, that prototype is still not ready to be shared, it's still very much early days. That said, I have thoughts on annotations.
Dashiki primarily uses dygraphs for timeseries line graphs. Work on dygraphs has pretty much ground to a halt and it has very basic annotation support. That's why the annotations in dashiki kind of suck. We have two thoughts for making annotations better in wikistats 2.0:
- build simple line graphs ourselves from d3, because d3 is now modular and we don't have to take a 400kb hit every time we load a graph. This should make it mobile-friendly and let us finally have decent custom annotations.
- find a better way to store annotations than the rough wiki-page raw JSON way we've been doing it so far. I'm thinking either nicely rendering Config:Dashiki:Annotations: pages or adding an API where you can request annotations. I think @Nuria does not like this second approach so we still have a little architecture work there. But in general we all agree annotations should be easier to add/update/consume.
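To make the first idea a bit more concrete, here is a rough Python sketch of the data step a custom line graph would need before drawing anything: snapping each annotation to the nearest point in the timeseries. This is not Dashiki or Wikistats code; the function names and the tuple-based data shapes are made up for illustration.

```python
from bisect import bisect_left
from datetime import date


def nearest_point_index(dates: list[date], target: date) -> int:
    """Return the index of the date closest to target (dates must be sorted)."""
    if not dates:
        raise ValueError("timeseries is empty")
    i = bisect_left(dates, target)
    if i == 0:
        return 0
    if i == len(dates):
        return len(dates) - 1
    before, after = dates[i - 1], dates[i]
    # Ties go to the earlier point.
    return i if (after - target) < (target - before) else i - 1


def attach_annotations(
    series: list[tuple[date, float]],
    annotations: list[tuple[date, str]],
) -> dict[int, list[str]]:
    """Map each data point index to the notes that should be drawn on it."""
    dates = [d for d, _ in series]
    attached: dict[int, list[str]] = {}
    for when, note in annotations:
        attached.setdefault(nearest_point_index(dates, when), []).append(note)
    return attached
```

A renderer could then walk the series once and draw a marker wherever the returned mapping has an entry for that index.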
With those two improvements we ultimately want to put annotations in the hands of the analysts at the Foundation. So while Wikistats is community-focused, the people working on this data and the annotations will be us. So we definitely want to make that easier/better.
Finally, I think @ezachte is a better ping for Erik than @Erik_Zachte, right?
In this case, it seems like the board wants very similar information to any journalist or Metapedian: mainly global editing and traffic numbers, with a side of whatever else might be illuminating about the state of the movement (no doubt they also want information on what different teams at the WMF are doing, but that's not the kind of reporting @HaeB and I have been doing for them).
I was not aware that a new incarnation of the report card is planned; perhaps you could give me some details? But I don't see the argument for adding complexity by having two dashboards where one would do.
This is very exciting! There's a fair amount of knowledge of metric fluctuations rattling around in my brain, and right now there's no good place to document it. It sounds like annotations on Wikistats 2.0 could be that place :D
I was not aware that a new incarnation of the report card is planned; perhaps you could give me some details?
See: https://phabricator.wikimedia.org/T130117
For our first stab we are just moving it to Dashiki and reporting a few metrics; the current reportcard (which hasn't been updated in a while) will redirect to http://analytics.wikimedia.org/dashboards/report-card/ (or similar). This is part of the work we have been doing to completely deprecate limn; we started with the editor dashboards you are familiar with. Migrating the UI is easy, as you know; most of the work has been dedicated to having a programmatic way to retrieve pageviews older than 2015. All this work is organized in this task: https://phabricator.wikimedia.org/T146308
@Neil_P._Quinn_WMF :
We reworked our annotations to be friendlier. They are now visible on the reportcard; please take a look: https://analytics.wikimedia.org/dashboards/reportcard/#pageviews-july-2015-now
Annotations text: https://meta.wikimedia.org/wiki/Dashiki:PageviewsAnnotations
Here is how to configure those: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Dashiki#Configuring_annotations
Closing as a parent task in favor of using project tags. Epic tasks can serve as parent tasks when needed to capture large feature work.