Visualize page create events for all wikis
Closed, ResolvedPublic

Event Timeline

Restricted Application added a subscriber: Aklapper.

Creating this ticket so @kaldari's team and Analytics can coordinate on this.

@Nuria: You may want to remove a good bunch of the copied project tags?

Yeah, it's kind of a bug in Phabricator that it copies all parent tags onto subtasks. I think that's causing a ton of unwanted pinging. It would be fine if there were an option like "copy parent tags", as long as it was off by default.

@Nuria, @Ottomata: Is there an existing service that we can piggyback on for this, such as Kibana or Grafana? If not, any suggestions for a framework to use or a place to host it?

@kaldari: Visualizing things for all wikis is not easy to do with either Grafana or Kibana: for 2 metrics across, say, 200 wikis you need to be able to show 400 graphs. Dashiki's per-project layout was designed by Pau for this purpose, see: https://analytics.wikimedia.org/dashboards/vital-signs/#projects=eswiki,itwiki,enwiki,jawiki,dewiki,ruwiki,frwiki/metrics=Pageviews
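For reference, the fragment at the end of a Dashiki URL like the one above can be put together programmatically. This is only a sketch based on the URLs quoted in this thread: the function name is made up, and joining several metrics with a comma is an assumption (the thread only shows a single metric per URL).

```python
from urllib.parse import quote


def dashiki_fragment(projects, metrics):
    """Build a '#projects=.../metrics=...' fragment (hypothetical helper).

    Assumption: multiple metrics are comma-separated, like projects.
    """
    project_part = ",".join(projects)
    # Metric names may contain spaces, e.g. "Daily Edits" -> "Daily%20Edits".
    metric_part = ",".join(quote(name, safe="") for name in metrics)
    return f"#projects={project_part}/metrics={metric_part}"


fragment = dashiki_fragment(["eswiki", "itwiki", "enwiki"], ["Pageviews"])
print(fragment)
```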

To use it, you need to set up your queries with reportupdater and have Dashiki pull your data; Dashiki's configuration is done on-wiki. Dashiki is purely client side and pulls your data over HTTP.

Some docs.
Reportupdater: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Reportupdater
Dashiki: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Dashiki

A full example for dashiki for some of the editing dashboards:

Queries
https://github.com/wikimedia/analytics-limn-ee-data/tree/master/ee-migration

Plots
https://edit-analysis.wmflabs.org/editor-engagement/#projects=enwiki/metrics=Daily%20Edits

Dashiki Config:
https://meta.wikimedia.org/wiki/Config:Dashiki:EditorEngagement

@Nuria: That link doesn't work for me. Are you suggesting I create that repo? Just want to make sure it wasn't a typo :)

@kaldari repo exists in gerrit, you can just clone it.

@Nuria: Oops, you're right. My bad. Didn't notice the "git clone" in front of it :)

I'm working on this and got ReportUpdater working locally. A couple of questions:

  1. Most of the examples I've seen create datasets with a single measurement, meaning there are multiple SQL queries instead of one returning multiple columns. Is this what I should also be setting up in order to facilitate having Dashiki visualize it later?
  2. This should visualize page creation events for all wikis, is there a straightforward way of building a wiki_dbs.txt file to feed to explode_by to do that?
  3. What measurements are we interested in? I'm thinking at least total number of page creations per day, but perhaps they should be more detailed? I have a few I'm interested in for the ACTRIAL project[1], mainly looking at number of main namespace pages that were not created by bots and that are not redirects, with some further segmentation based on user groups.

Footnotes:

  1. https://meta.wikimedia.org/wiki/Research_talk:Autoconfirmed_article_creation_trial/Work_log/2017-08-21

What measurements are we interested in?

Whatever you need for ACTRIAL, plus some basic ones:

  • Page creations per day
  • Main namespace page creations per day
  • Article (non-redirect main namespace) creations per day
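To make the three basic measurements concrete, here is a minimal sketch of how they nest. The `PageCreation` record and its fields are hypothetical and stand in for whatever the real queries select; only the definitions (all pages, main namespace, non-redirect main namespace) come from the list above.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class PageCreation(NamedTuple):
    """Hypothetical record for one page creation event."""
    day: str           # e.g. "2017-08-21"
    namespace: int     # 0 is the main namespace
    is_redirect: bool


def daily_counts(events: Iterable[PageCreation]) -> Dict[str, Counter]:
    """Return {day: Counter} holding the three basic measurements."""
    counts: Dict[str, Counter] = {}
    for event in events:
        per_day = counts.setdefault(event.day, Counter())
        per_day["pages"] += 1                      # every page creation
        if event.namespace == 0:
            per_day["main_ns_pages"] += 1          # main namespace only
            if not event.is_redirect:
                per_day["articles"] += 1           # non-redirect, main namespace
    return counts


events = [
    PageCreation("2017-08-21", 0, False),
    PageCreation("2017-08-21", 0, True),
    PageCreation("2017-08-21", 2, False),
]
result = daily_counts(events)
```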

This should visualize page creation events for all wikis, is there a straightforward way of building a wiki_dbs.txt file to feed to explode_by to do that?

@Nuria, @Ottomata: Do you know how this is normally handled?

Most of the examples I've seen create datasets with a single measurement, meaning there are multiple SQL queries instead of one returning multiple columns. Is this what I should also be setting up in order to facilitate having Dashiki visualize it later?

Yeah, probably best to follow the example of other dashboards unless @Ottomata or someone else knows better.

@mforns has more reportupdater know how than me, I'll let him respond :)

@Nettrom @kaldari

Most of the examples I've seen create datasets with a single measurement, meaning there are multiple SQL queries instead of one returning multiple columns. Is this what I should also be setting up in order to facilitate having Dashiki visualize it later?

It depends on what you want to visualize. Here's a hopefully thorough documentation on how to use reportupdater and dashiki to generate dashboards: https://wikitech.wikimedia.org/wiki/Analytics/Tutorials/Dashboards
If you want to explode by wiki db, I guess you want to use the metrics-by-project Dashiki layout. If so, I think the report file must have 2 columns: date and measure. For nicer display of the chart, it's recommended that the measure column is named after the corresponding wiki db. You can do that by using the {{wiki_db}} placeholder in the query, like: SELECT ... AS {{wiki_db}} ....
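As an illustration of the two-column shape described above, the sketch below fills the wiki db placeholder into a query template. It is not reportupdater's own code: the substitution is plain string replacement, the placeholder is spelled as in this comment, and the table and column names are made up.

```python
# Hypothetical query: column 1 is the date, column 2 is the measure,
# named after the wiki db so the chart gets a readable label.
QUERY_TEMPLATE = (
    "SELECT DATE(creation_time) AS date, COUNT(*) AS {{wiki_db}} "
    "FROM {{wiki_db}}.page_creations "  # hypothetical table name
    "GROUP BY date"
)


def render_query(template: str, wiki_db: str) -> str:
    """Replace the wiki db placeholder with a concrete database name."""
    # Guard against values that are not plain identifiers, since the
    # name ends up inside SQL text.
    if not wiki_db.isidentifier():
        raise ValueError(f"unexpected wiki db name: {wiki_db!r}")
    return template.replace("{{wiki_db}}", wiki_db)


query = render_query(QUERY_TEMPLATE, "enwiki")
print(query)
```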

This should visualize page creation events for all wikis, is there a straightforward way of building a wiki_dbs.txt file to feed to explode_by to do that?

Yes! Instead of passing a python list as the value of the explode_by, you can pass a string with the path of the file that contains the wiki dbs. Reportupdater will read the file and use its values to explode the report. Here's an example: https://github.com/wikimedia/analytics-limn-edit-data/tree/master/edit. Also, for more detail on reportupdater options, have a look at https://wikitech.wikimedia.org/wiki/Analytics/Systems/Reportupdater
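For anyone curious what "read the file and use its values" amounts to, here is a rough sketch. It assumes the file holds one wiki db name per line and that blank lines should be skipped; both are assumptions for illustration, and the function name is made up rather than taken from reportupdater.

```python
from pathlib import Path
from typing import List


def read_wiki_dbs(path: str) -> List[str]:
    """Read one wiki db name per line, ignoring blank lines (assumed format)."""
    values = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        name = line.strip()
        if name:
            values.append(name)
    return values
```

Each value returned this way would then produce one report, with the value substituted into the query wherever the wiki db placeholder appears.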

Cheers!

@mforns : Thanks much for your help with this! I've set up the queries so they return two columns, with the second named after the wiki as you recommended. Also, thanks for the link to the tutorial, it's a lot easier to follow than the technical documentation ([[:wikitech:Analytics/Systems/Dashiki]], I'd be happy to add a link to the tutorial from that page if that's useful?).

I've reached the "Deploy reportupdater job" step in the tutorial, do you want me to create a separate branch in the repository for the configuration, or should I just add it to the master branch? Not sure what the conventions are, so I thought I'd ask first.

@kaldari : Thanks, downloaded and added!

@Nettrom
Cool!
Yes, feel free to add any links or details that you find interesting to the docs!
And yes, you can add a Gerrit change-set with your queries and config to the master branch of the reportupdater_queries repository.
If you want, you can add me as a reviewer, and I'll look into it.
Cheers!

@mforns Patch submitted (linked below), and I added you as a reviewer. First time working with Gerrit, hopefully I got it mostly right! Happy to make changes as need be, fun to learn how to do this. Thanks again!

https://gerrit.wikimedia.org/r/373373

Change 374878 had a related patch set uploaded (by Mforns; owner: Mforns):
[operations/puppet@production] [WIP] Add reportupdater job to trigger page-creation metrics

https://gerrit.wikimedia.org/r/374878

@Nettrom
I merged the queries patch.
When the puppet patch is reviewed and merged, you should see your report files here:
https://analytics.wikimedia.org/datasets/periodic/reports/metrics/page-creation
It will take a couple hours to compute all queries after merging though.
Cheers!

Change 374878 merged by Ottomata:
[operations/puppet@production] [WIP] Add reportupdater job to trigger page-creation metrics

https://gerrit.wikimedia.org/r/374878

Hi @Nettrom

I found a problem with the path to the configuration file: it had been outdated since the migration from stat1003 to stat1006. That is why the reports weren't being generated.

I fixed it as part of T174706 and now everything seems to be fine. Reports are already being generated and will be gradually sync'ed to https://analytics.wikimedia.org/datasets/periodic/reports/metrics/page-creation soon.

Cheers!

Hi @mforns

Ah, I remember being confused by the configuration file path in the examples I looked at, but forgot to ask about what it should be. Thanks for figuring that out and updating it, and also for your help with reviewing the patch, much appreciated!

You can create dashboards in labs too, they do not need to be on the prod domain, see for example:

https://edit-analysis.wmflabs.org/multimedia-health/

@Nuria : I'm working on this now, got the metrics added to [[m:Dashiki:CategorizedMetrics]] without breaking anything, or so it seems. I do not have permissions to create [[m:Config:PageCreationDashboard]], but it appears I can edit existing dashboards. Could you (or someone else who has permissions, pinging @kaldari) create the config page for our dashboard so I can edit it? Feel free to create it with a different title if the one I suggested breaks conventions.

I'm an admin on Meta, but it seems I don't have permission to create Config pages either :P

@kaldari: This is odd, I was not aware of any permission requirements for editing these pages. Maybe you as an admin can fix the issue? It seems that (for now) anyone should be able to create these pages. CC @Milimetric in case I am missing something.

From what I can tell after digging around a bit, the configuration of the Dashiki extension limits the creation of pages in the "Config" namespace to ones with titles starting with "Dashiki:" (refs [1,2]). Thus, I can create "Config:Dashiki:PageCreations", but not "Config:PageCreations". I suspect the latter is instead a pseudo page used by the JsonConfig extension.

I'll await confirmation from @Milimetric on how to move forward here.

Refs:
1: Dashiki config diff that happens to show the relevant JsonConfig line: https://gerrit.wikimedia.org/r/#/c/358384/1/extension.json
2: JsonConfig extension: https://www.mediawiki.org/wiki/Extension:JsonConfig
3: https://wikitech.wikimedia.org/wiki/Analytics/Tutorials/Dashboards#Write_the_configuration_for_your_dashboard
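The naming rule described above can be sketched as a simple check. This only mirrors the behaviour observed in this thread (titles in the "Config" namespace must start with "Dashiki:"); the actual rule lives in the extension configuration, and the function below is a made-up illustration, not that implementation.

```python
REQUIRED_PREFIX = "Dashiki:"  # prefix observed in this thread


def can_create_config_page(full_title: str) -> bool:
    """Return True if a 'Config:' page title follows the observed rule."""
    namespace, separator, title = full_title.partition(":")
    if separator != ":" or namespace != "Config":
        return False
    # Require the prefix plus a non-empty dashboard name after it.
    return title.startswith(REQUIRED_PREFIX) and len(title) > len(REQUIRED_PREFIX)


print(can_create_config_page("Config:Dashiki:PageCreations"))
print(can_create_config_page("Config:PageCreations"))
```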

@Nettrom: According to the documentation, Config:Dashiki:{YourDashboardName} is the correct way to set it up.

Ah, I see! The tutorial isn't aligned with said documentation then. I'll update the tutorial and move forward.

Thanks a lot, @Nettrom. Ping us on IRC if you need help with the config. There are several examples; the config you want is similar to: https://meta.wikimedia.org/wiki/Config:Dashiki:VitalSigns

@Nuria: I've tested our dashboard locally here and everything seemed to be working just fine. How do we go about getting it deployed? In this specific project, having a VM on Labs isn't really an option.

Testing the dashboard locally on a fresh install of Ubuntu 16.04 was somewhat cumbersome, perhaps because I don't have experience doing node-based development. Could I add a couple of requirements to the tutorial? It seems the installation needs both bower and gulp to be installed and available from the command line. Also, lots (all?) of the symlinks in the repository are broken in a fresh installation; that might be because semantic-ui installs differently nowadays. Once I had the necessary tools installed and fixed the symlinks, the dashboard built without a hitch.
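A quick way to check the command-line requirements mentioned above before attempting a build is to look the tools up on the PATH. This is just a convenience sketch; the tool list reflects only what this comment reports (bower and gulp), and the helper name is made up.

```python
import shutil

REQUIRED_TOOLS = ("bower", "gulp")  # tools reported as needed in this comment


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools found.")
```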

Could I add a couple of requirements to the tutorial?

Please do, thank you.

How do we go about getting it deployed?

We have a VM where you can deploy it. Do you have any code changes, or is it just config?
You need to add your dashboard to the ones available on labs, and we can help with deployment. See https://github.com/wikimedia/analytics-dashiki/blob/master/config.yaml and the instructions in the README: https://github.com/wikimedia/analytics-dashiki/blob/master/README.md

If you add me to your code change, I can help with code review and deployment.

@Nuria: I added a short note to the tutorial about the requirements. Since I don't know npm very well, the note is rather non-specific about how to install them. I'll make a mental note to look into nvm on a rainy day; once I know how to do this both with a global npm install and with a local one using nvm, the note can be made more specific.

Our dashboard requires no changes to the code, it only references the configuration. I've submitted Change 377290 and added you as a reviewer. Will happily make any changes necessary to it, just let me know.

And thanks much for your help with this, appreciate it!

Change 377290 had a related patch set uploaded (by Nuria; owner: Nettrom):
[analytics/dashiki@master] Add page creation dashboard configuration

https://gerrit.wikimedia.org/r/377290

@Nettrom: It looks like all the metrics are working in the dashboard except for the last 3: "Daily Pages Created in the Main namespace (ns=0) by autopatrolled users", "Daily Pages Created in the Main namespace (ns=0) by autoconfirmed users", "Daily Pages Created in the Main namespace (ns=0) by non-autoconfirmed users". Otherwise, looks great!

@Nuria : Thanks for taking care of this! Sorry I didn't get around to updating the commit message as you requested, forgot to put that on my todo list.

@kaldari : I'll look into that and see if there's a way to fix it.

@kaldari : The three last metrics are only defined for English Wikipedia, partly because I saw them as ACTRIAL-specific. When it comes to the autopatrol right, those are also defined for different user groups depending on what wiki we're looking at, and I didn't see the benefit of figuring those out for the entire set of wikis.

Once you remove the other wikis from the list on the left, the graphs for those three metrics should show up.

If we want to have the confirmed/autoconfirmed graphs available for all wikis, we can change the ReportUpdater configuration. We could then maybe also update it to gather datasets for number of article creations by unregistered accounts?

kaldari claimed this task.
kaldari moved this task from Ready to Q1 2018-19 on the Community-Tech-Sprint board.

Change 377290 merged by Nuria:
[analytics/dashiki@master] Add page creation dashboard configuration

https://gerrit.wikimedia.org/r/377290

nshahquinn-wmf raised the priority of this task from Medium to Needs Triage. Mar 30 2018, 10:38 AM
nshahquinn-wmf moved this task from Backlog to Radar on the Contributors-Analysis board.