
Set up Config:Dashiki:WMCSEdits on Meta-Wiki
Closed, Resolved, Public

Description

This would require the following steps:

Event Timeline

srishakatux created this task.

@Milimetric I looked into the supported layouts and visualizations a bit: https://wikitech.wikimedia.org/wiki/Analytics/Tutorials/Dashboards#Supported_layouts_and_visualizations. It seems to me that the most relevant layout for the wmcs visualization would be metrics-by-project, like in https://analytics.wikimedia.org/dashboards/vital-signs or https://page-creation.wmflabs.org.

I imagine the two metrics to show would be wmcs_edits and total_edits, but wmcs_edits will only be useful with annotations that show what % of total edits are wmcs edits.

I'm guessing that metrics-by-project might not work for us right now because the data we are reporting is not per wiki but for all wikis in a single file:

wiki total_edits wmcs_edits
abwiki7460
acewiki3633
adywiki1471
afwiki1245546
......

I have a few questions:

  • Is the metrics-by-project layout the right choice for us?
  • Do we need to change the data format? Is it possible to achieve the metrics-by-project layout without changing the data format?
  • Also, right now we are not reporting the % of wmcs edits. Would that be a useful addition to our hive query for the annotations, or is there some other way to calculate the % of wmcs edits?

cc @JAllemandou

I have a few questions:

  • Is the metrics-by-project layout the right choice for us?

Only you can make that decision. If you are going to look at different wikis often, then maybe it's a good choice. If you're mostly going to look at the overall number and only occasionally at individual wikis, you could aggregate total, total_wmcs, and percent_wmcs and plot those overall, then point people to the data file for individual wikis (or show them in a plain tabular chart).
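A minimal sketch of the overall aggregation suggested above, assuming per-wiki rows with the wiki, total_edits, and wmcs_edits columns from the file shown earlier. The function name `aggregate` and the sample numbers are hypothetical, made up for illustration; in practice this sum would more likely happen in the hive query itself.

```python
from typing import Iterable, Tuple

# Each row: (wiki, total_edits, wmcs_edits), matching the columns of the
# per-wiki file shown earlier in this task.
Row = Tuple[str, int, int]


def aggregate(rows: Iterable[Row]) -> Tuple[int, int, float]:
    """Sum edits across all wikis and return (total, total_wmcs, percent_wmcs).

    percent_wmcs is a ratio in [0, 1] (like the 0.56-style values used later
    in this thread); it is 0.0 when there are no edits at all.
    """
    total = 0
    total_wmcs = 0
    for _wiki, total_edits, wmcs_edits in rows:
        total += total_edits
        total_wmcs += wmcs_edits
    percent_wmcs = total_wmcs / total if total else 0.0
    return total, total_wmcs, percent_wmcs


if __name__ == "__main__":
    # Hypothetical numbers, not real data.
    sample = [("enwiki", 1000, 300), ("frwiki", 500, 100), ("abwiki", 100, 0)]
    print(aggregate(sample))  # (1600, 400, 0.25)
```

The ratio is computed from the summed counts rather than by averaging per-wiki percentages, so large wikis weigh more, which is what an "all wikis" number usually means.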

  • Do we need to change the data format? Is it possible to achieve the metrics-by-project layout without changing the data format?

Dashiki has simple data adapters that it can use to read different formats. They're not well documented, but it's relatively easy to add a new one and configure your dashboard to use it instead of the format it expects by default. For example, we adapted it to read responses from AQS when that went live: https://github.com/wikimedia/analytics-dashiki/blob/master/src/app/converters/timeseries/aqs-api-response.js. In this case it's a bit tricky, because the layout expects each wiki in a different file, so we'd have to make some changes to the layout too. Changing the data format is easy, but we'd also have to change the way the data is output so that it goes to multiple files, and I'm not sure there's an easy way to do that either.
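If the single all-wikis file were split instead, one file per wiki, the post-processing step could look roughly like this. This is a sketch under assumptions: the input is already in (date, wiki, value) form, and the `date`/`value` header and the `<wiki>.tsv` file naming are hypothetical, not what Dashiki's metrics-by-project layout necessarily expects.

```python
import csv
import io
import os
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One row per (date, wiki, value) for all wikis in a single file.
Row = Tuple[str, str, float]


def split_by_wiki(rows: Iterable[Row]) -> Dict[str, str]:
    """Group rows by wiki and render each group as TSV text sorted by date."""
    grouped: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for date, wiki, value in rows:
        grouped[wiki].append((date, value))

    files: Dict[str, str] = {}
    for wiki, points in grouped.items():
        buf = io.StringIO()
        writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
        writer.writerow(["date", "value"])  # hypothetical header names
        for date, value in sorted(points):
            writer.writerow([date, value])
        files[wiki] = buf.getvalue()
    return files


def write_files(files: Dict[str, str], out_dir: str) -> None:
    """Write each wiki's TSV to <out_dir>/<wiki>.tsv (hypothetical naming)."""
    os.makedirs(out_dir, exist_ok=True)
    for wiki, text in files.items():
        path = os.path.join(out_dir, f"{wiki}.tsv")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text)
```

Keeping the rendering (`split_by_wiki`) separate from the file writing makes it easy to check the per-wiki output before anything is published.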

  • Also, right now we are not reporting the % of wmcs edits. Would that be a useful addition to our hive query for the annotations, or is there some other way to calculate the % of wmcs edits?

I'm not sure what you mean by annotations, but I would imagine % wmcs edits would be more interesting than the totals. It would make it easier to compare wikis if you're plotting multiple wikis on the same chart.

Some ideas:

  • data would need to look like:
2015-01-01 frwiki 0.56
2015-01-01 enwiki 0.67
2015-02-01 frwiki 0.55
2015-02-01 enwiki 0.78

Percentages below a threshold should probably be omitted; say, anything below 0.01 (1%).
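A minimal sketch of producing rows in the date/wiki/percent shape above while dropping anything under the 1% threshold. It assumes the raw input has per-wiki total and wmcs edit counts per date; the function name `percent_rows` and the sample wikis/numbers are hypothetical.

```python
from typing import Iterable, List, Tuple

# Input: (date, wiki, total_edits, wmcs_edits)
# Output: (date, wiki, percent), like "2015-01-01 frwiki 0.56" above.
Input = Tuple[str, str, int, int]
Output = Tuple[str, str, float]

THRESHOLD = 0.01  # omit wikis where wmcs edits are below 1% of all edits


def percent_rows(rows: Iterable[Input], threshold: float = THRESHOLD) -> List[Output]:
    out: List[Output] = []
    for date, wiki, total_edits, wmcs_edits in rows:
        if total_edits == 0:
            continue  # no edits at all: the ratio is undefined, so skip it
        percent = wmcs_edits / total_edits
        if percent >= threshold:
            out.append((date, wiki, percent))
    return out


if __name__ == "__main__":
    sample = [
        ("2015-01-01", "frwiki", 200, 112),   # 0.56, kept
        ("2015-01-01", "enwiki", 1000, 5),    # 0.005, below 1%, dropped
    ]
    for date, wiki, percent in percent_rows(sample):
        print(date, wiki, percent)  # 2015-01-01 frwiki 0.56
```

Whether to filter here or in Dashiki is a separate choice; filtering in the report keeps the published file small but hides the long tail from anyone reading the raw data.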

This is what the browser data looks like now:

date	os_family	os_major	percent
2015-06-07	Windows 7	-	0.2574727188514971
2015-06-07	Other	-	0.17613564907044232
2015-06-07	iOS	8	0.12724360664879844
2015-06-07	Android	4	0.1218471036734872
2015-06-07	Windows 8.1	-	0.08202389075599967
2015-06-07	Mac OS X	10	0.052575889665423016
2015-06-07	Windows XP	-	0.05113476001062913
2015-06-07	Android	5	0.030836542347413443
2015-06-07	iOS	7	0.025917465695526058

Thank you both for your helpful replies! We discussed the two possible layouts in our meeting and agreed that:

  • Ratios/percentages are what we would like to visualize, particularly the monthly % of wmcs edits across all wikis.
  • If not for all wikis, then definitely for wikidata, enwiki, dewiki, commons, and others that we see as indicators of general trends (to begin with).

So, yes, I will look at the browser dashboard today, set up a similar config file on wiki, and play around with fake data. Later, I can upload a patch that updates the hive query to output an additional % column. Will follow up soon with more questions :)
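For playing around with fake data, a small generator like this can write a file in the date/wiki/percent shape discussed above. It is only a sketch: the wiki labels come from the list above, while the start date, the tab delimiter, the absence of a header row, and the 0.01 to 0.5 value range are arbitrary assumptions, and the output is random, not real data.

```python
import random
from datetime import date
from typing import List

# Wikis mentioned above as trend indicators; labels are illustrative only.
WIKIS = ["all-wikis", "wikidata", "enwiki", "dewiki", "commons"]


def month_starts(start: date, months: int) -> List[date]:
    """First day of each month, beginning with the month of `start`."""
    out = []
    year, month = start.year, start.month
    for _ in range(months):
        out.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return out


def fake_tsv(months: int = 6, seed: int = 42) -> str:
    """Return fake <date>\\t<wiki>\\t<percent> lines for trying out a dashboard.

    Values are random ratios between 0.01 and 0.5; they are NOT real data.
    """
    rng = random.Random(seed)  # fixed seed so reruns produce the same file
    lines = []
    for day in month_starts(date(2019, 1, 1), months):
        for wiki in WIKIS:
            lines.append(f"{day.isoformat()}\t{wiki}\t{rng.uniform(0.01, 0.5):.4f}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(fake_tsv(months=2), end="")
```

A fixed seed keeps the fake file stable between runs, so any change in the dashboard comes from the config rather than from the data.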

Update: I have the config here https://meta.wikimedia.org/wiki/Config:Dashiki:WMCSEdits, and I can use it locally to generate a dashboard with the tabs layout from some old, not-so-accurate data.
Of the three views (hierarchical, dygraphs-timeseries, and tabular), I can only get the tabular one working so far. For dygraphs-timeseries, I'm guessing the data format needs to be like all_sites_by_os_family_percent.tsv here https://analytics.wikimedia.org/published/datasets/periodic/reports/metrics/browser/ (wikis in columns and not in rows)? Is that right, or is there any workaround for this?

I guess the hierarchical view only uses data from the last 35 days, which is why I'm having some trouble with it. I can come back to this view later, since getting the time series right is our priority. But I'm guessing that for both the hierarchical and dygraphs-timeseries visualizations, we could tweak Dashiki to ignore data where the % of edits is less than 1%. Would you advise making the tweaks on the Dashiki side or generating separate reports for each view?

(wikis in columns and not in rows)? Is that right, or is there any workaround for this?

That's correct. The expectation is <time>, <label>, <value>, with each value on its own row, so for a tabs layout the format of the file needs to change.

In your case:

2019-01-01 all-wikis 0.02
2019-01-01 enwiki 0.4
2019-01-07 frwiki 0.67
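Converting a wide table (a date column plus one column per wiki, as in the question above) into <time>, <label>, <value> rows could be sketched like this. The function name `wide_to_long` is hypothetical, the tab delimiter is an assumption (the browser reports are TSV), and empty cells are skipped rather than written as blank values.

```python
import csv
import io
from typing import List, Tuple


def wide_to_long(text: str, delimiter: str = "\t") -> List[Tuple[str, str, str]]:
    """Turn a wide table (first column = date, one column per label) into
    <time>, <label>, <value> rows, one value per row.

    Empty cells are skipped; values are kept as the original strings.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    labels = header[1:]  # every column after the date column is a label
    rows: List[Tuple[str, str, str]] = []
    for record in reader:
        if not record:
            continue  # ignore blank lines
        when, values = record[0], record[1:]
        for label, value in zip(labels, values):
            if value != "":
                rows.append((when, label, value))
    return rows


if __name__ == "__main__":
    wide = (
        "date\tall-wikis\tenwiki\tfrwiki\n"
        "2019-01-01\t0.02\t0.4\t\n"
        "2019-01-07\t\t\t0.67\n"
    )
    for row in wide_to_long(wide):
        print(" ".join(row))
```

Run on that sample, it prints the same three lines as the example above.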

To try this out easily, drop a file with this format at:

@stat1006:/srv/published-datasets/periodic/reports/sri-test/test.csv

It will be visible at:

https://analytics.wikimedia.org/published/datasets/periodic/reports/sri-test/test.csv

and you can run your local dashboard from these remote files (by changing your config). This way you can see how things would look without making any code changes. You can add fake data for any period of time to see the other graphs, or reformat some of the real data that @bd808 has somewhere. You can even deploy your dashboard to labs (no commit needed) and send links around so people can test it with "real data", even though you have not yet finalized the data-gathering scripts.

This was very helpful! I put three files, one for each view, here https://analytics.wikimedia.org/published/datasets/periodic/reports/metrics/sri-test/, and I'm happy with the dashboard Dashiki produces from this data. For example, the timeseries view looks like this:

[Screenshot of the timeseries view: Screen Shot 2019-11-02 at 12.01.05 AM.png]

(The data I used is real: what we've gathered with Bryan's script over the past few months, plus some spreadsheet calculations for percentages, etc.)

I think I'm ready to deploy the dashboard to a staging instance for my teammates to look at and test. I might ask some questions on IRC on Monday about getting access to the dashiki labs project, etc.

@srishakatux what's left to do here? Need any support updating the config.yaml for example?

srishakatux closed this task as Resolved. (Edited Nov 20 2019, 7:12 AM)

(Thanks @Milimetric for the ping on this! The config file is here https://meta.wikimedia.org/wiki/Config:Dashiki:WMCSEdits. It is using test files for now, but that can be changed later.)