
Create dashboard showing MediaWiki tarball download statistics
Closed, Declined · Public

Description

From #mediawiki just now:

[10:35:37] <legoktm> https://grafana.wikimedia.org/dashboard/db/extension-distributor-downloads?var-release=REL1_26 
[10:36:47] <ostriches> legoktm: We need something similar for MW downloads.
[10:37:01] <legoktm> from releases.wm.o?
[10:37:24] <legoktm> try asking addshore nicely to work his magic ;)
[10:37:36] <addshore> *waves*
[10:38:37] <addshore> looks like that would have to come from the web request logs though :P
[10:39:26] <ostriches> legoktm: bleh Comcast died. Yes exactly.
[10:40:02] <addshore> ostriches: well, if it is behind varnish, then the request stuff should be in hadoop!
[10:40:49] <ostriches> Yeah i think its just a matter of getting at the data and massaging it into something meaningful and pretty
[10:40:54] <ostriches> The data is there for sure
[10:41:19] <legoktm> is there already a way to go from hadoop -> graphite?
[10:41:55] <addshore> legoktm, well of course, but its not too nice, I would guess you run a query in a script, then take the data and send it to graphite
[10:42:12] <addshore> not sure if there is something more built in, such as catch data as it comes in and send it to graphite, but there might be!

Basically, we want a dashboard that shows download counts of tarballs per version. The data should be in hadoop; we just need to get it to graphite somehow.
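
A minimal sketch of the "get it to graphite" half, assuming per-version counts have already been pulled out of hadoop (the metric prefix and graphite hostname below are made up for illustration; port 2003 is Graphite's standard plaintext listener):

```python
#!/usr/bin/env python3
"""Push per-version tarball download counts to Graphite.

Sketch only: assumes counts were produced elsewhere (e.g. by a
Hive query over webrequest data); the metric prefix and the
Graphite hostname are hypothetical.
"""
import socket
import time

GRAPHITE_HOST = "graphite-in.example.org"       # hypothetical endpoint
GRAPHITE_PORT = 2003                            # standard plaintext port
METRIC_PREFIX = "mediawiki.releases.downloads"  # hypothetical namespace


def push_counts(counts, timestamp=None):
    """counts: dict like {'1.26.2': 1234, '1.25.5': 87}."""
    ts = int(timestamp or time.time())
    # Graphite plaintext format: "<metric.path> <value> <timestamp>\n"
    lines = [
        "%s.%s %d %d" % (METRIC_PREFIX, version.replace(".", "_"), n, ts)
        for version, n in counts.items()
    ]
    payload = "\n".join(lines) + "\n"
    with socket.create_connection((GRAPHITE_HOST, GRAPHITE_PORT)) as sock:
        sock.sendall(payload.encode("ascii"))


if __name__ == "__main__":
    push_counts({"1.26.2": 1234, "1.25.5": 87})
```

Graphite treats dots in metric names as tree levels, so the sketch replaces the dots in version numbers with underscores to keep each release as a single leaf.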

Event Timeline

Legoktm raised the priority of this task to Needs Triage.
Legoktm updated the task description. (Show Details)
Legoktm added a project: MediaWiki-Releasing.
Legoktm added subscribers: Legoktm, demon, Addshore.
Legoktm set Security to None.

It is not clear what the value of this data is; can someone explain?

To get an idea of how many people are using the MediaWiki tarballs. This data could also show whether people are still downloading old versions, how quickly people move to new versions, etc.

It is not clear what the value of this data is; can someone explain?

It would aid MediaWiki development prioritisation and communication.

demon triaged this task as Medium priority. Jan 12 2018, 10:58 PM

If the server that fronts mediawiki downloads is backed by varnish (is it?), this data most likely exists in hadoop. Can @Legoktm answer this question?

Looking at the response headers, dumps.wikimedia.org is not behind varnish

[Screenshot of response headers: image.png (258×484 px, 25 KB)]

If the server that fronts mediawiki downloads is backed by varnish (is it?), this data most likely exists in hadoop. Can @Legoktm answer this question?

Yes, releases.wikimedia.org is behind varnish.

Looking at the response headers, dumps.wikimedia.org is not behind varnish

Wrong domain :-)

Looking at the response headers, dumps.wikimedia.org is not behind varnish

Wrong domain :-)

*goes to have a coffee*

Nuria moved this task from Datasets to Incoming on the Analytics board.

It is in Hadoop; we just need to surface the data somehow.
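
A sketch of what surfacing it could look like, assuming the requests land in the wmf.webrequest Hive table as suggested above (the partition values, the tarball path pattern, and running the query through the hive CLI are all assumptions for illustration):

```python
#!/usr/bin/env python3
"""Count tarball downloads per MediaWiki release from Hive.

Sketch only: assumes a wmf.webrequest table partitioned by
year/month/day, with requests to releases.wikimedia.org for
paths like /mediawiki/1.26/mediawiki-1.26.2.tar.gz.
"""
import re
import subprocess

QUERY = """
SELECT uri_path, COUNT(*) AS hits
FROM wmf.webrequest
WHERE year = 2018 AND month = 1 AND day = 12
  AND uri_host = 'releases.wikimedia.org'
  AND uri_path LIKE '%.tar.gz'
GROUP BY uri_path
"""

# mediawiki-1.26.2.tar.gz -> "1.26.2" (filename pattern is an assumption)
VERSION_RE = re.compile(r"mediawiki-(\d+\.\d+(?:\.\d+)?)\.tar\.gz$")


def tarball_counts():
    out = subprocess.check_output(["hive", "-e", QUERY]).decode("utf-8")
    counts = {}
    for line in out.splitlines():
        parts = line.rsplit("\t", 1)
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip hive CLI log noise and headers
        m = VERSION_RE.search(parts[0])
        if m:
            counts[m.group(1)] = counts.get(m.group(1), 0) + int(parts[1])
    return counts


if __name__ == "__main__":
    for version, n in sorted(tarball_counts().items()):
        print(version, n)
```

The per-version dict this returns is the shape the graphite-push sketch in the description expects, so cron-ing the two together would be one way to feed the dashboard.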

My advice, rather than using hadoop for this, would be to instrument releases.wikimedia.org with piwik. Combing through terabytes of data for so few requests doesn't seem the most expedient approach. Our piwik instance is piwik.wikimedia.org; we use it for similarly low-volume metrics.

A javascript beacon can be put in, and you start getting your metrics right away.
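
The beacon described here would be client-side javascript, but Piwik also accepts hits through its HTTP Tracking API, which is closer to something a download handler could fire server-side. A sketch, under the assumptions that piwik.wikimedia.org exposes the standard piwik.php endpoint and that a site id has been assigned (both unverified here):

```python
#!/usr/bin/env python3
"""Record a tarball download in Piwik via its HTTP Tracking API.

Sketch only: the site id and whether piwik.wikimedia.org exposes
piwik.php this way are assumptions.
"""
import urllib.parse
import urllib.request

PIWIK_ENDPOINT = "https://piwik.wikimedia.org/piwik.php"  # assumed path
SITE_ID = "1"  # hypothetical; assigned per-site in Piwik


def track_download(url):
    params = urllib.parse.urlencode({
        "idsite": SITE_ID,
        "rec": "1",        # required by the Tracking API
        "download": url,   # record the hit as a file download
        "url": url,
    })
    urllib.request.urlopen("%s?%s" % (PIWIK_ENDPOINT, params), timeout=5)


if __name__ == "__main__":
    track_download(
        "https://releases.wikimedia.org/mediawiki/1.26/mediawiki-1.26.2.tar.gz"
    )
```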

People don't visit releases.wikimedia.org directly - they click the direct tarball link from mediawiki.org. I don't think piwik will work for that.

fdans raised the priority of this task from Medium to High. Jan 18 2018, 5:57 PM
fdans moved this task from Incoming to Dashiki on the Analytics board.
Milimetric moved this task from Dashiki to Incoming on the Analytics board.
Nuria lowered the priority of this task from High to Low.
Nuria moved this task from Incoming to Backlog (Later) on the Analytics board.
mforns subscribed.

Declining, because we have the datasets from the pingback extension.
Please reopen if necessary.