Description
There are some statistics that are mined out of Apache log files for the wikidata dumps, but we don't really have any for the xmldumps. It would be nice to see if we can leverage the work that @Addshore did for wikidata to get some tracking for other dump files.
Related Objects
- Mentioned In
- T147177: An API for monitoring dumps of WMF wikis
Event Timeline
I would be interested to see number of downloads per month broken down like this:
How many folks are crawling most of the tree for a specific dump (i.e. revision content for all wikis)?
How many are downloading just one language or just one wiki exclusively?
What type of dump is the least popular (which tables, or which content-based dumps)?
Which small wikis are the most downloaded?
Etc.
These aren't burning needs but they would be nice to have.
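The breakdowns above could all be derived from the same Apache access logs mentioned in the description. A minimal sketch of that aggregation, assuming combined-format log lines and dump paths shaped like /enwiki/20170101/enwiki-20170101-pages-articles.xml.bz2 (the regexes and path layout here are illustrative assumptions, not the actual production script):

```python
import re
from collections import Counter

# Assumed Apache combined-log line shape (hypothetical; adjust to the real format).
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) \S+'
)

# Assumed dump URL layout: /<wiki>/<YYYYMMDD>/<wiki>-<YYYYMMDD>-<dumptype>.<ext>
DUMP_RE = re.compile(
    r'^/(?P<wiki>[a-z_]+wik[a-z]+)/\d{8}/'
    r'(?P=wiki)-\d{8}-(?P<dumptype>[\w-]+)\.'
)

def tally(lines):
    """Count successful (HTTP 200) dump downloads per wiki and per dump type."""
    per_wiki, per_type = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group('status') != '200':
            continue
        d = DUMP_RE.match(m.group('path'))
        if not d:
            continue
        per_wiki[d.group('wiki')] += 1
        per_type[d.group('dumptype')] += 1
    return per_wiki, per_type
```

From those two counters you could read off the least popular dump types and the most downloaded small wikis; answering the "one wiki exclusively" and "crawling most of the tree" questions would additionally need grouping by client IP over a time window.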
Do folks have a timeframe for when they might start looking at this? Alternatively, who should I add to the ticket that would know?
For reference, the script currently used for the WMDE / wikidata stats can be found at https://github.com/wikimedia/analytics-wmde-scripts/blob/master/src/wikidata/dumpDownloads.php
Re-upping this. I know the logs get sucked into a pile of data because of https://phabricator.wikimedia.org/T118739. Does anyone process them?
Wikidata team is processing them: https://grafana.wikimedia.org/dashboard/db/wikidata-dump-downloads?refresh=5m&orgId=1
This is done with https://github.com/wikimedia/analytics-wmde-scripts/blob/master/src/wikidata/dumpDownloads.php
That's great for the wikidata dumps. How about the xml/sql dumps? It would sure be nice to have.
I don't think anyone is making use of the data (except for the Wikidata specific things mentioned above).