Get some statistics for dump downloads
Open, Needs Triage, Public

Description

Some statistics are mined out of the Apache log files for the Wikidata dumps, but we don't really have any for the XML dumps. It would be nice to see if we can leverage the work that @Addshore did for Wikidata to get some tracking for other dump files.

Event Timeline

I would be interested to see number of downloads per month broken down like this:

How many folks are crawling most of the tree for a specific dump (i.e. revision content for all wikis)?
How many are downloading just one language or just one wiki exclusively?
What type of dump is the least popular (which tables, or which content-based dumps)?
Which small wikis are the most downloaded?

Etc.

These aren't burning needs but they would be nice to have.

Do folks have a timeframe for when they might start looking at this? Alternatively, who should I add to the ticket that would know?

Re-upping this. I know the logs get sucked into a pile of data because of https://phabricator.wikimedia.org/T118739. Does anyone process them?

That's great for the wikidata dumps. How about the xml/sql dumps? It would sure be nice to have.

I don't think anyone is making use of the data (except for the Wikidata specific things mentioned above).

As far as I know, that is correct.

Who's the right person to poke about this, or should I be writing the script myself?
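
For reference, here is a rough sketch of what such a script might look like. It is only an illustration under assumptions that are not confirmed anywhere in this ticket: that the logs are in Apache common/combined format, that dump files live under paths like /<wiki>/<YYYYMMDD>/<wiki>-<YYYYMMDD>-<dumptype>.<ext>, and that one client IP is a good enough stand-in for one downloader. The names parse_line and summarize are made up for the example. It counts downloads per month by dump type and by wiki, and counts clients that fetched files for exactly one wiki in a given month.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Assumption: Apache common/combined log format.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)

# Assumption: paths look like /enwiki/20160101/enwiki-20160101-page.sql.gz
PATH_RE = re.compile(
    r'^/(?P<wiki>[a-z0-9_]+)/(?P<date>\d{8})/'
    r'(?P=wiki)-(?P=date)-(?P<kind>[^./?]+)\.[^/?]*$'
)


def parse_line(line):
    """Return (month, ip, wiki, kind) for a successful dump download, else None."""
    m = LOG_RE.match(line)
    # Only full successful GETs are counted; partial (206) responses are
    # ignored here so resumed downloads are not counted several times.
    if not m or m['method'] != 'GET' or m['status'] != '200':
        return None
    p = PATH_RE.match(m['path'])
    if not p:
        return None  # directory listings, index pages, other files
    ts = datetime.strptime(m['ts'], '%d/%b/%Y:%H:%M:%S %z')
    return ts.strftime('%Y-%m'), m['ip'], p['wiki'], p['kind']


def summarize(lines):
    """Aggregate monthly download counts from an iterable of log lines."""
    per_kind = Counter()                 # (month, dump type) -> downloads
    per_wiki = Counter()                 # (month, wiki) -> downloads
    wikis_by_client = defaultdict(set)   # (month, ip) -> wikis fetched
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        month, ip, wiki, kind = rec
        per_kind[month, kind] += 1
        per_wiki[month, wiki] += 1
        wikis_by_client[month, ip].add(wiki)
    # Clients that touched exactly one wiki in a given month.
    single_wiki_clients = Counter(
        month for (month, _ip), wikis in wikis_by_client.items()
        if len(wikis) == 1
    )
    return per_kind, per_wiki, single_wiki_clients
```

Something like summarize(open("access.log")) would give the three counters (the file name is a placeholder). Spotting clients that crawl most of the tree would need a threshold on len(wikis) rather than the "exactly one" check, and the least popular dump type is just the smallest entry in per_kind for a month.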