
Provide download statistics of files on Wikimedia Commons
Open, Lowest, Public

Description

A frequent request from GLAM partners is: how often are 'my' files downloaded from Wikimedia Commons?

So here's a request to make the number of downloads of a file over a given time period available, so that it can be integrated into, for instance, statistics dashboards and other tools.

Event Timeline

Restricted Application added a subscriber: Aklapper. Mar 12 2019, 9:23 AM
Aklapper added a comment. Edited Mar 12 2019, 11:22 AM

How is "download" defined, compared to "view image itself in browser" / "open image itself in a certain resolution in browser"? Does the "save as" action in the browser trigger anything measurable apart from copying a file that's in the local browser cache already to another place defined by the user (="download")?

Ramsey-WMF moved this task from Untriaged to Desired epics on the Multimedia board.
Ramsey-WMF added a subscriber: Abit.

This is a tough one to solve and we've had some very preliminary discussions with Analytics on how to reasonably approximate this, but there's a lot of fundamental discussion that still needs to be had. Will revisit as soon as we get some breathing room.

Aklapper triaged this task as Lowest priority. Apr 2 2020, 12:45 PM
Aklapper added a project: Analytics.

Adding Analytics per last comment

@Ramsey-WMF any more thoughts on the definition of "download"? Once we have that, the technical work is relatively easy.

Milimetric moved this task from Incoming to Blocked on the Analytics board. Apr 6 2020, 4:27 PM

@Milimetric picking this up again. Ideally we want to measure "Save As...", but in previous discussions (it's been over a year, so my memory is fuzzy) we were told it was hard to differentiate because of how Commons handles links (e.g. the "Download" button doesn't trigger a download, it just displays a URL to the file that the user has to copy-paste). Perhaps we can measure how many times someone clicks that link per file to signal intent, although that's not very reliable.

Yeah, I'm not sure, it depends on exactly what you're trying to accomplish with the metric. I'm not sure if you saw in the meantime that we exposed API endpoints that let people query how often each file is requested? Maybe one way to figure out what exactly you want to measure here is to show people those stats and dig into the existing data about the "Download" button (either by inferring it from Webrequest or instrumenting it with EventLogging). Comparing these should give you a good idea what signal you're trying to find and how to best do that. I'm happy to help advise in a meeting, just let me know.
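For anyone who wants to look at those per-file request stats, here is a minimal sketch of building a query and summing the result. It assumes the endpoints meant here are the Analytics "mediarequests" per-file REST endpoints; the base URL, the path segment order, the parameter values and the `items`/`requests` response fields are assumptions to check against the API documentation, and the file path in the usage note is hypothetical. Keep in mind that these numbers count requests for a file, not "Save As..." actions.

```python
from urllib.parse import quote

# Assumed base URL of the per-file media request endpoint (not stated in this task).
BASE = "https://wikimedia.org/api/rest_v1/metrics/mediarequests/per-file"


def per_file_url(file_path, start, end,
                 referer="all-referers", agent="user", granularity="daily"):
    """Build the query URL for one file over the period start..end (YYYYMMDD).

    file_path is the storage path of the file on the upload server,
    e.g. "/wikipedia/commons/a/a0/Example.jpg" (hypothetical file).
    """
    # The path goes into a single URL segment, so slashes are percent-encoded too.
    encoded = quote(file_path, safe="")
    return f"{BASE}/{referer}/{agent}/{encoded}/{granularity}/{start}/{end}"


def total_requests(payload):
    """Sum the per-interval counts in a decoded JSON response.

    Assumes a payload shaped like {"items": [{"requests": 12, ...}, ...]}.
    """
    return sum(item.get("requests", 0) for item in payload.get("items", []))
```

Fetching `per_file_url("/wikipedia/commons/a/a0/Example.jpg", "20200301", "20200331")` with any HTTP client and passing the decoded JSON to `total_requests` would give one number per file and period, which can then be compared with whatever the "Download" button instrumentation shows.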

Moving this to radar until we see some update to instrumentation and we can revisit.