Assigning to @MusikAnimal as an FYI. We will ping when this data is available and ready to be used, likely in the next couple of weeks.
Description
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | None | T210313 Statistics for views of individual Wikimedia images |
| Resolved | | MusikAnimal | T234590 Add ability to the pageview tool in labs to get mediarequests per file similar to existing functionality to get pageviews per page title |
| Resolved | | fdans | T244373 Fix double encoding of urls on mediarequests api |
| Open | | None | T244712 Allow users to query mediarequests using a file page link |
Event Timeline
The APIs are now deployed, so code changes can begin. We are still loading data (and will be for the next month), so we would like to hold off on announcing this change until all past data is loaded. In the meantime, please start making the code changes to integrate with the pageview tool: https://wikitech-static.wikimedia.org/wiki/Analytics/AQS/Media_metrics
https://tools.wmflabs.org/mediaviews is back! I would still consider this "beta". Any feedback is most welcome. More features will be added in the near future, such as getting request counts for all media in a category.
I'll keep this open until I'm satisfied it is stable.
Nice! Great job!
Is this the place to report issues?
Trying to measure video files always prompts this error (German interface): "Fehler beim Abruf der Media requests API - Not found." (in English: "Error querying the Media requests API - Not found.")
https://tools.wmflabs.org/mediaviews/?project=commons.wikimedia.org&platform=&referer=all-referers&start=2019-02&end=2020-01&files=Pr%C3%A4sidentschaftswahl_in_den_Vereinigten_Staaten.ogv
https://tools.wmflabs.org/mediaviews/?project=commons.wikimedia.org&platform=&referer=all-referers&start=2019-02&end=2020-01&files=Klima-Archiv_Eislabor_(ZDF,_Terra_X)_720p_HD_50FPS.webm
Furthermore: "average" is always named "daily average" / "täglicher durchschnitt" (ger), even when I've chosen "monthly" to analyse the data. This is misleading.
FWIW here's the full error message your request produces (from the developer console):
The date(s) you used are valid, but we either do not have data for those date(s), or the project you asked for is not loaded yet. Please check https://wikimedia.org/api/rest_v1/?doc for more information.
And here's a video for which the request is working fine: https://tools.wmflabs.org/mediaviews/?project=commons.wikimedia.org&platform=&referer=all-referers&start=2019-02-01&end=2020-01-31&files=New_York_1911.webm
So, it's not something that affects ALL videos. I guess there's just no data available for the examples you gave.
Yes, Tool-Pageviews. This task is fine for now :)
Indeed, apparently data is missing for those two videos. There were definitely some requests... for instance the first one has a deletion discussion, and surely participants looked at the file. This issue should probably be raised with Analytics.
> Furthermore: "average" is always named "daily average" / "täglicher durchschnitt" (ger), even when I've chosen "monthly" to analyse the data. This is misleading.
Good catch! I will fix this.
What is happening here (cc @fdans) is that requests are for this filename: "/wikipedia/commons/5/53/Pr%C3%A4sidentschaftswahl_in_den_Vereinigten_Staaten.ogv".
API request:
This is somewhat confusing for end users because there is no way for them to know that the filename is "/wikipedia/commons/5/53/Pr%C3%A4sidentschaftswahl_in_den_Vereinigten_Staaten.ogv" (unless the addition of 5/53 is a widely understood convention that I am not familiar with).
Ideas welcome.
See https://www.mediawiki.org/wiki/Manual:Image_administration#Data_storage. The /5/53/ components in the name are related to the use of the $wgHashedUploadDirectory in the Wikimedia production wiki cluster. They come from the md5 hash of the string "Präsidentschaftswahl_in_den_Vereinigten_Staaten.ogv":
$ echo -n 'Präsidentschaftswahl_in_den_Vereinigten_Staaten.ogv' | md5sum
538351d2a7056e4bddc091c730a6a6ef -
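For anyone who wants to reproduce this outside the shell, here is a minimal Python sketch of the same derivation. The function name `hashed_upload_path` and the `bucket` argument are made up for illustration, and the sketch only covers the two directory levels visible in the path above; it does not attempt MediaWiki's other title normalization (such as capitalizing the first letter).

```python
import hashlib


def hashed_upload_path(bucket: str, filename: str) -> str:
    """Build a hashed storage path such as /wikipedia/commons/5/53/<filename>.

    Hypothetical helper: mirrors the md5sum command above, nothing more.
    """
    # File titles use underscores in place of spaces.
    name = filename.replace(" ", "_")
    # Hash the UTF-8 bytes of the name, as `echo -n ... | md5sum` does.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # First directory is the first hex digit, second is the first two digits.
    return f"/{bucket}/{digest[0]}/{digest[:2]}/{name}"


path = hashed_upload_path(
    "wikipedia/commons",
    "Pr\u00e4sidentschaftswahl_in_den_Vereinigten_Staaten.ogv",
)
print(path)
```

The printed path should begin with /wikipedia/commons/5/53/, matching the filename seen in the requests.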
Exposing this storage implementation detail in the statistics seems unfriendly to users. I would assume that for most interested parties the File namespace title ("Präsidentschaftswahl_in_den_Vereinigten_Staaten.ogv" in this case) would be the obvious lookup key rather than the thumbnail/media storage path.
Indeed, which suggests that we should actually rework how we are parsing URLs and, more importantly, reload the data with "unfriendly paths". cc Analytics
Leaving the md5 hash bits out of the discussion, the problem with fetching the mediarequests data for https://commons.wikimedia.org/wiki/File:Pr%C3%A4sidentschaftswahl_in_den_Vereinigten_Staaten.ogv is a mismatch in how the URL is generated by @MusikAnimal's mediaviews tool and the URL expected by the API service.
- Generated URL: https://wikimedia.org/api/rest_v1/metrics/mediarequests/per-file/all-referers/user/%2Fwikipedia%2Fcommons%2F5%2F53%2FPr%C3%A4sidentschaftswahl_in_den_Vereinigten_Staaten.ogv/daily/2019020100/2020013100
- Expected URL: https://wikimedia.org/api/rest_v1/metrics/mediarequests/per-file/all-referers/user/%2Fwikipedia%2Fcommons%2F5%2F53%2FPr%25C3%25A4sidentschaftswahl_in_den_Vereinigten_Staaten.ogv/daily/2019020100/2020013100
The diff here is:
| Generated | Pr%C3%A4sidentschaftswahl_in_den_Vereinigten_Staaten.ogv |
| Expected | Pr%25C3%25A4sidentschaftswahl_in_den_Vereinigten_Staaten.ogv |
The URL expected on the API's side has the title URL-encoded twice: %C3 (the first byte of ä) in the title has become %25C3 in the API URL, because %25 is the URL encoding of %.
@MusikAnimal could work around this in his client, but if it is possible to fix in the backing API that would be nicer for other consumers.
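A client-side workaround could look roughly like the sketch below. It is only an illustration of the encoding difference described above, not the tool's actual code: the helper names `encode_file_segment` and `mediarequests_url` and their parameters are made up, and the URL layout is copied from the two example URLs in this comment.

```python
from urllib.parse import quote

API_ROOT = "https://wikimedia.org/api/rest_v1/metrics/mediarequests/per-file"


def encode_file_segment(storage_path: str, double_encode_title: bool) -> str:
    """Encode a storage path for use as a single URL path segment."""
    directory, _, title = storage_path.rpartition("/")
    if double_encode_title:
        # First pass: percent-encode the title alone (ä becomes %C3%A4).
        title = quote(title, safe="")
    # Final pass: encode the whole path, so / becomes %2F and % becomes %25.
    return quote(f"{directory}/{title}", safe="")


def mediarequests_url(storage_path: str, start: str, end: str,
                      double_encode_title: bool = True) -> str:
    """Hypothetical URL builder following the layout of the URLs above."""
    segment = encode_file_segment(storage_path, double_encode_title)
    return f"{API_ROOT}/all-referers/user/{segment}/daily/{start}/{end}"


raw_path = ("/wikipedia/commons/5/53/"
            "Pr\u00e4sidentschaftswahl_in_den_Vereinigten_Staaten.ogv")
print(mediarequests_url(raw_path, "2019020100", "2020013100", False))  # generated
print(mediarequests_url(raw_path, "2019020100", "2020013100", True))   # expected
```

With `double_encode_title=False` this should reproduce the "Generated" form, and with `True` the "Expected" form.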
Thanks! I will fix this client-side for now.
I assumed we used storage paths to distinguish where they are stored. Say Foo.jpg is a local image on fr.wikipedia, and de.wikipedia has an image under the same name. Also, I think it's possible to locally override files on Commons (right?). This could be alleviated by accepting a project domain in addition to the file name.
In my application, it's easy enough to get the storage path since I'm calling prop=imageinfo anyway to get things like the upload time and file size.
Opened a ticket to fix this in the API as well (a subtask of this one).
I've never seen filenames collide. The hash part is a legacy thing that became a de facto standard at the WMF. It has to do with the maximum number of inodes when media URLs are directly mapped to files and folders on disk. Since nobody has owned MediaWiki media backend code in the last 5+ years, suggestions to change the media URL scheme have gone nowhere.
The filename is not unique globally, and it is quite normal for e.g. enwiki and commonswiki to both have a file called "Example.jpg". And indeed many wikis have an old archived file called "Wiki.png", for example, which used to be where the site logo was stored.
The /a/ab/ hashing scheme, as Gilles says, indeed has no relation to identification of files. It is as arbitrary and specific to the file storage and thumbnail server as other insignificant parts of the url such as /100px-…. These parts should be ignored and definitely not exposed or hardcoded into a public user-facing API.
That is to say, if you were to see /wikipedia/commons/a/ab/Example.jpg/100px-… and /wikipedia/commons/t/tv/Example.jpg/42px-…, they should both be considered counts for (wikipedia/commons, Example.jpg).
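As a rough sketch of that normalization, assuming paths shaped like the two examples above: the function name `normalize_media_path` is hypothetical, and the optional `thumb` component is an assumption about how thumbnail paths may appear, not something stated in this thread.

```python
from typing import Tuple


def normalize_media_path(path: str) -> Tuple[str, str]:
    """Reduce a media storage path to (bucket, filename).

    Hypothetical helper: drops the hash directories and anything after
    the filename, such as a thumbnail size component.
    """
    parts = [p for p in path.split("/") if p]
    bucket, rest = "/".join(parts[:2]), parts[2:]
    # Assumption: thumbnail paths may carry an extra "thumb" directory.
    if rest and rest[0] == "thumb":
        rest = rest[1:]
    if len(rest) < 3:
        raise ValueError(f"unexpected media path: {path!r}")
    # rest[0] and rest[1] are the hash directories; rest[2] is the filename.
    return bucket, rest[2]


a = normalize_media_path("/wikipedia/commons/a/ab/Example.jpg/100px-…")
b = normalize_media_path("/wikipedia/commons/t/tv/Example.jpg/42px-…")
print(a, b, a == b)
```

Both example paths collapse to the same key, so their counts could be summed under it.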
What is globally unique is the combination of wiki and filename, where the wiki is expressed as a two-part upload bucket unique to that one wiki, e.g. wikipedia/en for enwiki and wikipedia/commons for commonswiki. For legacy reasons, the oldest non-Wikipedia wikis still have their bucket name start with wikipedia, but this is unlikely to change and can be ignored. The only complication is that this makes it difficult for a user interface to provide a dropdown menu with wikis and an input field for file names, because there's no public API to map a wiki to its bucket. However, if the user interface tool makes an intermediary request to {wiki}/w/api.php to get the original file path, then the bucket can be readily extracted from that. It sounds like @MusikAnimal is doing that already :)
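A minimal sketch of that intermediary lookup, for illustration only: the helper names are made up, the sample response is hand-written rather than real API output, and the response shape assumes `formatversion=2` with `iiprop=url`. No network request is made here.

```python
from urllib.parse import urlencode, urlsplit


def imageinfo_query_url(wiki_domain: str, file_title: str) -> str:
    """Build a prop=imageinfo query URL for a file on the given wiki."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "url",
        "titles": f"File:{file_title}",
        "format": "json",
        "formatversion": "2",
    }
    return f"https://{wiki_domain}/w/api.php?{urlencode(params)}"


def storage_path_from_response(response: dict) -> str:
    """Extract the storage path from the original file URL in a response."""
    original_url = response["query"]["pages"][0]["imageinfo"][0]["url"]
    # urlsplit does not decode, so percent-escapes in the path are kept.
    return urlsplit(original_url).path


# Hand-written sample in the assumed response shape (not real API output).
sample = {"query": {"pages": [{"imageinfo": [{
    "url": "https://upload.wikimedia.org/wikipedia/commons/5/53/"
           "Pr%C3%A4sidentschaftswahl_in_den_Vereinigten_Staaten.ogv"
}]}]}}

query = imageinfo_query_url(
    "commons.wikimedia.org",
    "Pr\u00e4sidentschaftswahl_in_den_Vereinigten_Staaten.ogv",
)
print(query)
print(storage_path_from_response(sample))
```

The extracted path starts with the bucket (wikipedia/commons here), followed by the hash directories and the filename.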
action=query&meta=filerepoinfo&friprop=name|displayname|url|rootUrl does give you the base URL (domain + bucket) for the wiki you call it on. (Not quite sure what's the difference between url and rootUrl...)
This was resolved ~2 years ago. If there are any remaining bugs please open a new task.