
Add mediacounts data to AQS and, from there, Restbase
Closed, Resolved · Public · 31 Estimated Story Points

Assigned To
None
Authored By
Nuria
Oct 16 2018, 7:34 PM
Referenced Files
None
Tokens
"Like" token, awarded by WMDE-Fisch. "Like" token, awarded by WMDE-leszek. "Mountain of Wealth" token, awarded by Doc_James. "Love" token, awarded by MusikAnimal. "Cup of Joe" token, awarded by Capt_Swing.

Description

https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Mediacounts

This dataset is produced by Analytics, and there is existing infrastructure that (I think) could be used to serve this data. This data is needed to support https://tools.wmflabs.org/mediaviews/, which went down after the mediaviews-api project on Toolforge did not survive the Trusty deprecation.

I am interested in knowing what it would take to have this dataset served through a public endpoint. It seems like it's just a matter of connecting all the parts together, but I would like to know if I am mistaken.


Original task title: API endpoint for mediacounts

Data is available now in downloaded files: https://dumps.wikimedia.org/other/mediacounts/daily/2018/

Event Timeline

Milimetric triaged this task as Medium priority. Oct 17 2018, 7:53 PM

The unofficial API has been down since the Toolforge migration to Debian Stretch. From what I understand, it's not simple to fix. So, I have retired https://tools.wmflabs.org/mediaviews for the time being. Many kudos to @Harej for building the Toolforge API! If you do manage to get it back up, I'll happily bring back Mediaviews, but a production REST API is of course preferred. My hope is we can support both video and images (see also T210313).

Doc_James subscribed.

Yes, we need to get https://tools.wmflabs.org/mediaviews/ back up and running. We need to be able to determine whether people are interested in watching video via Wikipedia. This will help our community prioritize efforts on VideoWiki: https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Videowiki


We all agree we need better media stats around video, but https://tools.wmflabs.org/mediaviews/ does not expose data about video plays, because that data does not exist at this time. The counts are about images/videos downloaded, not played.

Harej renamed this task from API endpoint for mediacounts to Add mediacounts data to AQS and, from there, Restbase. May 18 2019, 5:34 PM
Harej added a project: Services.
Harej updated the task description. (Show Details)
Harej added a subscriber: WMDE-Fisch.

I tried to groom this task into a more specific project per Petr Pchelko's input. My hope is that a more precisely scoped task can help us figure out how much work is needed, and, given this, whether it is reasonable for us to wait for this to happen or if we should work in the interim to re-build the mediaviews-api endpoint in a more sustainable way.


We would all love more detailed video-play metrics. In the interim, this dataset already exists and is relied upon to give "good enough" impressions metrics for our partners who submit video content to Wikimedia.


Harej, is there a way to see view stats on a webm video when the video is played from the player in an article? E.g. the pneumonia article has roughly 3 million pageviews per year, but we're unsure how often the video is viewed. My understanding was that this data is not captured. The debate we face is with editors who believe the videos are not useful or not viewed.

@Ian_Furst the Foundation does not have any data on video plays at this time. Analytics will be revamping mediacounts in the upcoming year, and that will give us more precise image stats; video plays, however, are not instrumented at all.

Also, pinging @Harej so he sees my last comment: Analytics will be working on better image stats next year, and we will have an API endpoint for image views similar to the existing Pageview API.

Okay, so we are looking at 2021 to have this completed?

2021? The work to surface existing data on an API can be done (in the absence of major issues) by the end of this year, 2019. Please be aware, though, that the data surfaced will be quite imperfect: likely around 25% of it will represent preloads, that is, images requested but never seen by a user. We will quantify the issue with preloads prior to release. Please see: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Mediacounts#Corner_cases
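To make the preload caveat concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that a fixed share of all requests (the roughly 25% figure mentioned in the comment) are preloads, applied uniformly to every file; the comment itself only says "likely" and promises to quantify this before release. The function name `estimate_seen` is hypothetical.

```python
def estimate_seen(requests: int, preload_share: float = 0.25) -> int:
    """Rough estimate of requests actually seen by a user.

    Hypothetical helper: assumes a uniform, fixed share of all
    requests are preloads (images requested but never displayed).
    """
    if not 0.0 <= preload_share <= 1.0:
        raise ValueError("preload_share must be between 0 and 1")
    # Remove the assumed preload share and round to a whole count.
    return round(requests * (1.0 - preload_share))
```

For example, under this assumption a file with 1,000,000 recorded requests would correspond to roughly 750,000 requests actually seen. The real preload share may vary by file and by page layout, so this is only an order-of-magnitude adjustment.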


Sorry to poke this again, but will that API also be able to expose the video "download counts" as they are provided in https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Mediacounts ?

Yes, they will be. Note that the counts exposed are video requests (as with image requests, the data includes preloads, if any).

Change 534824 had a related patch set uploaded (by Fdans; owner: Fdans):
[analytics/aqs@master] Add per file media requests endpoing to AQS

https://gerrit.wikimedia.org/r/534824

API is now live, please see docs: https://wikitech.wikimedia.org/wiki/Analytics/AQS/Mediarequests
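The comment above only links to the documentation, so the following is a minimal, unofficial Python sketch of a client. It assumes a per-file route of the shape `per-file/{referer}/{agent}/{file_path}/{granularity}/{start}/{end}` under `https://wikimedia.org/api/rest_v1/metrics/mediarequests/`, the values `all-referers`, `all-agents` and `daily`, YYYYMMDD dates, and a JSON response whose `items` each carry a `requests` count. Verify all of these against the Mediarequests docs before relying on them. The helper names, the example file path and the User-Agent string are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed base of the per-file mediarequests route; check the wikitech docs.
BASE = "https://wikimedia.org/api/rest_v1/metrics/mediarequests/per-file"


def per_file_url(file_path: str, start: str, end: str,
                 referer: str = "all-referers", agent: str = "all-agents",
                 granularity: str = "daily") -> str:
    """Build a per-file mediarequests URL (route shape is an assumption).

    file_path: upload path, e.g. "/wikipedia/commons/a/ab/Example.webm"
    (a hypothetical file). start, end: dates as YYYYMMDD strings.
    """
    # Encode every '/' so the file path travels as a single URL path segment.
    encoded_path = quote(file_path, safe="")
    return "/".join([BASE, referer, agent, encoded_path, granularity, start, end])


def total_requests(payload: dict) -> int:
    """Sum the assumed 'requests' field over all returned items.

    Per the discussion in this task, these counts include preloads,
    so they are requests, not confirmed views or plays.
    """
    return sum(item["requests"] for item in payload.get("items", []))


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body.

    Wikimedia asks API clients to send a descriptive User-Agent;
    the value below is a placeholder to replace with real contact info.
    """
    req = Request(url, headers={"User-Agent": "mediarequests-sketch/0.1 (hypothetical contact)"})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

For example, `per_file_url("/wikipedia/commons/a/ab/Example.webm", "20200101", "20200131")` builds the request URL for January 2020, and `total_requests(fetch_json(url))` would sum the daily counts. Encoding the whole file path keeps it one path segment, which is why `quote` is called with `safe=""`.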

There are still open questions about whether the API should let you query by image size, so as to rule out icons: https://phabricator.wikimedia.org/T242033

Nuria set the point value for this task to 31.