Add mediacounts data to AQS and, from there, Restbase
Open, NormalPublic

Tokens
"Like" token, awarded by WMDE-Fisch. "Like" token, awarded by WMDE-leszek. "Mountain of Wealth" token, awarded by Doc_James. "Love" token, awarded by MusikAnimal. "Cup of Joe" token, awarded by Capt_Swing.
Assigned To
None
Authored By
Nuria, Oct 16 2018

Description

https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Mediacounts

This dataset is produced by Analytics, and there is existing infrastructure that (I think) could be used to serve this data. This data is needed to support https://tools.wmflabs.org/mediaviews/ which went down after the mediaviews-api project on Toolforge did not survive Trusty Deprecation.

I am interested in knowing what it would take to have this dataset served through a public endpoint. It seems like it's just a matter of connecting all the parts together, but I would like to know if I am mistaken.


Original task title: API endpoint for mediacounts

Data is available now as downloadable files: https://dumps.wikimedia.org/other/mediacounts/daily/2018/

Event Timeline

Nuria created this task. Oct 16 2018, 7:34 PM
Restricted Application added a subscriber: Aklapper. Oct 16 2018, 7:34 PM
Milimetric triaged this task as Normal priority. Oct 17 2018, 7:53 PM
Capt_Swing added a subscriber: Capt_Swing.
MusikAnimal added a subscriber: MusikAnimal.

The unofficial API has been down since the Toolforge migration to Debian Stretch. From what I understand it's not very simple to fix. So, I have retired https://tools.wmflabs.org/mediaviews for the time being. Many kudos to @Harej for building the Toolforge API! If you do manage to get it back up, I'll happily bring back Mediaviews, but it's of course preferred to have a production REST API. My hope is we can support both video and images (see also T210313).

Doc_James added a subscriber: Doc_James.

Yes, we need to get https://tools.wmflabs.org/mediaviews/ back up and running. We need to be able to determine if people are interested in watching video via Wikipedia. This will help our community prioritize efforts on VideoWiki: https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Videowiki

Nuria added a comment. May 6 2019, 2:55 PM

We need to be able to determine if people are interested in watching video via Wikipedia.

We all agree we need better media stats around video, but https://tools.wmflabs.org/mediaviews/ does not expose data about video plays, because that data does not exist at this time. The counts are about images/video downloaded, not played.

Harej renamed this task from API endpoint for mediacounts to Add mediacounts data to AQS and, from there, Restbase. May 18 2019, 5:34 PM
Harej added a project: Services.
Harej updated the task description.
Harej added a subscriber: WMDE-Fisch.
Harej added a comment. May 18 2019, 5:39 PM

I tried to groom this task into a more specific project per Petr Pchelko's input. My hope is that a more precisely scoped task can help us figure out how much work is needed, and, given this, whether it is reasonable for us to wait for this to happen or if we should work in the interim to re-build the mediaviews-api endpoint in a more sustainable way.

https://tools.wmflabs.org/mediaviews/ does not expose data about video plays, because that data does not exist at this time. The counts are about images/video downloaded, not played.

We would all love more detailed video-play metrics. In the interim, this dataset already exists and is relied upon to give "good enough" impressions metrics for our partners who submit video content to Wikimedia.

Harej - is there a way to see view stats on a webm video when it is viewed from the player in an article? E.g. the pneumonia article has roughly 3m pageviews per year, but we're unsure how often the video is viewed. My understanding was that this data is not captured. The debate we face is from editors who believe the videos are not useful or viewed.

Nuria added a comment. May 19 2019, 3:37 AM

@Ian_Furst the Foundation does not have any data on video plays at this time. Analytics will be revamping mediacounts in the upcoming year, and that will give us more precise image stats. Video play data, however, is not instrumented at all.

Nuria added a comment. May 19 2019, 3:39 AM

Also, ping @Harej so he sees my last comment. Analytics will be working on better image stats next year, and we will have an API endpoint for image views similar to the one that now exists for the Pageview API.

Nuria added a comment. Edited Jun 11 2019, 5:12 PM

Initial design doc of the work planned for next year: https://wikitech.wikimedia.org/wiki/Analytics/AQS/Media_metrics

Okay so we are looking at 2021 to have this completed?

Nuria added a comment. Jun 11 2019, 5:27 PM

2021? The work to surface existing data on an API can be done (in the absence of major issues) by the end of this year, 2019. Please be aware, though, that the data surfaced will be quite imperfect: likely around 25% of requests represent preloads, that is, images requested but never seen by a user. We shall quantify the issue with preloads prior to release. Please see: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Mediacounts#Corner_cases
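
To make the preload caveat concrete, here is a rough sketch of how a consumer of these counts might discount them. The function name and the 25% default are illustrative assumptions only; the actual preload share has not been quantified yet.

```python
def estimate_seen_requests(requests: int, preload_share: float = 0.25) -> int:
    """Discount a raw request count by an assumed share of preloads.

    `preload_share` is a guess (the "likely 25%" mentioned above),
    not a measured value.
    """
    if requests < 0:
        raise ValueError("requests must be non-negative")
    if not 0.0 <= preload_share <= 1.0:
        raise ValueError("preload_share must be between 0 and 1")
    # Requests that were plausibly seen by a user, rounded to a whole number.
    return round(requests * (1.0 - preload_share))


# 1000 raw requests with an assumed 25% preload share
print(estimate_seen_requests(1000))  # 750
```

Until the preload share is actually measured, any figure derived this way should be treated as a rough lower-confidence estimate rather than a view count.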

Sorry to poke this again, but will that API also be able to expose the video "download counts" as they are provided in https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Mediacounts ?

Nuria added a comment. Jun 12 2019, 5:21 PM

Yes, they will be. Note that the counts exposed are video requests (similar to image requests; the data includes preloads, if any).