Add new mediatypes to media classification refinery code
Closed, Resolved | Public | 5 Estimated Story Points

Description

At the moment the media classification code in refinery only recognizes a small number of file types, so many files are classified poorly.
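
As an illustration of the kind of classification this task is about, here is a minimal sketch in Python. It is not the refinery code: the `MEDIA_TYPES` table, the `EXTENSION_RE` pattern, the `classify` function, and the category names are all hypothetical, chosen only to show how a file extension pulled out of a URL might be mapped to a media type.

```python
import re
from urllib.parse import urlparse, unquote

# Hypothetical extension table for illustration only; the real refinery
# mapping may use different extensions and different category names.
MEDIA_TYPES = {
    "image": {"jpg", "jpeg", "png", "gif", "svg", "tif", "tiff", "webp"},
    "audio": {"ogg", "oga", "mp3", "wav", "flac", "mid"},
    "video": {"ogv", "webm", "mp4", "mpg", "mpeg"},
    "document": {"pdf", "djvu"},
}

# Matches the final dot-separated alphanumeric suffix of a path.
EXTENSION_RE = re.compile(r"\.([A-Za-z0-9]+)$")


def classify(url):
    """Return (extension, media_type) for a file URL.

    The extension is lowercased; unknown or missing extensions fall
    into the catch-all "other" type.
    """
    path = unquote(urlparse(url).path)
    match = EXTENSION_RE.search(path)
    if match is None:
        return (None, "other")
    ext = match.group(1).lower()
    for media_type, extensions in MEDIA_TYPES.items():
        if ext in extensions:
            return (ext, media_type)
    return (ext, "other")
```

With a table like this, covering more file types is a matter of adding extensions to the relevant set. Anything not listed ends up in the catch-all type, which is the situation described above when the list is short.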

Event Timeline

Ottomata triaged this task as High priority.
Ottomata moved this task from Incoming to Data Quality on the Analytics board.
Ottomata added a project: Analytics-Kanban.

Change 517641 had a related patch set uploaded (by Fdans; owner: Fdans):
[analytics/refinery/source@master] Add media formats to file url parser regex

https://gerrit.wikimedia.org/r/517641

@fdans let's please file tasks for the mediarequest API as child tasks of https://phabricator.wikimedia.org/T207208 so we keep track of all the work we are doing in this regard.

Change 522390 had a related patch set uploaded (by Fdans; owner: Fdans):
[analytics/refinery@master] [wip] Add file extension and media classification to mediacounts job

https://gerrit.wikimedia.org/r/522390

Change 517641 merged by Fdans:
[analytics/refinery/source@master] Add file extension and media type classification to media files UDF

https://gerrit.wikimedia.org/r/517641

Nuria renamed this task from Expand regex that maps file types to media to Add new mrediatypes to media classification refinery code. Jul 23 2019, 2:48 PM
Nuria moved this task from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Nuria renamed this task from Add new mrediatypes to media classification refinery code to Add new mediatypes to media classification refinery code. Jul 24 2019, 5:00 PM

For the new code to take effect, the job needs to be restarted.

Change 522390 abandoned by Fdans:
[wip] Add file extension and media classification to mediacounts job

Reason:
Since we'll be using a new dataset for the mediarequests API, I'm going to abandon this and open a new change instead.

https://gerrit.wikimedia.org/r/522390

Nuria set the point value for this task to 5.