Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users
Open, Low, Public, 0 Estimated Story Points

Description

As part of the Wikistats 2 project we have developed the Edit Data Lake (see https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits). The Edit Data Lake is a denormalized data store and the best dataset we have had to date for answering questions about content and contributors. At this time, this data is available only within the WMF, on the private Hadoop cluster.
This is the parent task for all the work to make the Data Lake data available on our public cloud infrastructure for our community at large; the more accessible that data is, the more impact it can have.

Event Timeline

Nuria renamed this task from Edit Data Lake available in labs: Mediawiki history snapshots available in SQL data store to Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to clod (labs) users. Sep 20 2018, 1:45 PM
Nuria updated the task description.
Milimetric renamed this task from Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to clod (labs) users to Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users. Sep 24 2018, 3:49 PM
Milimetric triaged this task as Medium priority.
Milimetric removed a project: Analytics-Kanban.
Milimetric moved this task from Incoming to Smart Tools for Better Data on the Analytics board.
Milimetric raised the priority of this task from Medium to High. Oct 18 2018, 5:36 PM
Milimetric added a project: Analytics-Kanban.
Milimetric set the point value for this task to 0.
Milimetric subscribed.

De-prioritizing until cloud infrastructure can support monitoring similar to what we can do in production.

Milimetric lowered the priority of this task from High to Low. Aug 31 2020, 5:01 PM
Milimetric moved this task from Deprioritized to Smart Tools for Better Data on the Analytics board.

This needs further consideration.