As part of the Wikistats 2 project, we have developed the Edit Data Lake (see https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits). The Edit Data Lake is a denormalized data store and the best dataset we have had to date for answering questions about content and contributors. At this time, this data is only available to the WMF in the private Hadoop cluster.
This is the parent task for all the work to make the Data Lake data available on our public cloud infrastructure for our community at large; the more accessible that data is, the more impact it can have.
Description
Event Timeline
De-prioritizing until cloud infrastructure can support monitoring similar to what we can do in production.