
Set up data infrastructure for program metrics
Closed, Declined · Public

Description

We've started to plan how we'll calculate the Audiences 2018–19 annual plan metrics, and we've identified some infrastructural needs.

Daily editing data, with tags, in the Data Lake

Filed

  • An aggregated and/or denormalized table of edits that includes edit tag data (to be used for mobile retention, mobile edit and editor counts, and anything else that relies on edit tags)
    • It needs to be granular enough to measure short-term editor retention (daily is probably sufficient)
  • It needs to be updated quickly, within a day of the events taking place (if the Data Lake is the only source and its data isn't available until the 10th of the following month, that is not tenable)
  • A one-time sqoop of the change_tag tables into the Data Lake to provide past edit tag data to complement current data
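
To make the shape of the requested table concrete, here is a minimal Python sketch of a daily, denormalized edit count keyed by day, user, and tag. The input tuples are a simplified, hypothetical stand-in for rows from the revision and change_tag tables, and the function name is made up for illustration; a real job would more likely run as a query in the Data Lake.

```python
from collections import Counter, defaultdict
from datetime import datetime


def daily_tagged_edit_counts(revisions, change_tags):
    """Count edits per (day, user, tag).

    revisions:   iterable of (rev_id, user, timestamp) tuples, with the
                 timestamp as a 14-digit string such as "20180830110200"
                 (assumed input shape, not the real schema).
    change_tags: iterable of (rev_id, tag) tuples (assumed input shape).

    Edits with no tags are counted under the tag None so they are not lost.
    """
    # Denormalize: collect the set of tags attached to each revision.
    tags_by_rev = defaultdict(set)
    for rev_id, tag in change_tags:
        tags_by_rev[rev_id].add(tag)

    counts = Counter()
    for rev_id, user, timestamp in revisions:
        day = datetime.strptime(timestamp, "%Y%m%d%H%M%S").date().isoformat()
        # An edit with several tags contributes one count per tag.
        for tag in tags_by_rev.get(rev_id) or {None}:
            counts[(day, user, tag)] += 1
    return counts


revisions = [
    (1, "Alice", "20180830110200"),
    (2, "Alice", "20180830120000"),
    (3, "Bob", "20180831010000"),
]
change_tags = [(1, "mobile edit"), (1, "mobile web edit"), (3, "mobile edit")]
counts = daily_tagged_edit_counts(revisions, change_tags)
```

Because each row carries the day and the user, short-term retention can be derived from the same table by checking whether a user who appears on one day appears again within the following days.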

Identifying reverts

In mediawiki_history, we'd calculate reverts using the convenient is_reverted field. How do we do it when we simply have a table of revisions with their hashes?
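
One possible approach, sketched here in Python under stated assumptions, is identity-revert detection: a revision counts as a revert if its content hash matches an earlier revision of the same page, and the revisions in between are treated as reverted. The Revision tuple, the function name, and the radius of 15 revisions are all illustrative choices, not something taken from mediawiki_history or this task.

```python
from collections import namedtuple

# Hypothetical, simplified row shape for a table of revisions with hashes.
Revision = namedtuple("Revision", ["rev_id", "page_id", "timestamp", "sha1"])


def find_identity_reverts(revisions, radius=15):
    """Return (reverting_ids, reverted_ids) as two sets of rev_ids.

    A revision is "reverting" if its sha1 equals that of an earlier
    revision of the same page within `radius` revisions, with at least
    one revision in between. Those in-between revisions are "reverted".
    """
    # Group revisions by page, in chronological order.
    by_page = {}
    for rev in sorted(revisions, key=lambda r: (r.page_id, r.timestamp, r.rev_id)):
        by_page.setdefault(rev.page_id, []).append(rev)

    reverting, reverted = set(), set()
    for history in by_page.values():
        for i, rev in enumerate(history):
            window_start = max(0, i - radius)
            # Walk backwards to the most recent earlier revision with the same hash.
            for j in range(i - 1, window_start - 1, -1):
                if history[j].sha1 == rev.sha1:
                    between = history[j + 1:i]
                    # An immediate predecessor with the same hash is a
                    # no-change edit, not a revert.
                    if between:
                        reverting.add(rev.rev_id)
                        reverted.update(r.rev_id for r in between)
                    break
    return reverting, reverted


revisions = [
    Revision(1, 100, "20180830110000", "aaa"),
    Revision(2, 100, "20180830110500", "bbb"),
    Revision(3, 100, "20180830111000", "aaa"),  # restores revision 1's content
    Revision(4, 100, "20180830112000", "ccc"),
    Revision(5, 200, "20180830110000", "aaa"),  # same hash, different page
    Revision(6, 200, "20180830113000", "bbb"),
]
reverting, reverted = find_identity_reverts(revisions)
```

This only catches reverts that restore a page to an exact earlier state; partial reverts would not be detected, and whether the results match is_reverted would need to be checked against mediawiki_history.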

Event Timeline

nshahquinn-wmf triaged this task as High priority.
nshahquinn-wmf created this task.
nshahquinn-wmf moved this task from Triage to Next Up on the Product-Analytics board.
Restricted Application changed the subtype of this task from "Deadline" to "Task". · Aug 30 2018, 11:02 PM

@Neil_P._Quinn_WMF to close out this task and file separate follow-up tasks/requests as needed (e.g., for the reverts question: "In mediawiki_history, we'd calculate reverts using the convenient is_reverted field. How do we do it when we simply have a table of revisions with their hashes?")

As Kate said, closing this because this parent task doesn't have any value. I've separated out the reverts issue as T216297.