Verify edits_hourly in Druid/Turnilo/Superset
Closed, Resolved · Public


With T221338 complete, the edits_hourly data (T211173) should now be ready to use. The purpose of this task is to double-check the data and make sure there are no major outstanding issues.

Outstanding Issues:

  • Content edit counts in wmf.mediawiki_history are not fully reliable. T221338
  • All anonymous users display with an edit count of 10,000 in the edits_hourly dataset. T224941

Proposed checks:

View in Turnilo and Superset. Confirm data appears as expected by applying various filters and splits. ✅ Confirmed
Confirm data matches query results on wmf.mediawiki_history data and the monthly contributors metrics. ✅ Confirmed. See shared doc
Confirm that the content page edit counts issue was corrected by comparing to query results on MariaDB replicas. ✅ Confirmed
Run queries to confirm that revision events in mediawiki_history and in Druid have the expected page info. ✅ Confirmed
Confirm anon users display with the correct edit_count. ✅ Confirmed. Note: the edit count bucket for anonymous users is listed as undefined at this time because that information is not available in the Data Lake.
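
For anyone repeating these checks, the comparison step (counts from one source against counts from another) could be sketched roughly as follows. This is only an illustration, not the method used for this task: the function name `compare_counts`, the tolerance parameter, and the monthly numbers are all hypothetical, and the two dictionaries stand in for aggregated query results that would be pulled from mediawiki_history and Druid separately.

```python
from typing import Dict, List, Tuple


def compare_counts(
    reference: Dict[str, int],
    candidate: Dict[str, int],
    rel_tolerance: float = 0.0,
) -> List[Tuple[str, int, int]]:
    """Return (key, reference_count, candidate_count) for each key whose
    counts differ by more than rel_tolerance relative to the reference.

    Keys missing from one side are treated as a count of 0.
    """
    mismatches = []
    for key in sorted(set(reference) | set(candidate)):
        ref = reference.get(key, 0)
        cand = candidate.get(key, 0)
        # Use max(ref, 1) so a reference count of 0 still gets a sane bound.
        allowed = rel_tolerance * max(ref, 1)
        if abs(ref - cand) > allowed:
            mismatches.append((key, ref, cand))
    return mismatches


# Made-up monthly edit counts, for illustration only (not real data).
history_counts = {"2019-06": 1200, "2019-07": 1350}
druid_counts = {"2019-06": 1200, "2019-07": 1340}

print(compare_counts(history_counts, druid_counts))
# [('2019-07', 1350, 1340)]
```

An exact match (`rel_tolerance=0.0`) is the strictest check; a small tolerance may be more practical if the two sources are snapshotted at different times.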

Event Timeline

kzimmerman triaged this task as High priority.
kzimmerman moved this task from Triage to Next Up on the Product-Analytics board.

Megan will lead this, pairing with and reaching out to @Neil_P._Quinn_WMF as needed.

MNeisler updated the task description.

Completed QC for Turnilo and Superset and verified against mediawiki history data. Reported one bug which was resolved earlier today. Please find all scenarios tested in this shared document.

MNeisler updated the task description.
MNeisler updated the task description.
MNeisler closed this task as Resolved. Edited Aug 22 2019, 1:15 AM

Thanks! Just following up to confirm that I verified the content page edit counts issue, and it looks like it has been fixed. Current notebook with analysis.

Also, there is an additional 'Count' metric in Turnilo that needs to be removed. It was added by default in Turnilo but doesn't make sense for this dataset. I've filed a separate task with Analytics to remove it.