
Add Mediawiki-History data-quality check stage in oozie using statistics
Closed, Resolved | Public | 13 Estimated Story Points

Description

This Oozie step should run after the Spark denormalization job and before the history Hive tables are repaired.
This check will use the statistics generated by Accumulator (T155507).
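
As an illustration only, here is a minimal sketch of what a statistics-based quality check could look like. It assumes the statistics are per-metric counts available for both the previous and the new snapshot, and that a check compares their ratio against fixed bounds. The function name, the metric names, and the thresholds are all hypothetical; none of them are taken from the MediawikiHistoryChecker job or from T155507.

```python
from typing import Dict, List


def find_suspicious_metrics(
    previous: Dict[str, int],
    current: Dict[str, int],
    min_ratio: float = 0.99,   # hypothetical lower bound on current/previous
    max_ratio: float = 1.10,   # hypothetical upper bound on current/previous
) -> List[str]:
    """Return the metrics that vanished or whose count changed too much.

    `previous` and `current` map a metric name to its count in the
    previous and the new snapshot (an assumed layout for the statistics).
    """
    suspicious = []
    for name, prev_count in sorted(previous.items()):
        if name not in current:
            # A metric present last time but missing now is always flagged.
            suspicious.append(name)
            continue
        if prev_count == 0:
            # No baseline to compare against, so the ratio is undefined.
            continue
        ratio = current[name] / prev_count
        if ratio < min_ratio or ratio > max_ratio:
            suspicious.append(name)
    return suspicious


if __name__ == "__main__":
    # Hypothetical metric names and counts, for illustration only.
    prev_stats = {"revisions.create": 1000, "pages.move": 500, "users.rename": 10}
    new_stats = {"revisions.create": 1020, "pages.move": 300}
    print(find_suspicious_metrics(prev_stats, new_stats))
```

In a workflow like the one described, a non-empty result would fail the check step, so the later table-repair step would not run on a suspect snapshot.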

Event Timeline

Milimetric added a project: Analytics.

Change 434987 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] Update mediawiki-history stats to not compressed

https://gerrit.wikimedia.org/r/434987

Change 434987 merged by Joal:
[analytics/refinery/source@master] Update mediawiki-history stats

https://gerrit.wikimedia.org/r/434987

Change 439869 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] Add MediawikiHistoryChecker spark job

https://gerrit.wikimedia.org/r/439869

Change 440005 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Add check step in mediawiki-history jobs

https://gerrit.wikimedia.org/r/440005

Change 441378 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] Update MediawikiHistoryChecker adding reduced

https://gerrit.wikimedia.org/r/441378

JAllemandou set the point value for this task to 13. (Aug 9 2018, 3:03 PM)

Change 439869 merged by jenkins-bot:
[analytics/refinery/source@master] Add MediawikiHistoryChecker spark job

https://gerrit.wikimedia.org/r/439869

Change 441378 merged by Joal:
[analytics/refinery/source@master] Update MediawikiHistoryChecker adding reduced

https://gerrit.wikimedia.org/r/441378

Change 440005 merged by Joal:
[analytics/refinery@master] Add validation step in mediawiki-history jobs

https://gerrit.wikimedia.org/r/440005