Needs to happen after T192482.
An Oozie step that asserts the data quality of a new snapshot, by comparing it with the previous one, is to be added before the mediawiki-history-reduced data is indexed into Druid to be served by AQS.
Given that the mediawiki-history-reduced dataset is quite complex, the job/query needs to be carefully designed and tested.
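The snapshot comparison described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the refinery's MediawikiHistoryChecker. It assumes each snapshot has already been aggregated into event counts per key (here a hypothetical `(wiki, event_entity, event_type)` tuple), and that in a healthy new snapshot each key's count grows by a bounded ratio. The thresholds `min_growth`, `max_growth` and `max_error_ratio` are illustrative parameters, not production values.

```python
from typing import Dict, Hashable, List, Tuple


def check_snapshot_growth(
    previous: Dict[Hashable, int],
    current: Dict[Hashable, int],
    min_growth: float = 0.0,   # illustrative threshold (assumption)
    max_growth: float = 0.1,   # illustrative threshold (assumption)
    max_error_ratio: float = 0.05,
) -> Tuple[bool, List[Hashable]]:
    """Compare per-key event counts of a new snapshot against the previous one.

    A key is flagged when its relative growth falls outside
    [min_growth, max_growth]; a key missing from `current` counts as -100%.
    Keys only present in `current` (e.g. new wikis) are not checked.
    The check passes when the share of flagged keys is <= max_error_ratio.
    """
    flagged: List[Hashable] = []
    checked = 0
    for key, prev_count in previous.items():
        if prev_count <= 0:
            continue  # no baseline to compute a growth rate from
        checked += 1
        cur_count = current.get(key, 0)
        growth = (cur_count - prev_count) / prev_count
        if not (min_growth <= growth <= max_growth):
            flagged.append(key)
    error_ratio = len(flagged) / checked if checked else 0.0
    return error_ratio <= max_error_ratio, flagged


# Hypothetical aggregates keyed by (wiki, event_entity, event_type).
previous = {("enwiki", "revision", "create"): 1000, ("frwiki", "revision", "create"): 500}
current = {("enwiki", "revision", "create"): 1030, ("frwiki", "revision", "create"): 200}
ok, flagged = check_snapshot_growth(previous, current)
# frwiki shrank by 60%, so half of the keys are flagged and the check fails.
print(ok, flagged)
```

In an Oozie workflow, a failing check like this would presumably stop the workflow before the Druid indexation action, so AQS keeps serving the previous snapshot.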
Subject | Repo | Branch | Lines +/-
---|---|---|---
Add check to mw-history-reduced druid indexation | analytics/refinery | master | +226 -7
Update MWH-reduced to parquet storage | analytics/refinery | master | +15 -23
Status | Assigned | Task
---|---|---
Resolved | JAllemandou | T177965 Wikistats 2 Backend: Resiliency, Rollback and Deployment of Data
Resolved | JAllemandou | T192483 Add data-quality check on mediawiki-history-reduced before druid indexation
Event Timeline
Change 441341 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Update MWH-reduced to parquet storage
Change 441378 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] Updating MediawikiHistoryChecker for reduced
Operational changes that go with this change:
- convert the existing data from JSON to Parquet
- kill the old job
- start the new job, which indexes the Parquet data
- recreate the table in Parquet format (and run a repair so that its partitions are created)
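The last step (recreating the table in Parquet format, then repairing it) can be sketched as HiveQL statements generated from Python. This is a hypothetical illustration: the table name, columns, location and the single `snapshot` partition column are assumptions. It also assumes the table is EXTERNAL, so that by default dropping its definition leaves the data files in place. `MSCK REPAIR TABLE` is the standard Hive statement that scans a table's location and adds partitions missing from the metastore, which is why the repair is needed after the table is recreated.

```python
from typing import Dict, List


def recreate_as_parquet(table: str, columns: Dict[str, str], location: str) -> List[str]:
    """Return HiveQL statements that recreate `table` as Parquet and register its partitions.

    Assumptions (hypothetical): the table is EXTERNAL, partitioned by a single
    `snapshot` string column, and its Parquet files already sit under
    `location/snapshot=.../`.
    """
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns.items())
    return [
        # For an EXTERNAL table, DROP removes only metadata by default; files stay in place.
        f"DROP TABLE IF EXISTS {table}",
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        "PARTITIONED BY (`snapshot` string)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'",
        # The new table starts with no partitions in the metastore; the repair
        # adds one for each snapshot=... directory found under the location.
        f"MSCK REPAIR TABLE {table}",
    ]


statements = recreate_as_parquet(
    "example_db.mediawiki_history_reduced",  # hypothetical table name
    {"project": "string", "event_entity": "string", "events": "bigint"},  # hypothetical columns
    "/hypothetical/path/mediawiki_history_reduced",
)
print(";\n\n".join(statements) + ";")
```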
Change 441341 merged by Nuria:
[analytics/refinery@master] Update MWH-reduced to parquet storage
Change 445373 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Add check to mw-history-reduced druid indexation
Change 445373 merged by Joal:
[analytics/refinery@master] Add check to mw-history-reduced druid indexation