What happens if indexing fails?
How do we make sure metric computations have not changed significantly between runs? How do we prevent serving bad data from a bad run?
Rollback strategy: can we have two snapshots and only flip when the new one is good?
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | JAllemandou | T177965 Wikistats 2 Backend: Resiliency, Rollback and Deployment of Data
Resolved | | JAllemandou | T192481 Add Mediawiki-History data-quality check stage in oozie using statistics
Resolved | | JAllemandou | T192482 Make mediawiki-history-reduced table permanent (snapshot partitioning)
Resolved | | JAllemandou | T192483 Add data-quality check on mediawiki-history-reduced before druid indexation
Resolved | | JAllemandou | T193387 Add druid datasources as configuration parameter in AQS
Resolved | | JAllemandou | T193388 Index by-snapshot mediawiki-history-reduced in druid
I guess T155507 can help in identifying bad runs of mediawiki history reconstruction.
Plenty of different possible approaches here. Listing the two that make the most sense to me:
+1 to @mforns' comment
Let's talk about this at our next tasking meeting. I think the best option is the first one, so that we test data validity closest to where the data is defined, i.e. at creation time. Cache warm-up should happen afterwards, in the AQS deployment step for this data. So warming up the cache is an AQS operation, but loading data into Druid is contingent on us having a quality score telling how good the data is.
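To make that gate concrete, here is a minimal sketch in Python; the fetch_metric_totals helper, the 5% per-metric tolerance, and the 0.99 minimum score are illustrative assumptions, not agreed values:

```python
# Minimal sketch of the quality-score gate, assuming a hypothetical
# fetch_metric_totals(snapshot) helper returning {metric: total}
# (e.g. computed from Hive over mediawiki_history). Thresholds are
# illustrative only.

def quality_score(new_totals, old_totals, tolerance=0.05):
    """Fraction of metrics whose total moved by less than `tolerance`."""
    ok = 0
    for metric, old_value in old_totals.items():
        new_value = new_totals.get(metric, 0)
        if old_value == 0:
            ok += int(new_value == 0)
        elif abs(new_value - old_value) / old_value < tolerance:
            ok += 1
    return ok / len(old_totals) if old_totals else 0.0

def should_load_into_druid(new_snapshot, previous_snapshot, min_score=0.99):
    new_totals = fetch_metric_totals(new_snapshot)        # hypothetical helper
    old_totals = fetch_metric_totals(previous_snapshot)   # hypothetical helper
    return quality_score(new_totals, old_totals) >= min_score
```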
> Cache warm-up should happen afterwards, in the AQS deployment step for this data.
Given we probably want to use Druid as a query engine to check numbers between old and new, cache warming would actually be a side-effect of checking data consistency.
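A rough sketch of what that could look like, using Druid's native timeseries query against both datasources; the broker URL, datasource names and the `edits` metric are placeholder assumptions about our setup:

```python
# Sketch of comparing old vs new datasources by querying Druid itself,
# which warms the Druid cache as a side effect. Broker URL, datasource
# names and the `edits` metric are assumptions, not confirmed values.
import requests

BROKER = 'http://druid-broker.example.org:8082/druid/v2/'

def total_edits(datasource, interval='2016-01-01/2017-01-01'):
    """Run a native timeseries query summing edits over one interval."""
    query = {
        'queryType': 'timeseries',
        'dataSource': datasource,
        'granularity': 'all',
        'intervals': [interval],
        'aggregations': [
            {'type': 'longSum', 'name': 'edits', 'fieldName': 'edits'},
        ],
    }
    rows = requests.post(BROKER, json=query).json()
    return rows[0]['result']['edits'] if rows else 0

def datasources_consistent(old_ds, new_ds, tolerance=0.05):
    """True when the new snapshot's total is within tolerance of the old one."""
    old, new = total_edits(old_ds), total_edits(new_ds)
    if old == 0:
        return new == 0
    return abs(new - old) / old <= tolerance
```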
> Given we probably want to use Druid as a query engine to check numbers between old and new, cache warming would actually be a side-effect of checking data consistency.
mmm... wait, the data cannot be surfaced externally while we do not yet know whether it is any good. So are we talking about requests that are internal to AQS itself? They would warm up the Druid cache, but in no case should they touch the web cache. Correct?
First round of discussion with the team:
TBD!
For the record, I liked Joseph's idea of 3 datasources: one being served right now, one backup, and one being loaded next. When loaded_next is done, it is checked against served_right_now for accuracy, which also warms the cache. When that check passes, the backup is deleted, served_right_now becomes backup, and loaded_next becomes served_right_now. How to do this is still up for debate.
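Since the mechanics are still up for debate, here is only a minimal sketch of the rotation bookkeeping; the state keys and the check_against_served / drop_datasource helpers are hypothetical, not an actual AQS or Druid API:

```python
# Sketch of the three-datasource rotation described above; key names
# and the two callables are hypothetical placeholders.

def rotate_datasources(state, check_against_served, drop_datasource):
    """Promote loaded_next once it passes the consistency check.

    state: dict with keys 'served_right_now', 'backup', 'loaded_next',
    each holding a Druid datasource name (or None).
    """
    if not check_against_served(state['loaded_next'], state['served_right_now']):
        # Keep serving the current datasource; nothing is flipped.
        raise RuntimeError('new snapshot failed the consistency check')

    if state['backup'] is not None:
        drop_datasource(state['backup'])   # old backup is no longer needed
    state['backup'] = state['served_right_now']
    state['served_right_now'] = state['loaded_next']
    state['loaded_next'] = None
    return state
```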
The other thing to mention is that Druid in theory supports in-place updating of the kind of data we serve for Wikistats via its Lookup mechanism http://druid.io/docs/latest/querying/lookups.html. We never looked into this in depth, and now with Druid 10 it might be a good idea.
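For reference, an untested sketch of what registering a simple map lookup might look like; the coordinator endpoint and payload shape vary across Druid versions, so this is an assumption to verify against the docs linked above, not a known-good call:

```python
# Untested sketch of registering a map lookup on the Druid coordinator;
# endpoint path and payload shape differ between Druid versions.
import requests

COORDINATOR = 'http://druid-coordinator.example.org:8081'

def register_map_lookup(name, mapping, tier='__default'):
    """Register an in-place key -> value substitution applied at query time."""
    payload = {tier: {name: {'type': 'map', 'map': mapping}}}
    resp = requests.post(COORDINATOR + '/druid/coordinator/v1/lookups',
                         json=payload)
    resp.raise_for_status()
```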
Another round of discussion with the team:
Not discussed this time: how do we swap datasources?