We're missing a lot of webrequest data for 2023-05-20, hour 0 to 1; see the original task T337088.
Looking at data on HDFS, it seems the problem comes from the webrequest-refine job, as the amount of data in webrequest-raw seems normal.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | JAllemandou | T337482 Problem with webrequest data for one hour: 2023-05-20 hour 0 to 1
Resolved | | JAllemandou | T337088 Druid Webrequest sampled 128 has missing data for 1 hour
Event Timeline
We have not found the root cause for this :(
A rerun of the webrequest-refine job solved the missing data issue, and we then reran the downstream jobs (thank you @mforns :).
It would be good to add alerts on how much data is present in each partition of our main datasets, so that we can react more quickly than we did this time.
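
As a rough illustration of what such a per-partition volume check could look like, here is a minimal sketch. It is not an existing job: the function name `find_low_volume_partitions`, the `window` and `min_ratio` parameters, and the row counts are all hypothetical, and it assumes the per-partition row counts have already been collected somewhere else.

```python
from statistics import median


def find_low_volume_partitions(counts, window=24, min_ratio=0.5):
    """Flag partitions whose row count is far below the recent baseline.

    counts: list of (partition_label, row_count) tuples in chronological order.
    window: number of preceding partitions used to compute the baseline (assumed value).
    min_ratio: a partition is flagged when count < min_ratio * baseline (assumed value).
    """
    flagged = []
    for i, (label, count) in enumerate(counts):
        # Baseline is the median of up to `window` preceding partitions.
        history = [c for _, c in counts[max(0, i - window):i]]
        if not history:
            continue  # first partition: nothing to compare against
        baseline = median(history)
        if count < min_ratio * baseline:
            flagged.append((label, count, baseline))
    return flagged


# Hypothetical hourly row counts, for illustration only.
hourly_counts = [
    ("hour=21", 100),
    ("hour=22", 110),
    ("hour=23", 90),
    ("hour=0", 20),
    ("hour=1", 105),
]
for label, count, baseline in find_low_volume_partitions(hourly_counts, window=3):
    print(f"ALERT {label}: {count} rows, baseline {baseline}")
```

A median baseline is used here so that one bad hour does not drag the reference value down much for the hours that follow; the thresholds would need tuning against real traffic patterns.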