
Problem with webrequest data for one hour: 2023-05-20 hour 0 to 1
Closed, Resolved, Public

Description

We're missing a lot of data for that hour; see the original task T337088.
Looking at the data on HDFS, the problem seems to come from the webrequest-refine job, as the amount of data in webrequest-raw looks normal.

Event Timeline

JAllemandou triaged this task as Unbreak Now! priority. May 25 2023, 1:41 PM

We have not found the root cause for this :(
A rerun of the webrequest-refine job solved the missing data issue, and we then reran downstream jobs (thank you @mforns :).
It would be good to add alerts on how much data is present in each partition of our main datasets, so that we can react more quickly than we did this time.
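
As a rough illustration of what such an alert could check, here is a minimal sketch in Python. It assumes that per-partition row counts for the raw and refined datasets are already available from somewhere. The `PartitionCount` structure, the `find_suspicious_partitions` function, the 0.9 threshold and the sample numbers are all hypothetical and are not taken from the actual pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionCount:
    """Row counts for one hourly partition (hypothetical structure)."""
    partition: str      # hourly partition label, e.g. "2023-05-20T00"
    raw_rows: int       # rows counted in the raw dataset
    refined_rows: int   # rows counted in the refined dataset


def find_suspicious_partitions(counts, min_ratio=0.9):
    """Return labels of partitions whose refined/raw ratio is below min_ratio.

    min_ratio is an assumed threshold; a real alert would need a value
    tuned to how many rows the refine step normally keeps.
    """
    suspicious = []
    for c in counts:
        if c.raw_rows == 0:
            # No raw data at all is worth flagging in its own right.
            suspicious.append(c.partition)
            continue
        if c.refined_rows / c.raw_rows < min_ratio:
            suspicious.append(c.partition)
    return suspicious


# Made-up numbers, for illustration only.
example_counts = [
    PartitionCount("2023-05-19T23", 1_000_000, 990_000),
    PartitionCount("2023-05-20T00", 1_000_000, 200_000),
    PartitionCount("2023-05-20T01", 1_000_000, 985_000),
]
flagged = find_suspicious_partitions(example_counts)
print(flagged)
```

Comparing refined against raw for the same partition, rather than against a fixed absolute count, would let the check follow normal traffic variation, though a comparison with neighbouring hours might work just as well.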