After a run of ~4.5M images on Commons, Grafana shows that we still have 571K jobs in the queue.
It would be good to know if this is due to errors calling PhotoDNA, or a reporting problem with Grafana, or just an unexpected blip.
| Item | Date |
|---|---|
| eprodromou | Aug 17 2020, 6:08 PM |
| F32361346: image.png | Sep 23 2020, 9:01 PM |
| F32186774: Screen Shot 2020-08-17 at 1.33.57 PM.png | Aug 17 2020, 6:09 PM |
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Declined | | None | T247977 Implement Hash Checking of Media Files |
| Declined | | None | T256982 MediaModeration Productionizing |
| Resolved | | Pchelolo | T260586 Investigate unprocessed MediaModeration jobs |
@eprodromou: Could you please answer the last comment (or provide a link where to see that board)? Thanks in advance!
We are also seeing all jobs stopped running today: https://grafana.wikimedia.org/d/LSeAShkGz/jobqueue?viewPanel=22&orgId=1&from=now-24h&to=now
This considerably jeopardizes our commitment to the Safe & Secure Spaces OKR. We are in significant danger of not meeting our deliverable if we do not unblock the scans promptly.
That's interesting... I will have a look tomorrow at why it got stuck and fix it; this is not something to be tolerated.
Ok. Figured it out. My bad. Kafka (the queuing system used by the job queue behind the scenes) has a retention policy with 1 week TTL by default. So, the scheduled jobs just vanished into thin air after one week of processing...
We bumped the retention to 1 month, and I'll resubmit the second half of the jobs. Thank you for monitoring and noticing.
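For anyone following along, here is a rough sketch of the arithmetic behind this failure mode. It is not the actual job queue code: the helper names, the 30-day reading of "1 month", and the throughput numbers are all assumptions for illustration. It also simplifies Kafka's behaviour, which deletes whole log segments based on message timestamps rather than individual messages. The only real Kafka name used is the topic-level `retention.ms` setting, which takes a value in milliseconds.

```python
from datetime import timedelta


def retention_ms(period: timedelta) -> int:
    """Convert a retention period to milliseconds, the unit used by retention.ms."""
    return int(period.total_seconds() * 1000)


def jobs_dropped(total_jobs: int, jobs_per_day: float, retention: timedelta) -> int:
    """Estimate how many jobs expire unprocessed.

    Simplified model (an assumption): all jobs are enqueued at once, consumed
    at a constant rate, and anything older than the retention period is lost.
    """
    processed = int(jobs_per_day * retention.total_seconds() / 86400)
    return max(0, total_jobs - processed)


ONE_WEEK = timedelta(days=7)
ONE_MONTH = timedelta(days=30)  # assumption: "1 month" taken as 30 days

print(retention_ms(ONE_WEEK))   # 604800000
print(retention_ms(ONE_MONTH))  # 2592000000

# Hypothetical workload, not the real Commons numbers:
# 1,000,000 jobs consumed at 100,000 per day.
print(jobs_dropped(1_000_000, 100_000, ONE_WEEK))   # 300000
print(jobs_dropped(1_000_000, 100_000, ONE_MONTH))  # 0
```

The point is only that a backlog which takes longer to drain than the retention period will silently lose its tail, which matches what the dashboard showed.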
Ok, this batch was processed correctly after we increased the Kafka garbage collection TTL for the topic.