We have been in full-jobs mode, but there are still around 1k leftover rows from yesterday. That's only around 0.2% of the total number of edits, but it still needs to be investigated.
Event Timeline
So far my investigation yielded:
- All of them are subscribed to by client wikis.
- It happens on all sorts of edits: client updates, statement changes, label changes, etc.
- The change never reached the client wiki, meaning this is not a case of deduplication failing to work.
- Spot-checking some of them, I couldn't find anything in the logs implying an error or unsuccessful work.
- Only two don't have a corresponding rc entry.
The job gets queued:
ladsgroup@stat1005:~$ kafkacat -b kafka-main1002.eqiad.wmnet -p 0 -t 'eqiad.mediawiki.job.DispatchChanges' -o -1000000 | grep -i 1494749263
{
  "$schema": "/mediawiki/job/1.0.0",
  "meta": {
    "uri": "https://placeholder.invalid/wiki/Special:Badtitle",
    "request_id": "b744cf2f-b349-47f9-9175-7f2cb5463f9d",
    "id": "b0b4bb30-3cb1-4b74-a604-7941644cf318",
    "dt": "2021-10-05T07:14:14Z",
    "domain": "www.wikidata.org",
    "stream": "mediawiki.job.DispatchChanges"
  },
  "database": "wikidatawiki",
  "type": "DispatchChanges",
  "params": {
    "title": "Q108356988",
    "entityId": "Q108356988",
    "changeId": 1494749263
  },
  "mediawiki_signature": "d8af401f3c8a99d7cc36573f2c6b8f99aa611d3e"
}
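Note that `grep -i 1494749263` matches the number anywhere in the raw line, so it can catch unrelated events (e.g. a request_id or revision id containing the same digits). A structured filter avoids that. This is only a sketch, assuming the event shape shown above; `matching_events` is a hypothetical helper, not part of any existing tool:

```python
import json
import sys


def matching_events(lines, change_id):
    """Yield DispatchChanges job events whose params.changeId equals change_id."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip kafkacat status lines such as "% Reached end of topic"
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if event.get("params", {}).get("changeId") == change_id:
            yield event


if __name__ == "__main__":
    # Usage sketch: kafkacat ... | python filter_jobs.py 1494749263
    for event in matching_events(sys.stdin, int(sys.argv[1])):
        print(event["meta"]["dt"], event["params"]["entityId"])
```

This matches only on the exact `params.changeId` field, which is the key that ties a leftover wb_changes row to its queued job.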
Ah, that's enwiktionary! That is probably one job that was queued when we only had the original 10 wikis enabled. Back then we didn't have the deleting of rows by the DispatchChanges job enabled yet.
But that's an edit made yesterday (https://www.wikidata.org/w/index.php?diff=1508126207), and we had it enabled everywhere all of yesterday. Did I miss something super obvious?
The only way to properly debug this is to have job info in Hadoop, so that the jobs can be queried properly.
The cause of this seems to have been the job crashing, instead of logging an error, when it encountered a closed or deleted wiki. Now that this is resolved, all rows are consistently being deleted by the DispatchChanges job, and the table rarely exceeds 10 rows.
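The real fix lives in Wikibase's PHP job code, but the failure mode described above can be sketched in a few lines. This is an illustration only: every callable here is a hypothetical stand-in for the real MediaWiki services, and Python is used purely for readability.

```python
import logging

logger = logging.getLogger("DispatchChanges")


def dispatch_change(change_id, subscribed_wikis, post_job, is_wiki_open, delete_row):
    """Dispatch one wb_changes row to its subscribers, tolerating closed/deleted wikis.

    post_job, is_wiki_open and delete_row are hypothetical stand-ins for the
    real job-queue, site-lookup and database services.
    """
    for wiki in subscribed_wikis:
        if not is_wiki_open(wiki):
            # Previously, an unhandled error here killed the job before the
            # row was deleted, leaving it behind in wb_changes.
            logger.warning(
                "Skipping closed or deleted wiki %s for change %d", wiki, change_id
            )
            continue
        post_job(wiki, change_id)
    # Clean up the row once dispatching has been attempted for every subscriber,
    # so nothing lingers even when some wikis were skipped.
    delete_row(change_id)
```

The point is the `continue`-and-log instead of a crash: the row deletion at the end then runs unconditionally, which matches the observed outcome of the table staying near-empty.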