Failed executing job: AutoModeratorFetchRevScoreJob
Closed, Resolved | Public | PRODUCTION ERROR

Description

We're seeing a high number of "revision not found" errors now that trwiki has enabled AutoModerator (T362622). This isn't destructive in any way, as AutoModerator simply doesn't make any changes when this error occurs, but it means that the extension isn't working for these revisions.

Error
normalized_message
Failed executing job: AutoModeratorFetchRevScoreJob Rıdvan_Uz wikiPageId=3567508 revId=33119075 originalRevId= userId=233143 userName=Fatih Demirci tags=[] namespace=0 title=Rıdvan_Uz requestId=c425cca2-c514-4061-aaf6-3feec5361d51
exception.trace
Impact
Notes

Details

Request URL
https://mw-jobrunner.discovery.wmnet/rpc/RunSingleJob.php
Related Changes in Gerrit:

Event Timeline

jsn.sherman triaged this task as Unbreak Now! priority.

This looks like a problem with flagged revisions, which we'll need to sort out.

Actually, this looks a lot like the error in T366933. Our solution there is to make some changes to avoid running into it on edits that we don't actually want to score in the first place. This could be a good thing, since it makes flagged revisions a less likely suspect.

@jsn.sherman would it be possible for AutoModerator to ignore flagged revisions on trwiki? If so, what would the implications be, and what are the potential solutions for implementing this?

@Dogu we haven't diagnosed the problem yet, so it's too early to start working on solutions. The other task I referenced is on testwiki, which isn't using flagged revisions. That's a good thing, because it means we can try to work out the problem somewhere other than trwiki.

Looking for commonalities between testwiki and trwiki:

On testwiki, these errors coincide with an "Inconsistent revision ID" warning from parser cache:

image.png (146×2 px, 40 KB)

In this example, revid1 in the parser cache matches the revision ID that we are using. That also matches what's in the page revision history.

And I verified the same is true with trwiki. I tried to get a better screenshot this time:

image.png (194×1 px, 42 KB)

jsn.sherman changed the task status from Open to In Progress. (Jun 13 2024, 9:45 PM)
jsn.sherman moved this task from In Progress to Eng review on the Moderator-Tools-Team (Kanban) board.

I switched over to lazy push for the job queue insert. I also experimented with adding a release timestamp, but that seemed to break things in mediawiki-docker: no jobs were queued at all that way.

Change #1043168 had a related patch set uploaded (by Scardenasmolinar; author: Jsn.sherman):

[mediawiki/extensions/AutoModerator@master] Jobs: use lazyPush

https://gerrit.wikimedia.org/r/1043168

Change #1043168 merged by jenkins-bot:

[mediawiki/extensions/AutoModerator@master] Jobs: use lazyPush

https://gerrit.wikimedia.org/r/1043168

A few notes:

I didn't state it clearly in this task, but the reasoning behind trying to defer running our jobs is that we suspect there is either a caching or a db consistency issue. Running the job later is a very coarse way to check that. Switching to lazyPush is a move in the right direction, but I'm not sure if it's enough.

@kostajh suggested that this could be a moment to consider moving to RecentChangeSave. ORES uses this hook and doesn't seem to suffer from the same issue. Having said that, the ORES fetch score job isn't instantiating revisions to work on them, so that's not definitive.

I'll be looking at this more tomorrow.
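
To illustrate the reasoning in the first note above, here is a toy model of the suspected consistency issue. It is only a sketch under assumptions: `ToyReplicatedStore`, `fetch_rev_score_job`, and the `sync()` step are hypothetical names made up for this example, not MediaWiki or AutoModerator code, and the real cause had not been confirmed at this point in the task.

```python
class ToyReplicatedStore:
    """Toy primary/replica pair (hypothetical); the replica only sees
    writes after sync() has been called."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def save_revision(self, rev_id, text):
        # Writes land on the primary first.
        self.primary[rev_id] = text

    def sync(self):
        # Stand-in for replication (or a cache refresh) catching up.
        self.replica = dict(self.primary)

    def load_revision(self, rev_id):
        # Jobs in this model read from the replica.
        return self.replica.get(rev_id)


def fetch_rev_score_job(store, rev_id):
    """Hypothetical job body: fails if the revision is not visible yet."""
    if store.load_revision(rev_id) is None:
        return "revision not found"
    return "scored"


store = ToyReplicatedStore()
store.save_revision(33119075, "edit text")

# Job runs immediately, before the replica has caught up.
immediate_result = fetch_rev_score_job(store, 33119075)

# Job runs later, after the replica has caught up.
store.sync()
deferred_result = fetch_rev_score_job(store, 33119075)

print(immediate_result)
print(deferred_result)
```

In this model the only difference between the two runs is timing, which is why deferring the job is a coarse but cheap way to test the hypothesis.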

Change #1043347 had a related patch set uploaded (by Jsn.sherman; author: Jsn.sherman):

[mediawiki/extensions/AutoModerator@master] Jobs: retry when revision not found

https://gerrit.wikimedia.org/r/1043347

The simplest change we could make besides doing lazyPush is to allow retries when we encounter this.
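
As a rough illustration of the retry idea, here is a generic sketch. The names `RevisionNotFound`, `run_with_retries`, and `make_flaky_job` are hypothetical and exist only for this example; how the actual patch enables retries in the MediaWiki job queue is not shown in this task.

```python
class RevisionNotFound(Exception):
    """Hypothetical error raised when a job cannot load its revision."""


def run_with_retries(job, max_attempts=3):
    """Call job() until it succeeds or max_attempts is reached.

    Returns (result, attempts_used). Re-raises RevisionNotFound if the
    last allowed attempt still fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return job(), attempt
        except RevisionNotFound:
            if attempt == max_attempts:
                raise


def make_flaky_job(visible_after):
    """Build a toy job whose revision becomes visible on call number
    `visible_after` (an assumption standing in for a lagging read)."""
    calls = {"count": 0}

    def job():
        calls["count"] += 1
        if calls["count"] < visible_after:
            raise RevisionNotFound("revision not visible yet")
        return "scored"

    return job


result, attempts = run_with_retries(make_flaky_job(visible_after=2))
print(result, attempts)
```

This sketch retries immediately; a real job queue would typically space attempts out, which matters if the underlying problem is a read that lags behind the write.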

Change #1043347 merged by jenkins-bot:

[mediawiki/extensions/AutoModerator@master] Jobs: retry when revision not found

https://gerrit.wikimedia.org/r/1043347

So far, it looks like there have been no more occurrences of this error on testwiki since 1.43.0-wmf.10 rolled out to it.

Does that mean we can downgrade from UBN?

Scardenasmolinar lowered the priority of this task from Unbreak Now! to High. (Jun 20 2024, 4:13 PM)

@Dogu we believe we have this sorted out; could you re-enable automoderator on trwiki? We'll be monitoring for errors.

Done!