Page MenuHomePhabricator

1.44.0-wmf.22 deployment blockers
Closed, ResolvedPublic5 Estimated Story PointsRelease

Details

Backup Train Conductor
jnuche
Release Version
1.44.0-wmf.22
Release Date
Mon, Mar 24, 12:00 AM

2025 week 13 1.44-wmf.22 Changes wmf/1.44.0-wmf.22

This MediaWiki Train Deployment is scheduled for the week of Monday, March 24th:

Monday March 24thTuesday, March 25thWednesday, March 26thThursday, March 27thFriday
Backports only.Branch wmf.22 and deploy to Group 0 Wikis.Deploy wmf.22 to Group 1 Wikis.Deploy wmf.22 to all Wikis.No deployments on fridays

How this works

  • Any serious bugs affecting wmf.22 should be added as subtasks beneath this one.
  • Any open subtask(s) block the train from moving forward. This means no further deployments until the blockers are resolved.
  • If something is serious enough to warrant a rollback then you should bring it to the attention of deployers on the #wikimedia-operations IRC channel.
  • If you have a risky change in this week's train add a comment to this task using the Risky patch template
  • For more info about deployment blockers, see Holding the train.

Related Links

Other Deployments

Previous: 1.44.0-wmf.21
Next: 1.44.0-wmf.23

Event Timeline

Restricted Application added a subscriber: Aklapper. ยท View Herald TranscriptFeb 12 2025, 3:29 PM
thcipriani triaged this task as Medium priority.
thcipriani updated Other Assignee, added: jnuche.
thcipriani set the point value for this task to 5.
Risky Patch! ๐Ÿš‚๐Ÿ”ฅ

Main stash is x2 but by the time the train will reach production, we will have actually switched to to ms1. ms2. ms3 clusters (writing to two out of three in each case). See T383327: Re-architecture mainstash (x2) to allow easier maintenance

What this sounds like:

Previously data was written to MainStash for 90% of edits and now for 100% of edits (90% because patrolled edits refers to edits by an autopatrolled bot or trusted autopatrolled user; noting that autopatrolled user status is not given to most bots, and only handed out to users on wikis that use RCPatrol which most wikis don't use; so most edits on most wikis have rc_patrolled=0 and are considered unpatrolled).

If I'm reading the patch right, this is not accurate. Actually:

Previously data was written for <0.1% of edits and limited to ~100 mostly small wikis. Now, data is written for 100% of edits, on all wikis, and kept for 30 days.


My assumption / understanding of previous status quo:

  • DerivedPageDataUpdater is responsible for tagging "revert" edits as such. "Revert" is defined as an edit that undoes a previous edit and is marked as patrolled. This means we only publicly recognise an edit as a revert if either it used the "rollback" feature, or it was a manual "undo" by a trusted autopatrolled user. Also per T259103, although it seems like a misunderstanding in the product design, since 2020 this logic re-runs when an undo by an untrusted user is later marked as patrolled. (The reason this seems like a mistake is that T259103 mistakes patrolling for "approving". Patrolling is an expected "marked as read" end state for all edits on wikis that use RC patrolling. Vandalistic edits are patrolled once dealt with, to clear them from the review queue. There isn't another "rejected" patrol flag. To the extent needed, that is what "reverted" is.)
  • To support that third use case, DerivedPageDataUpdater introduced EditResultCache (via MainStash). Seemingly to avoid re-computing the "is effectively a revert to one of the last 15 revision" boolean. It is unclear why, since this does not appear to be particularly expensive, given it happens synchronously during edits, it is presumably fine to repeat during the relatively infrequent "patrol" action. Anyway, the way that cache works: If while saving an edit we find it is effectively a revert (i.e. restores the page to one of the last 15 revisions), and it was not made by native rollback, and was not a manual undo by trusted user, and is on a wiki with RCPatrol enabled (DerivedPageDataUpdater sets approved to true if RCPatrol is disabled, which is most wikis), then and only then will it will store the EditResult in the MainStash for 30 days (RCMaxAge). This is of course a tiny tiny proportion of edits, likely under <1%.
  • When an edit is manually marked as patrolled (on the wikis that enable RCPatrol), RecentChange::reallyMarkPatrolled checks if EditResultCache contains an entry (where existence implies it was a revert), and if so, queues RevertedTagUpdateJob to add the "reverted" tag.

My understanding of the patch:

  • On every edit, it now stores an EditResult object (i.e. the computed "this was a restore the one of the last 15 revisions" boolean) in the MainStash for 30 days.
  • On every edit, the new ChangeTrackingEventIngress callback performs the "is revert and is approved" logic to decide between queing RevertedTagUpdateJob or doing nothing.
  • On most wikis, this data is never used as it is unreachable with RCPatrol disabled.
  • For 99% of edits, the data is also never used as was known already before saving the data that it was not a revert, and so even if it does get patrolled, it will never gain a "revert" tag.
  • This change seems to break the one reason for which this system exits (use case 3), because RecentChange::reallyMarkPatrolled is now incorrectly queueing RevertedTagUpdateJob jobs after every patrol, instead of only for patrols on edits that were previously known to be a revert.

@aaron @daniel Perhaps you considered it already, but would it make sense to move the EditResult store action to the "do nothing" else branch in the ChangeTrackingEventIngress callback? It seems we already know when we need it when storing it, why not make use of that? Storing false seems redundant since "No, it was not a revert" is already inferred by not storing anything. That's how it used to work basically. From the commit mesage and task, it is not clear what the added storage solves.

@Krinkle thanks for the analysis... I honestly don't know what the percentages are. My assumption was that it would maybe double the writes. From what you are saying, we may expect a x1000 increase. That wouldn't be good.

Perhaps you considered it already, but would it make sense to move the EditResult store action to the "do nothing" else branch in the ChangeTrackingEventIngress callback?

Yes, that was the original plan, but we hit issues with how the cache is used by FlaggedRevs and PageTriage , see the description and discussion on T388573: Use the PageUpdatedEvent to trigger RevertedTagUpdateJob

On most wikis, this data is never used as it is unreachable with RCPatrol disabled.

Right - we can simply not write to the cache on these wikis. We can also omit writing if the edit isn't a revert. But that's only the case if we know in advance that this is the only use case for EditResultCache. This knowledge doesn't really belong into DerivedPageDataUpdater - the point of having the ingress objects is really to move the knowledge there (compare T388573#10627700).

Moving the cache write there doesn't work though, because then it happens post-send, and is unavailable to hook handlers in FlaggedRevs and PageTriage that get executed earlier. That could be fixed by making these extensions listen to the event instead of implement the hook handler.

This change seems to break the one reason for which this system exits (use case 3), because RecentChange::reallyMarkPatrolled is now incorrectly queueing RevertedTagUpdateJob jobs after every patrol

Oh rigzht, that's a bug! It should check isRedirect() on the edit result!

Seemingly to avoid re-computing the "is effectively a revert to one of the last 15 revision" boolean. It is unclear why, since this does not appear to be particularly expensive, given it happens synchronously during edits, it is presumably fine to repeat during the relatively infrequent "patrol" action.

From looking at the job, it seems like that's where the code for examining teh last 15 revisions lives. But it's based on the revisions between "oldest reverted ervision" and "lateste reverted revision" in the EditResult - i.e. it only runs for revisions that we already know are reverted, but cause the rollback action or edit page used PageUpdater::markAsRevert. So it seems like we are not actually auto-detecting reverts. It's not clear to me what the code in the job is supposed to achieve, and wheter it actually achieves it. (EDIT: found class EditResultBuilder::detectManualRevert)

Let's discuss that on T388573. I'll make a patch to mitigate the impact in the meantime.

Change #1130105 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] EditResult: only stash reverts

https://gerrit.wikimedia.org/r/1130105

Also per T259103, although it seems like a misunderstanding in the product design, since 2020 this logic re-runs when an undo by an untrusted user is later marked as patrolled. (The reason this seems like a mistake is that T259103 mistakes patrolling for "approving". Patrolling is an expected "marked as read" end state for all edits on wikis that use RC patrolling. Vandalistic edits are patrolled once dealt with, to clear them from the review queue. There isn't another "rejected" patrol flag. To the extent needed, that is what "reverted" is.)

If we can get a product decision that patrolled-after-the-fact reverts are not tagged, then edit result cache can just be in-process instead of using the main stash. PageTriage wouldn't need to use it since it does not autopatrol. If, furthermore, the autopatrolled status was known by the time onBeforeRevertedTagUpdate runs, the cache could be removed completely. Right now, various FlaggedRevs hook handlers might cause autoreview (which implies patrol), and those hooks rely on the after-the-fact revert tag logic rather than the onBeforeRevertedTagUpdate hook meant for autopatrol. It has to update the new recent changes row to set rc_patrolled=1. Since the ingress class now makes onBeforeRevertedTagUpdate run postsend, it's possible to have FlaggedRevs handle that hook by checking if the revision was autoreviewed in the DB. That would avoid the stash write and extra rc_patrolled=1 write. It would still be be a bit ugly, since it would be checking "did we just write this fr_rev_id row to a table", and, any migration of the autoreview hooks to async domain event listeners would be complicated by order dependencies (maybe they could be synchronous event interceptors).

For now, at least, is seems OK to narrow down when the cache writes are made with some safe checks (given what the cache is used for).

Change #1130105 merged by jenkins-bot:

[mediawiki/core@master] EditResult: only stash reverts

https://gerrit.wikimedia.org/r/1130105

Change #1130769 had a related patch set uploaded (by TrainBranchBot; author: trainbranchbot):

[mediawiki/core@wmf/1.44.0-wmf.22] Branch commit for wmf/1.44.0-wmf.22

https://gerrit.wikimedia.org/r/1130769

Change #1130769 merged by jenkins-bot:

[mediawiki/core@wmf/1.44.0-wmf.22] Branch commit for wmf/1.44.0-wmf.22

https://gerrit.wikimedia.org/r/1130769

Change #1130958 had a related patch set uploaded (by TrainBranchBot; author: Andre Klapper):

[operations/mediawiki-config@master] testwikis to 1.44.0-wmf.22

https://gerrit.wikimedia.org/r/1130958

Change #1130958 merged by jenkins-bot:

[operations/mediawiki-config@master] testwikis to 1.44.0-wmf.22

https://gerrit.wikimedia.org/r/1130958

Mentioned in SAL (#wikimedia-operations) [2025-03-25T09:23:52Z] <aklapper@deploy1003> Started scap sync-world: testwikis to 1.44.0-wmf.22 refs T386217

Mentioned in SAL (#wikimedia-operations) [2025-03-25T10:06:18Z] <aklapper@deploy1003> Finished scap sync-world: testwikis to 1.44.0-wmf.22 refs T386217 (duration: 42m 25s)

Change #1130969 had a related patch set uploaded (by TrainBranchBot; author: Andre Klapper):

[operations/mediawiki-config@master] group0 to 1.44.0-wmf.22

https://gerrit.wikimedia.org/r/1130969

Change #1130969 merged by jenkins-bot:

[operations/mediawiki-config@master] group0 to 1.44.0-wmf.22

https://gerrit.wikimedia.org/r/1130969

Mentioned in SAL (#wikimedia-operations) [2025-03-25T10:30:50Z] <aklapper@deploy1003> rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.22 refs T386217

Change #1131269 had a related patch set uploaded (by TrainBranchBot; author: Andre Klapper):

[operations/mediawiki-config@master] group1 to 1.44.0-wmf.22

https://gerrit.wikimedia.org/r/1131269

Change #1131269 merged by jenkins-bot:

[operations/mediawiki-config@master] group1 to 1.44.0-wmf.22

https://gerrit.wikimedia.org/r/1131269

Mentioned in SAL (#wikimedia-operations) [2025-03-26T09:22:12Z] <aklapper@deploy1003> rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.22 refs T386217

Change #1131291 had a related patch set uploaded (by TrainBranchBot; author: Andre Klapper):

[operations/mediawiki-config@master] group0 to 1.44.0-wmf.22

https://gerrit.wikimedia.org/r/1131291

Change #1131291 merged by jenkins-bot:

[operations/mediawiki-config@master] group0 to 1.44.0-wmf.22

https://gerrit.wikimedia.org/r/1131291

Mentioned in SAL (#wikimedia-operations) [2025-03-26T10:48:00Z] <aklapper@deploy1003> rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.22 refs T386217

Change #1131638 had a related patch set uploaded (by TrainBranchBot; author: Andre Klapper):

[operations/mediawiki-config@master] group1 to 1.44.0-wmf.22

https://gerrit.wikimedia.org/r/1131638

Change #1131638 merged by jenkins-bot:

[operations/mediawiki-config@master] group1 to 1.44.0-wmf.22

https://gerrit.wikimedia.org/r/1131638

Mentioned in SAL (#wikimedia-operations) [2025-03-27T09:14:36Z] <aklapper@deploy1003> rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.22 refs T386217

Change #1131642 had a related patch set uploaded (by TrainBranchBot; author: Andre Klapper):

[operations/mediawiki-config@master] group2 to 1.44.0-wmf.22

https://gerrit.wikimedia.org/r/1131642

Change #1131642 merged by jenkins-bot:

[operations/mediawiki-config@master] group2 to 1.44.0-wmf.22

https://gerrit.wikimedia.org/r/1131642

Mentioned in SAL (#wikimedia-operations) [2025-03-27T09:41:32Z] <aklapper@deploy1003> rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.22 refs T386217

After I backported the fix for T390032 and deployed two other backports and deployed to group 1 again and deployed to group2, T390142: PHP Deprecated: Use of CollationCkb::__construct was deprecated in MediaWiki 1.44. [Called from Wikimedia\ObjectFactory\ObjectFactory::getObjectFromSpec] is quite noisy in the logs and affects every page on ckbwiki.
No other explosions though. Yet.

T390142 will get a SWAP deploy later today. Resolving.