
1.42.0-wmf.1 deployment blockers
Closed, Resolved · Public · 5 Estimated Story Points · Release

Details

Backup Train Conductor
hashar
Release Version
1.42.0-wmf.1
Release Date
Oct 16 2023, 12:00 AM

2023 week 42 · 1.42-wmf.1 Changes · wmf/1.42.0-wmf.1

This MediaWiki Train Deployment is scheduled for the week of Monday, October 16th:

  • Monday, October 16th: Backports only.
  • Tuesday, October 17th: Branch wmf.1 and deploy to Group 0 Wikis.
  • Wednesday, October 18th: Deploy wmf.1 to Group 1 Wikis.
  • Thursday, October 19th: Deploy wmf.1 to all Wikis.
  • Friday: No deployments on Fridays.

How this works

  • Any serious bugs affecting wmf.1 should be added as subtasks beneath this one.
  • Any open subtask(s) block the train from moving forward. This means no further deployments until the blockers are resolved.
  • If something is serious enough to warrant a rollback then you should bring it to the attention of deployers on the #wikimedia-operations IRC channel.
  • If you have a risky change in this week's train, add a comment to this task using the Risky patch template.
  • For more info about deployment blockers, see Holding the train.

Related Links

Other Deployments

Previous: 1.41.0-wmf.30
Next: 1.42.0-wmf.2

Related Objects

Event Timeline

thcipriani triaged this task as Medium priority.
thcipriani updated Other Assignee, added: hashar.
thcipriani set the point value for this task to 5.
hashar subscribed.

I have deployed a backport fix for TimedMediaHandler ( T348753 )

Risky Patch! 🚂🔥
  • Change: Two Wikibase changes related to the parser cache: Split parser cache by desktop/mobile (for T344362) and Revert "Add hook to invalidate cache entries missing TermboxOption" (for T348872)
  • Summary:
    • The first change splits the parser cache for Wikibase page views between desktop and mobile. Items and properties were already effectively split on this, so this is mainly expected to affect Lexemes, and perhaps MediaInfo (files on Commons) too. (A hedged sketch of the splitting mechanism follows after this list.)
    • As far as we’ve been able to test, old parser cache entries should still be used. The split parser cache only becomes effective when the parser cache is actively updated, e.g. via a purge, a page edit, or because a mobile page view reaches a page that only has a desktop parser cache entry or vice versa.
    • This is risky because, if it unexpectedly does cause old parser cache entries to become unusable, there would (presumably) be a spike in fresh parses.
    • The second change removes a hook handler that was causing many parser cache entries to be rejected on mobile Wikidata and mobile Commons.
    • This is slightly risky because we’re not 100% sure that the hook handler is no longer needed (though it seems likely); it fixed T228978 at the time, so if “Unknown placeholder” errors recur, it’s probably due to this.
  • Test plan:
    • We can test on Test Wikidata (group0) that item and property page views still use old parser cache entries, that T344362#9194424 is not reproducible, and that non-item/property mobile page views start using the parser cache properly.
  • Places to monitor:
  • Revert plan:
    • As far as normal functionality goes, both changes should be alright to revert.
    • If the first change unexpectedly causes a spike in fresh parses, then it’s possible that reverting it will just cause a second spike. I don’t think I can test locally whether this would happen or not, because my local testing predicts no spike for rolling out the change, so if there is a spike in production after all, then clearly my local setup can’t be trusted to make these predictions.
  • Affected wikis:
wikidatawiki
commonswiki
testwikidatawiki
testcommonswiki
  • IRC contact: Lucas_WMDE (but I’ll be offline Wednesday and Thursday), @Michael (probably not on IRC but try Slack), both UTC+2
  • UBN Task Projects/tags: Wikidata Dev Team (Wikidata.org Slice)
  • Would you like to backport this change rather than ride the train?: No, having this rolled out separately to group0 seems better for testing.
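
For readers who want the mechanics behind such a split: below is a minimal sketch, assuming the core ParserOptionsRegister hook. The option name 'wbMobile' and the MobileFrontend-based detection are illustrative assumptions, not the actual Wikibase patch.

<?php
// Minimal sketch of splitting the parser cache with a parser option,
// via the core ParserOptionsRegister hook. The option name 'wbMobile'
// and the MobileFrontend-based detection are illustrative only.
$wgHooks['ParserOptionsRegister'][] = static function (
    array &$defaults, array &$inCacheKey, array &$lazyLoad
) {
    $defaults['wbMobile'] = false;
    // Putting the option into the cache key is what creates the split:
    // desktop and mobile views now compute different keys.
    $inCacheKey['wbMobile'] = true;
    $lazyLoad['wbMobile'] = static function ( ParserOptions $popts, string $name ) {
        // Hypothetical detection; real code would consult MobileFrontend.
        return ExtensionRegistry::getInstance()->isLoaded( 'MobileFrontend' )
            && MobileContext::singleton()->shouldDisplayMobileView();
    };
};

With an option like this in the key, a mobile view that finds only a desktop-keyed entry misses the cache and triggers a fresh parse under the mobile key, which is the "split only becomes effective when the cache is actively updated" behaviour described in the summary.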
Jdforrester-WMF subscribed.

> I have deployed a backport fix for TimedMediaHandler ( T348753 )

OK, but that was a blocker for last week's train, not this one. :-)

We also had a risky parser cache change that got merged today. The two ParserCache changes are unrelated and affect different instances: it looks like Lucas's change impacts Wikidata parser cache use, while ours affects Parsoid parser cache use. @daniel, @cscott, and I will discuss this situation and see how we want to proceed, but this is just an early alert.

Risky Patch! 🚂🔥
  • Change: Make ParsoidOutputAccess a wrapper over ParserOutputAccess
  • Summary:
    • This patch changes (a) code paths used to access Parsoid content from ParserCache (b) changes cache key used to access Parsoid content from ParserCache (c) changes the ParserCache instance used for Parsoid content.
    • As such: (a) any clients (direct or API) that use Parsoid content for their functionality, namely VisualEditor, DiscussionTools, and Content Translation, could be impacted if there were bugs in this switch; (b) because of (b) and (c) in the previous bullet, Parsoid content accesses could encounter a cold-cache scenario and cause performance degradation for these clients. To prevent this, we have pre-deployed this patch to check the old Parsoid ParserCache instance with the old cache key as a fallback (see the sketch after this list).
  • Test plan:
    • There is substantial unit test coverage across the REST APIs, ParserCache, and the ParsoidOutputAccess and ParserOutputAccess classes; getting those tests to pass with these changes took some effort, which boosts our confidence in the correctness of this code.
    • Local testing that dumped computed parser cache keys for various page accesses (non-Parsoid and Parsoid content) before and after and verified the accuracy of the keys.
    • Tested on beta cluster by editing pages in VisualEditor with and without editor mode switching and verifying that there were no dirty diffs. Tried this on pages before and after "?action=purge".
    • On deploy to group 0, we will test editing pages on testwiki and mediawikiwiki with various combinations to verify editing behaves as expected.
    • We may do the same on deploys to group 1 and group 2.
  • Places to monitor:
  • Revert plan:
    • It is safe to revert the above patch. (Detailed information for those who care: Once this patch is deployed, 'parsoid' cache will no longer get writes from the wikis where the patch is live. So, there is no corruption involved. The only impact will be performance impacts on some pages that were edited in the interim that will no longer have a parser cache entry in the 'parsoid' cache and as such VE (and API) clients will encounter client latencies the first time those pages are encountered. But, this is not going to be a cold-cache scenario.)
  • Affected wikis:
    • All wikis where this is rolled out (except Commons and Wikidata wikis, where Parsoid content isn't stored in ParserCache).
  • IRC contact: subbu, cscott, duesen
  • UBN Task Projects/tags: #Content Transform Team
  • Would you like to backport this change rather than ride the train?: No, we want to roll this out to group0 wikis and other wikis in a phased manner so we have time to test and catch any issues before a wider rollout.
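
To make the mitigation in the summary concrete (read the new cache first, fall back to the old 'parsoid' instance), here is a hedged sketch; the function name and the exact ParserCache calls are illustrative, not the actual ParsoidOutputAccess code.

<?php
use MediaWiki\MediaWikiServices;

// Hedged sketch of a transitional read path: consult the shared default
// ParserCache instance first, then fall back to the legacy 'parsoid'
// instance under its old key. Illustrative only -- not the actual patch.
function getCachedParsoidOutput( $page, ParserOptions $popts ): ?ParserOutput {
    $factory = MediaWikiServices::getInstance()->getParserCacheFactory();

    // New location: the default instance also used for legacy parser output.
    $output = $factory->getParserCache( 'pcache' )->get( $page, $popts );
    if ( $output ) {
        return $output;
    }

    // Transitional fallback: the old dedicated Parsoid instance. Once this
    // is live, the 'parsoid' instance no longer receives writes, so the
    // fallback can be dropped after its entries expire.
    $output = $factory->getParserCache( 'parsoid' )->get( $page, $popts );
    return $output ?: null;
}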

Change 965578 had a related patch set uploaded (by TrainBranchBot; author: trainbranchbot):

[mediawiki/core@wmf/1.42.0-wmf.1] Branch commit for wmf/1.42.0-wmf.1

https://gerrit.wikimedia.org/r/965578

Change 965578 merged by jenkins-bot:

[mediawiki/core@wmf/1.42.0-wmf.1] Branch commit for wmf/1.42.0-wmf.1

https://gerrit.wikimedia.org/r/965578

Change 966320 had a related patch set uploaded (by TrainBranchBot; author: MediaWiki PreSync):

[operations/mediawiki-config@master] testwikis wikis to 1.42.0-wmf.1

https://gerrit.wikimedia.org/r/966320

Change 966320 merged by jenkins-bot:

[operations/mediawiki-config@master] testwikis wikis to 1.42.0-wmf.1

https://gerrit.wikimedia.org/r/966320

Mentioned in SAL (#wikimedia-operations) [2023-10-17T03:02:55Z] <mwpresync@deploy2002> Started scap: testwikis wikis to 1.42.0-wmf.1 refs T348354

Mentioned in SAL (#wikimedia-operations) [2023-10-17T03:53:11Z] <mwpresync@deploy2002> Finished scap: testwikis wikis to 1.42.0-wmf.1 refs T348354 (duration: 50m 15s)

> We can test on Test Wikidata (group0) that item and property page views still use old parser cache entries,

Desktop page views do, at least (using XWD to bypass Varnish/ATS):

$ curl -H 'X-Wikimedia-Debug: backend=mwdebug1001.eqiad.wmnet' -si https://test.wikidata.org/wiki/Q232508 | grep -iA1 'saved in parser cache'
<!-- Saved in parser cache with key testwikidatawiki:pcache:idhash:326574-0!termboxVersion=1!wb=3 and timestamp 20231012215556 and revision id 656323. Rendering was triggered because: diff-page
 -->
$ curl -H 'X-Wikimedia-Debug: backend=mwdebug1001.eqiad.wmnet' -si https://test.wikidata.org/wiki/Property:P97948 | grep -iA1 'saved in parser cache'
<!-- Saved in parser cache with key testwikidatawiki:pcache:idhash:326558-0!termboxVersion=1!wb=3 and timestamp 20231009135235 and revision id 656130. Rendering was triggered because: page-view
 -->

2023-10-12 and 2023-10-09 are both pre-train. But I don’t think I have an example page available where the mobile parser cache is definitely populated. (I forgot test wikis get the train before the rest of group0 and thought we’d have most of Tuesday to prepare some test pages before the UTC evening train window.)
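
As an aside, the !termboxVersion=1!wb=3 suffix in those keys comes from parser options marked as part of the cache key. A rough sketch of computing such a key from eval.php on a Wikibase wiki (treat the exact calls, e.g. makeParserOutputKey(), as an assumption about the current API):

<?php
use MediaWiki\MediaWikiServices;

// Rough sketch: compute the parser cache key for an item page, the same
// key the curl output above reports. Assumes a Wikibase wiki.
$services = MediaWikiServices::getInstance();
$page = $services->getPageStore()->getPageByName( NS_MAIN, 'Q232508' );
$pcache = $services->getParserCacheFactory()->getParserCache( 'pcache' );
$popts = ParserOptions::newFromAnon();
// The key embeds the page id plus non-default in-cache-key options,
// e.g. "testwikidatawiki:pcache:idhash:326574-0!termboxVersion=1!wb=3".
echo $pcache->makeParserOutputKey( $page, $popts ), "\n";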

> that T344362#9194424 is not reproducible,

Yup, seems to work fine (I can’t reproduce the bug using those instructions).

> and that non-item/property mobile page views start using the parser cache properly.

Tested successfully on Test Wikidata mobile (Lexeme L123 and the main page); Test Commons isn't in testwikis.dblist, so that will only be testable once the train reaches the rest of group0.

So far all good AFAICT. Also, on the occasion of the first MediaWiki 1.42 train:

Macro full-steam-ahead:
Macro raptor-free:

Note: the MediaWiki config has a bug which causes normalized_message to have placeholders replaced by their values. That breaks deduplication on the Kibana dashboard. The cause is an ordering change in mediawiki/core which got deployed last week and requires reordering the Monolog processors in our config. See T349086: MediaWiki normalized_message field has placeholders replaced since October 12th.

The proposed fix is https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/966529/

I haven't made it a train blocker because it is not user-facing; it merely impacts people triaging MediaWiki logs. It would be nice to have it deployed before moving forward with this week's train, though.
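
For context on why ordering matters here: Monolog's pushProcessor() prepends, so the processor pushed last runs first, and a raw-message copy has to run before placeholder substitution. A minimal sketch with plain Monolog 2 (the copy processor is hypothetical, not the actual wmf-config code):

<?php
use Monolog\Logger;
use Monolog\Processor\PsrLogMessageProcessor;

$logger = new Logger( 'mediawiki' );

// Runs second (pushed first): replaces {placeholders} in the message.
$logger->pushProcessor( new PsrLogMessageProcessor() );

// Runs first (pushed last): copy the still-raw message so that
// normalized_message keeps its placeholders and deduplicates cleanly.
$logger->pushProcessor( static function ( array $record ): array {
    $record['extra']['normalized_message'] = $record['message'];
    return $record;
} );

$logger->warning( 'User {user} not found', [ 'user' => 'Alice' ] );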

cscott added a subscriber: matmarex.

Adding as a train blocker. @matmarex and I /think/ that the implications of this bug should be limited to testing, and that actual production use of DT will always have a separate request somewhere in the parse->edit->reparse cycle, *but* the CI failure in DT will probably block /other/ backports to the train, and so this patch is worth backporting.

[RISKY PATCH STATUS]: Here is an update of where we stand since y'day:

  • After Daniel and Scott did an intense review over several weeks (with many intermediate patches written and deployed by the three of us), Daniel merged this patch y'day.
  • Tested VE on the beta cluster y'day and it looked good.
  • Arlo found that Parsoid CI was broken y'day: T349098: Parsoid no longer used for non-wikitext content models from external REST API captures that. After a bunch of digging and Slack discussion, this has now been fixed (Arlo turned off the failing tests in CI, which I merged). There are a bunch of follow-ups on the task, unrelated to the CI breakage, that will be handled separately.
  • Bartosz found that DiscussionTools CI was broken: T349033: DiscussionTools CI is failing. The switch to ParserOutputAccess exposed a bug, which he fixed (in discussion with Scott); the fix has been backported and deployed, and the task has been resolved.
  • Yiannis found that RESTBase CI was broken today: T349087: Redirects on RESTBase testsuite are failing captures the issue, diagnosis, and fixes. I verified the fix on the beta cluster. It will be backported to the train soon, I expect.
  • Isabelle (with Scott's input) tested VE and DT on testwiki to verify that nothing is broken and that we can proceed with rollout to group 0. She found nothing broken or scary. DT testing could use more eyes, but we'll test more once it goes out to group 0.
  • I checked with Amir (in a Slack conversation) whether the potential duplication of Parsoid content (temporarily, for a week or two) will cause disk usage issues. He confirmed that it won't.
  • Tangential but related change: Scott uploaded a patch to have FlaggedRevs call the onArticleParserOptions hook (rather than referencing the "use parsoid" config option directly). I merged it and tested it on the beta cluster, and it looks good. Since FlaggedRevs uses its own parser cache instance, we don't think this patch changes anything there; since I already did VE testing on the beta cluster on a FlaggedRevs page and nothing broke, that theory seems to hold. But it's something to watch out for after rolling out to group 0. Scott added that we actually broke "useparsoid" for FlaggedRevs wikis last week with a config change and that his patch unbreaks that. (A sketch of the hook pattern follows below.)

So, based on all this, we are comfortable rolling out the patch to group 0.
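
For reference, a rough sketch of the hook pattern mentioned in the last bullet (the ArticleParserOptions hook and ParserOptions::setUseParsoid() exist in core; the config flag name and the condition are made up for illustration, and this is not FlaggedRevs' actual code):

<?php
use MediaWiki\MediaWikiServices;

// Hedged sketch: opt page views into Parsoid HTML via the
// ArticleParserOptions hook instead of reading site configuration at
// the call sites. 'MyUseParsoidFlag' is a hypothetical config name.
$wgHooks['ArticleParserOptions'][] = static function (
    Article $article, ParserOptions $popts
) {
    $config = MediaWikiServices::getInstance()->getMainConfig();
    if ( $config->get( 'MyUseParsoidFlag' ) ) {
        $popts->setUseParsoid();
    }
};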

Mentioned in SAL (#wikimedia-operations) [2023-10-17T18:18:31Z] <brennen> train 1.42.0-wmf.1 (T348354): blockers resolved, rolling to group0

Change 966594 had a related patch set uploaded (by TrainBranchBot; author: Brennen Bearnes):

[operations/mediawiki-config@master] group0 wikis to 1.42.0-wmf.1

https://gerrit.wikimedia.org/r/966594

Change 966594 merged by jenkins-bot:

[operations/mediawiki-config@master] group0 wikis to 1.42.0-wmf.1

https://gerrit.wikimedia.org/r/966594

Mentioned in SAL (#wikimedia-operations) [2023-10-17T18:25:56Z] <brennen@deploy2002> rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.1 refs T348354

@brennen We have a patch here that needs to go out with the next train to avoid validation errors.

> We have a patch here that needs to go out with the next train to avoid validation errors.

Thanks for the heads up. I'll make sure this is backported before I roll the train forward to group1.

Mentioned in SAL (#wikimedia-operations) [2023-10-18T18:20:12Z] <brennen> train 1.42.0-wmf.1 (T348354): logs clean and no blockers, rolling to group1

Change 966910 had a related patch set uploaded (by TrainBranchBot; author: Brennen Bearnes):

[operations/mediawiki-config@master] group1 wikis to 1.42.0-wmf.1

https://gerrit.wikimedia.org/r/966910

Change 966910 merged by jenkins-bot:

[operations/mediawiki-config@master] group1 wikis to 1.42.0-wmf.1

https://gerrit.wikimedia.org/r/966910

Mentioned in SAL (#wikimedia-operations) [2023-10-18T18:28:11Z] <brennen@deploy2002> rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.1 refs T348354

Mentioned in SAL (#wikimedia-operations) [2023-10-18T18:33:52Z] <brennen@deploy2002> Synchronized php: group1 wikis to 1.42.0-wmf.1 refs T348354 (duration: 05m 40s)

End-of-workday notes: Things generally stable on group1. Per discussion, won't block on T349235 tomorrow.

Change 967262 had a related patch set uploaded (by TrainBranchBot; author: Brennen Bearnes):

[operations/mediawiki-config@master] group2 wikis to 1.42.0-wmf.1

https://gerrit.wikimedia.org/r/967262

Change 967262 merged by jenkins-bot:

[operations/mediawiki-config@master] group2 wikis to 1.42.0-wmf.1

https://gerrit.wikimedia.org/r/967262

Mentioned in SAL (#wikimedia-operations) [2023-10-19T18:09:16Z] <brennen@deploy2002> rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.1 refs T348354

Optimistically closing this out since we're on all groups.

Keeping an eye on T349235: "InvalidArgumentException: The revision does not belong to the given page." after 1.42.0-wmf.1 deployed to group1 since logs are unhappy about it.

I'll add it as a blocker to the next train to ensure it's not lost in the week-to-week handoff.