
1.38.0-wmf.17 deployment blockers
Open, Medium, Public · 5 Estimated Story Points · Release

Details

Backup Train Conductor
mmodell
Release Version
1.38.0-wmf.17
Release Date
Mon, Jan 10, 12:00 AM

2022 week 02 · 1.38.0-wmf.17 Changes · wmf/1.38.0-wmf.17

This MediaWiki Train Deployment is scheduled for the week of Monday, January 10th:

  • Monday, January 10th: Backports only.
  • Tuesday, January 11th: Branch wmf.17 and deploy to Group 0 wikis.
  • Wednesday, January 12th: Deploy wmf.17 to Group 1 wikis.
  • Thursday, January 13th: Deploy wmf.17 to all wikis.
  • Friday: No deployments on Fridays.

How this works

  • Any serious bugs affecting wmf.17 should be added as subtasks beneath this one.
  • Any open subtask(s) block the train from moving forward. This means no further deployments until the blockers are resolved.
  • If something is serious enough to warrant a rollback then you should bring it to the attention of deployers on the #wikimedia-operations IRC channel.
  • If you have a risky change in this week's train, add a comment to this task using the Risky patch template.
  • For more info about deployment blockers, see Holding the train.

Related Links

Other Deployments

Previous: 1.38.0-wmf.16
Next: 1.38.0-wmf.18

Event Timeline

thcipriani changed Release Date from Jan 4 2021, 12:00 AM to Tue, Jan 4, 12:00 AM.Oct 21 2021, 12:33 AM
thcipriani changed Release Date from Tue, Jan 4, 12:00 AM to Mon, Jan 10, 12:00 AM.Oct 21 2021, 12:42 AM
thcipriani triaged this task as Medium priority.
thcipriani updated Other Assignee, added: mmodell.
thcipriani set the point value for this task to 5.

Change 752026 had a related patch set uploaded (by 20after4; author: 20after4):

[operations/mediawiki-config@master] all wikis to 1.38.0-wmf.16 refs T293958

https://gerrit.wikimedia.org/r/752026

Change 752026 merged by jenkins-bot:

[operations/mediawiki-config@master] all wikis to 1.38.0-wmf.16 refs T293958

https://gerrit.wikimedia.org/r/752026

Mentioned in SAL (#wikimedia-operations) [2022-01-06T22:25:15Z] <twentyafterfour@deploy1002> rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.16 refs T293958

Risky Patch! 🚂🔥
  • Change: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/740663
  • Summary:
    • Automatic refactor that replaces global variable access with calls to MediaWikiServices::getInstance()->getMainConfig()->get( 'FooBar' ) in about 500 places.
    • Trivial change, but touches many files.
    • Local variables are used to replace the global variables. If the name of the new local variable was already used in the respective code, this may cause errors. We were careful to avoid conflicting variable names, but it's still possible that we missed something.
    • If the global variable was written to, the new value will now be lost after the method exits. We were careful not to change code that writes to globals, but it's still possible that we missed something.
    • If getMainConfig()->get( 'FooBar' ) is called in a place that is executed many times, the additional function call overhead may cause performance issues. Since we are doing these calls at the top of the method, this shouldn't happen, but who knows.
  • Test plan:
    • Since this change is holistic across the code base, the best we can do is to rely on general test coverage. The trivial nature of the change should make errors unlikely.
  • Places to monitor:
  • Revert plan:
    • Fix individual line or method. Errors are likely to be due to a problem with a single instance of this change, which is independent of other instances of this change. Manually fixing the code in a single method is probably the easiest solution.
    • If that fails, revert the patch. But conflicts are likely, since this patch touches many files.
    • If reverting the patch fails, roll back the train.
  • Affected wikis: all
  • IRC contact: tchin, duesen. Best ping Daniel and Thomas on the platform-engineering-team channel on Slack.
  • UBN Task Projects/tags: Platform Engineering
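The two main hazards described in the summary (a local variable shadowing existing code, and writes to the former global being silently lost) are language-generic. A minimal Python sketch, using a made-up `wgFooBar` global as a stand-in for the actual MediaWiki configuration globals, illustrates why lost writes are the subtle failure mode:

```python
# Hypothetical sketch (Python analogue, not the actual PHP patch) of the
# hazards described above. wgFooBar stands in for a MediaWiki $wgFooBar global.

wgFooBar = "original"

def before_read_only():
    # Old style: read the global directly.
    return wgFooBar

def after_read_only():
    # New style: fetch the value into a local once at the top of the
    # function (mirrors hoisting getMainConfig()->get( 'FooBar' ) to
    # the top of the method). Equivalent for pure reads.
    foo_bar = wgFooBar
    return foo_bar

def after_write_buggy():
    # Hazard: if the original code *wrote* to the global, assigning to
    # the local instead means the new value is lost when the function
    # returns -- nothing outside this scope ever sees "updated".
    foo_bar = "updated"
    return foo_bar

assert before_read_only() == after_read_only()  # reads behave identically
after_write_buggy()
assert wgFooBar == "original"  # the write never reached the global
```

This is why the patch deliberately avoided touching any code path that writes to globals: for reads the transformation is mechanical and safe, while a missed write site fails silently rather than with an error.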

Change 753120 had a related patch set uploaded (by Dduvall; author: Dduvall):

[operations/mediawiki-config@master] testwikis wikis to 1.38.0-wmf.17 refs T293958

https://gerrit.wikimedia.org/r/753120

Change 753120 merged by jenkins-bot:

[operations/mediawiki-config@master] testwikis wikis to 1.38.0-wmf.17 refs T293958

https://gerrit.wikimedia.org/r/753120

Mentioned in SAL (#wikimedia-operations) [2022-01-11T19:21:46Z] <dduvall@deploy1002> Started scap: testwikis wikis to 1.38.0-wmf.17 refs T293958

Mentioned in SAL (#wikimedia-operations) [2022-01-11T20:01:24Z] <dduvall@deploy1002> Finished scap: testwikis wikis to 1.38.0-wmf.17 refs T293958 (duration: 39m 38s)

Change 753138 had a related patch set uploaded (by Dduvall; author: Dduvall):

[operations/mediawiki-config@master] group0 wikis to 1.38.0-wmf.17 refs T293958

https://gerrit.wikimedia.org/r/753138

Change 753138 merged by jenkins-bot:

[operations/mediawiki-config@master] group0 wikis to 1.38.0-wmf.17 refs T293958

https://gerrit.wikimedia.org/r/753138

Mentioned in SAL (#wikimedia-operations) [2022-01-11T20:38:30Z] <dduvall@deploy1002> rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.17 refs T293958


I did see a small number of replication lag related errors today following group0 deployment. Here's one example. @daniel would you mind verifying whether this is related before tomorrow's deployment window. I'll check back in the morning as well and see if it warrants a task.

Mentioned in SAL (#wikimedia-operations) [2022-01-11T23:04:14Z] <dduvall> syncing backport to fix VE regression that followed testwiki/group0 deployment (cc T293958)


I see an increase in DBPerformance warnings: https://logstash.wikimedia.org/goto/4cfaf42b6b1c23486bb0361d3820fbf9. But they do not seem to come from group0 sites: https://logstash.wikimedia.org/goto/1e88fa4f9351a34096aff56640039da2

@Ladsgroup had a look and found that the spike is likely caused by a large number of account creations on meta: https://logstash.wikimedia.org/goto/b7ee69dcbff9b3ae68090dbde78e63a8.

But this doesn't correlate with the read-only errors: https://logstash.wikimedia.org/goto/d882ea6b8d6fe51fec9dcf737d8ec97c. I'm only seeing four of these, so maybe it's just random.

Thanks for looking into that, @daniel ! I always err greatly on the side of paranoia during train. :)

Change 753554 had a related patch set uploaded (by Dduvall; author: Dduvall):

[operations/mediawiki-config@master] group1 wikis to 1.38.0-wmf.17 refs T293958

https://gerrit.wikimedia.org/r/753554

Change 753554 merged by jenkins-bot:

[operations/mediawiki-config@master] group1 wikis to 1.38.0-wmf.17 refs T293958

https://gerrit.wikimedia.org/r/753554

Mentioned in SAL (#wikimedia-operations) [2022-01-12T20:19:48Z] <dduvall@deploy1002> rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17 refs T293958

Mentioned in SAL (#wikimedia-operations) [2022-01-12T20:21:10Z] <dduvall@deploy1002> Synchronized php: group1 wikis to 1.38.0-wmf.17 refs T293958 (duration: 01m 21s)

Mentioned in SAL (#wikimedia-operations) [2022-01-12T20:36:55Z] <dduvall> 1.38.0-wmf.17 rolled back from group1 due to large spike in db read-only errors and slow queries (T293958)


@daniel @tstarling please see T299095: Links tables corrupted due to incorrectly parenthesized delete queries. I believe this may be related.

Change 753812 had a related patch set uploaded (by Dduvall; author: Dduvall):

[operations/mediawiki-config@master] group0 wikis to 1.38.0-wmf.17 refs T293958

https://gerrit.wikimedia.org/r/753812

Change 753812 merged by jenkins-bot:

[operations/mediawiki-config@master] group0 wikis to 1.38.0-wmf.17 refs T293958

https://gerrit.wikimedia.org/r/753812

Mentioned in SAL (#wikimedia-operations) [2022-01-13T20:07:11Z] <dduvall@deploy1002> rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.17 refs T293958

Change 753814 had a related patch set uploaded (by Dduvall; author: Dduvall):

[operations/mediawiki-config@master] group1 wikis to 1.38.0-wmf.17 refs T293958

https://gerrit.wikimedia.org/r/753814

Change 753814 merged by jenkins-bot:

[operations/mediawiki-config@master] group1 wikis to 1.38.0-wmf.17 refs T293958

https://gerrit.wikimedia.org/r/753814

Mentioned in SAL (#wikimedia-operations) [2022-01-13T20:16:47Z] <dduvall@deploy1002> rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17 refs T293958

Mentioned in SAL (#wikimedia-operations) [2022-01-13T20:17:54Z] <dduvall@deploy1002> Synchronized php: group1 wikis to 1.38.0-wmf.17 refs T293958 (duration: 01m 06s)

Mentioned in SAL (#wikimedia-operations) [2022-01-13T20:28:59Z] <dduvall> rolling back wmf.17 from group1 due to a large increase in "Parser state cleared while parsing" across commons and group1 wikipedias (T293958, T299149)

Change 753860 had a related patch set uploaded (by Dduvall; author: Dduvall):

[operations/mediawiki-config@master] group1 wikis to 1.38.0-wmf.17 refs T293958

https://gerrit.wikimedia.org/r/753860

Change 753860 merged by jenkins-bot:

[operations/mediawiki-config@master] group1 wikis to 1.38.0-wmf.17 refs T293958

https://gerrit.wikimedia.org/r/753860

Mentioned in SAL (#wikimedia-operations) [2022-01-14T00:13:57Z] <dduvall@deploy1002> rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17 refs T293958

Mentioned in SAL (#wikimedia-operations) [2022-01-14T00:15:04Z] <dduvall@deploy1002> Synchronized php: group1 wikis to 1.38.0-wmf.17 refs T293958 (duration: 01m 06s)

Change 753862 had a related patch set uploaded (by Dduvall; author: Dduvall):

[operations/mediawiki-config@master] all wikis to 1.38.0-wmf.17 refs T293958

https://gerrit.wikimedia.org/r/753862

Change 753862 merged by jenkins-bot:

[operations/mediawiki-config@master] all wikis to 1.38.0-wmf.17 refs T293958

https://gerrit.wikimedia.org/r/753862

Mentioned in SAL (#wikimedia-operations) [2022-01-14T00:23:08Z] <dduvall@deploy1002> rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.17 refs T293958

T299191

Can't remember where I'm supposed to report possible blockers, but it seems serious.

There is a report on enwp's VPT that category counts are wrong and not updating properly, affecting deletion processes. See https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Problems_with_speedy_deletion_category_counts. I'm wondering if this is related to the LinksUpdate refactor.

I first noticed category update problems on Commons on 5 January, but didn't pay much mind to it as I've become desensitized toward category update problems on Commons. Specifically, category updates were displayed on File pages but not on the Category page, even though they were in Special:WhatLinksHere for the cat. The numbers matched what the Category page showed.

So we don't currently have a fix for T299244, but we also aren't comfortable reverting the branch at this point due to several complex changes that landed this week. The people who wrote the code aren't around, and neither are any release engineers (I have to leave shortly).

We've text-messaged @tstarling, but it's Saturday morning where he lives, so I can't be sure when or if Tim will respond over the weekend.

I think it's best to leave things as-is until someone who is familiar with the code can submit a patch to either work around the bug or revert the change.

Mentioned in SAL (#wikimedia-operations) [2022-01-15T00:46:37Z] <jforrester@deploy1002> Finished scap: Revert "LinksUpdate refactor" and follow-ups for T299244 re. T293958 (duration: 03m 58s)

Change 754051 had a related patch set uploaded (by Dduvall; author: Dduvall):

[operations/mediawiki-config@master] group1 wikis to 1.38.0-wmf.17 refs T293958

https://gerrit.wikimedia.org/r/754051

Change 754051 merged by jenkins-bot:

[operations/mediawiki-config@master] group1 wikis to 1.38.0-wmf.17 refs T293958

https://gerrit.wikimedia.org/r/754051

Mentioned in SAL (#wikimedia-operations) [2022-01-15T00:51:39Z] <dduvall@deploy1002> rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17 refs T293958

Mentioned in SAL (#wikimedia-operations) [2022-01-15T00:52:32Z] <dduvall@deploy1002> Synchronized php: group1 wikis to 1.38.0-wmf.17 refs T293958 (duration: 00m 52s)

Change 754053 had a related patch set uploaded (by Dduvall; author: Dduvall):

[operations/mediawiki-config@master] all wikis to 1.38.0-wmf.17 refs T293958

https://gerrit.wikimedia.org/r/754053

Change 754053 merged by jenkins-bot:

[operations/mediawiki-config@master] all wikis to 1.38.0-wmf.17 refs T293958

https://gerrit.wikimedia.org/r/754053

Mentioned in SAL (#wikimedia-operations) [2022-01-15T00:57:59Z] <dduvall@deploy1002> rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.17 refs T293958

@ShakespeareFan00 is reporting that linting reports are not updating. They give one example: this report still contains a report of an error in page 83's transclusion, which should have been fixed after this change.

Looking at the job queue, I can see a significant change in the recordLintJob rates after Friday, Jan 14, 02:00 AM (UTC)

Screenshot 2022-01-15 at 20.46.06.png (688×1 px, 311 KB)

Additionally, I notice there are 60 million Wikibase jobs (slowly climbing over the last 2 weeks)?

Screenshot 2022-01-15 at 20.48.44.png (614×1 px, 105 KB)

@Izno reports "search is now starting to slow its updates too"


The train hit group2 at 00:23, so it's probably related. I suspect the Linter job increase is because of T297443: Add a linter category for inline images with captions, which added a new lint category, so instead of most Linter jobs being fast no-ops they're now writing rows to the database. Over time this will recover by itself, but in the meantime we can add more RecordLintJob runners in changeprop (Monday I guess).
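The recovery mechanism described above (jobs going from cheap no-ops to database writes and back) can be sketched as follows. This is a hypothetical Python analogue, not the actual RecordLintJob implementation; the category name and the diff-against-stored-rows behavior are assumptions for illustration:

```python
# Hypothetical sketch of why adding a new lint category slows the job queue:
# a job that only writes the difference between computed and stored lint
# errors is a fast no-op when nothing changed, but a new category makes
# many previously-clean pages suddenly have rows to write.

stored = {"page1": set(), "page2": set()}  # lint rows already in the DB

def record_lint_job(page, computed):
    """Write only the lint errors not already recorded; return rows written."""
    new_rows = computed - stored[page]
    stored[page] |= new_rows
    return len(new_rows)  # 0 means the job was a cheap no-op

# Before the new category: most jobs find nothing new (no-ops).
assert record_lint_job("page1", set()) == 0

# After a new category (name assumed) starts matching pages, the same
# jobs now perform DB writes once per affected page...
assert record_lint_job("page2", {"inline-media-caption"}) == 1

# ...but only once: on re-run the row is already stored, so the queue
# drains back to no-ops over time, which is why this recovers by itself.
assert record_lint_job("page2", {"inline-media-caption"}) == 0
```

This also motivates the mitigation mentioned above: the backlog is a one-time burst of writes, so adding more RecordLintJob runners in changeprop shortens the recovery window without changing the steady state.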

The Wikibase/search issues seem pretty concerning, no clue about that.