1.38.0-wmf.5 deployment blockers
Closed, Resolved | Public | 5 Estimated Story Points | Release

Details

Backup Train Conductor
dancy
Release Version
1.38.0-wmf.5
Release Date
Oct 18 2021, 12:00 AM

2021 week 42 1.38-wmf.5 Changes wmf/1.38.0-wmf.5

This MediaWiki Train Deployment is scheduled for the week of Monday, October 18th:

  • Monday, October 18th: Backports only.
  • Tuesday, October 19th: Branch wmf.5 and deploy to Group 0 Wikis.
  • Wednesday, October 20th: Deploy wmf.5 to Group 1 Wikis.
  • Thursday, October 21st: Deploy wmf.5 to all Wikis.
  • Friday: No deployments on Fridays.

How this works

  • Any serious bugs affecting wmf.5 should be added as subtasks beneath this one.
  • Any open subtask(s) block the train from moving forward. This means no further deployments until the blockers are resolved.
  • If something is serious enough to warrant a rollback then you should bring it to the attention of deployers on the #wikimedia-operations IRC channel.
  • If you have a risky change in this week's train, add a comment to this task using the Risky patch template.
  • For more info about deployment blockers, see Holding the train.
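
As a rough illustration of the rules above (a minimal sketch, not how the train tooling actually works), the gating logic can be expressed in a few lines of Python. The `Subtask` type, the stage labels, and the `next_stage` helper are all hypothetical names invented for this example; only the rule "any open subtask blocks the train" and the Group 0, Group 1, all-wikis order come from this task.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Rollout order from the schedule in this task; the labels are illustrative.
STAGES = ("group0", "group1", "all")


@dataclass(frozen=True)
class Subtask:
    """Hypothetical stand-in for a blocker subtask of the train task."""
    task_id: str
    is_open: bool


def open_blockers(subtasks: Sequence[Subtask]) -> list:
    """Return the IDs of subtasks that are still open."""
    return [t.task_id for t in subtasks if t.is_open]


def next_stage(current: Optional[str], subtasks: Sequence[Subtask]) -> Optional[str]:
    """Return the stage the train may advance to, or None if held or finished."""
    if open_blockers(subtasks):
        return None  # any open subtask holds the train
    if current is None:
        return STAGES[0]
    idx = STAGES.index(current)
    return STAGES[idx + 1] if idx + 1 < len(STAGES) else None
```

For example, `next_stage("group1", [Subtask("T000000", is_open=True)])` returns `None` until the (made-up) subtask is resolved, after which it returns `"all"`.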

Related Links

Other Deployments

Previous: 1.38.0-wmf.4
Next: 1.38.0-wmf.6

Event Timeline

thcipriani renamed this task from 1.37.0-wmf.28 deployment blockers to 1.38.0-wmf.5 deployment blockers.Aug 25 2021, 4:40 PM
thcipriani assigned this task to hashar.
thcipriani triaged this task as Medium priority.
thcipriani changed Release Version from 1.37.0-wmf.28 to 1.38.0-wmf.5.
thcipriani updated Other Assignee, added: dancy.
thcipriani set the point value for this task to 5.

@thcipriani I've missed a few train log triages, but we did say that newly introduced bugs that show up on the mw-client-errors dashboard or mw-client-error editing dashboard at a rate of over 1000 errors in a 12 hr period should currently be judged as train blockers [1].

It looks like T291392 was introduced on 22nd September, but didn't block that week's train, and it still persists at high volume: https://logstash.wikimedia.org/goto/8d80bf34a74e381f84429c0dabaf14d2

My understanding is a bug fix is in code review so it doesn't necessarily need to halt this week's train, but I think it should definitely block next week's train, so that we don't set a precedent that new bugs at high volume are tolerated.

Screen Shot 2021-10-18 at 11.22.11 AM.png (912×2 px, 296 KB)

[1] https://wikitech.wikimedia.org/wiki/Deployments/Holding_the_train#Issues_that_hold_the_train

I have to admit that I think we've been dropping the ball on triaging client errors. Or at least I personally have, during my last several trains.

On reflection, I think tracking the client error dashboard as a separate thing may just be one input channel more than train conductors are going to be very good at juggling. This ticket probably isn't the place, but I'd like it if we could work out a way for that responsibility not to fall on one or two individuals in RelEng.

Tracked at T293694: Alert RelEng when mw-client-error editing dashboard shows errors at a rate of over 1000 errors in a 12 hr period
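
For anyone picking up that alerting task, the threshold discussed above (more than 1000 errors within a 12 hr period) could be checked along these lines. This is only a sketch under assumptions: the function name, the half-open window, and the idea of feeding it a plain list of error timestamps are mine, and it does not reflect how the Logstash dashboards or any real alert are configured.

```python
from datetime import datetime, timedelta
from typing import Iterable

THRESHOLD = 1000                # errors, per the rule quoted in this task
WINDOW = timedelta(hours=12)    # period, per the rule quoted in this task


def exceeds_blocker_rate(
    timestamps: Iterable[datetime],
    threshold: int = THRESHOLD,
    window: timedelta = WINDOW,
) -> bool:
    """Return True if any span shorter than `window` holds more than `threshold` events."""
    events = sorted(timestamps)
    start = 0
    for end, ts in enumerate(events):
        # Move the left edge forward until the span fits inside the window.
        while ts - events[start] >= window:
            start += 1
        if end - start + 1 > threshold:
            return True
    return False
```

The sliding window means a burst is caught wherever it falls, rather than only when it happens to line up with fixed 12-hour buckets.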

Acknowledging that I am indeed running the train this week.

My first thought was that it is a bit unfair to block this week's train for an issue that started almost a month ago. Then again, making it a blocker will ensure it gets fixed and that we treat client-side errors as first-class citizens when it comes to blocking the train.

I will push to group 0 on Tuesday. If any backports are needed, I will be happy to do them :]

I don't have much experience with the client-side error dashboard or JavaScript traces, but I will do my best and file anything that looks suspicious or spammy. Thank you @Jdlrobson !

Hey all,

Thanks to @Jdlrobson for flagging this at T291392.

A larger patch, which was originally going out last week but got reverted, will fix these production errors and should now be ready at https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaSearch/+/730208. The issue only affects Commons, so there should be no problem deploying to Group 0 tomorrow. I will try to get this merged and backported to wmf.4 and wmf.5 on Tuesday (UTC).

Mentioned in SAL (#wikimedia-operations) [2021-10-19T09:27:10Z] <hashar> Cloned and applied security patches for 1.38.0-wmf.5 # T281169

Mentioned in SAL (#wikimedia-operations) [2021-10-19T09:27:45Z] <hashar> sap clean --delete 1.38.0-wmf.2 && scap clean --delete 1.38.0-wmf.3 # T281169

Waiting for patches from T293735 to be merged by CI.

We should verify on beta that it is all fixed before rolling out.

Change 731972 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/mediawiki-config@master] testwikis wikis to 1.38.0-wmf.5 refs T281169

https://gerrit.wikimedia.org/r/731972

Change 731972 merged by jenkins-bot:

[operations/mediawiki-config@master] testwikis wikis to 1.38.0-wmf.5 refs T281169

https://gerrit.wikimedia.org/r/731972

Mentioned in SAL (#wikimedia-operations) [2021-10-19T13:26:23Z] <hashar@deploy1002> Started scap: testwikis wikis to 1.38.0-wmf.5 refs T281169

Mentioned in SAL (#wikimedia-operations) [2021-10-19T14:11:36Z] <hashar@deploy1002> Finished scap: testwikis wikis to 1.38.0-wmf.5 refs T281169 (duration: 45m 13s)

1.38.0-wmf.5 is now on testwikis. I ran out of time to promote group 0 wikis; that will happen later tonight.

Change 732035 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/mediawiki-config@master] group0 wikis to 1.38.0-wmf.5 refs T281169

https://gerrit.wikimedia.org/r/732035

Change 732035 merged by jenkins-bot:

[operations/mediawiki-config@master] group0 wikis to 1.38.0-wmf.5 refs T281169

https://gerrit.wikimedia.org/r/732035

Mentioned in SAL (#wikimedia-operations) [2021-10-19T19:02:59Z] <hashar@deploy1002> rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.5 refs T281169

image.png (609×1 px, 70 KB)

This comment was removed by Seddon.

Thank you @Seddon for the verification.

I have triaged logs this morning and it looks really quiet. I will push to group1 slightly after 13:00 UTC (15:00 CEST).

Change 732316 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/mediawiki-config@master] group1 wikis to 1.38.0-wmf.5 refs T281169

https://gerrit.wikimedia.org/r/732316

Change 732316 merged by jenkins-bot:

[operations/mediawiki-config@master] group1 wikis to 1.38.0-wmf.5 refs T281169

https://gerrit.wikimedia.org/r/732316

Mentioned in SAL (#wikimedia-operations) [2021-10-20T13:20:51Z] <hashar@deploy1002> rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.5 refs T281169

Mentioned in SAL (#wikimedia-operations) [2021-10-20T13:21:54Z] <hashar@deploy1002> Synchronized php: group1 wikis to 1.38.0-wmf.5 refs T281169 (duration: 01m 02s)

ParserOutput::getProperty and ParserOutput::setProperty have been marked deprecated and a few code paths still hit them. I have filed T293860, T293895, T293894 to get them addressed. Given they are deprecation notices, I am not making them blockers for this week's train.

@Seddon and @Jdlrobson I tried to load the mw-client-NEW-errors dashboard at https://logstash.wikimedia.org/app/dashboards#/view/AXDBY8Qhh3Uj6x1zCF56 but it keeps giving errors. But the unfiltered view looks good.

Fixed it. Some filters had been enabled that shouldn't have been. Looks fixed to me!

Screen Shot 2021-10-20 at 8.37.40 AM.png (734×2 px, 131 KB)

@Jdlrobson thank you for

The ParserOutput::getProperty deprecations (T293860, T293895, T293894) all have patches; they are in the CI pipeline and I will deploy them once they are merged.

The logo has disappeared on some mobile sites, so please do not deploy further until T290525 is taken care of.

Change 732678 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/mediawiki-config@master] all wikis to 1.38.0-wmf.5 refs T281169

https://gerrit.wikimedia.org/r/732678

Change 732678 merged by jenkins-bot:

[operations/mediawiki-config@master] all wikis to 1.38.0-wmf.5 refs T281169

https://gerrit.wikimedia.org/r/732678

Mentioned in SAL (#wikimedia-operations) [2021-10-21T13:04:14Z] <hashar@deploy1002> rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.5 refs T281169

I spotted a few errors which seem to be user-triggered rather than a generic issue. They each had a single occurrence: T294015, T294017 and T294020.

Assuming the train is fine. Note that I am not available this evening; follow up with Release-Engineering-Team if any action is required, such as a rollback, but it seems quiet :)