
1.35.0-wmf.32 deployment blockers
Closed, Resolved · Public · Release

Details

Backup Train Conductor: mmodell
Release Version: 1.35.0-wmf.32
Release Date: May 11 2020, 12:00 AM

2020 week 20 · 1.35-wmf.32 Changes · wmf/1.35.0-wmf.32

This MediaWiki Train Deployment is scheduled for the week of Monday, May 11th:

  • Monday, May 11th: Backports only.
  • Tuesday, May 12th: Branch wmf.32 and deploy to Group 0 Wikis.
  • Wednesday, May 13th: Deploy wmf.32 to Group 1 Wikis.
  • Thursday, May 14th: Deploy wmf.32 to all Wikis.
  • Friday: No deployments on Fridays.

How this works

  • Any serious bugs affecting wmf.32 should be added as subtasks beneath this one.
  • Any open subtask(s) block the train from moving forward. This means no further deployments until the blockers are resolved.
  • If something is serious enough to warrant a rollback then you should bring it to the attention of deployers on the #wikimedia-operations IRC channel.
  • If you have a risky change in this week's train, add a comment to this task using the Risky patch template.
  • For more info about deployment blockers, see Holding the train.

Related Links

Other Deployments

Previous: 1.35.0-wmf.31
Next: 1.35.0-wmf.33

Event Timeline

@DannyS712 @Pchelolo: hi both! Last train there were a few blockers related to ongoing revision work (thank you for the responsiveness there @DannyS712). I see a few more patches are in master since the 1.35.0-wmf.31 branch cut.

Are there any additional precautions that the train operator could take to lessen the risk of these patches? That is: are there groups or wikis we should be especially cautious about? Should we ping someone when we're about to roll out? Would it make sense to roll the revision changes separately (somehow, tbd :)) when one or both of you are available, or take some other reasonable precautions? Thanks for your help!

Yeah, sorry about all of the UBNs :(
As for lessening risk, I started looking into requesting logstash access so I could help monitor new Revision issues.
The biggest upcoming patch is probably https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/593272/ - hopefully it'll go out with the next train; perhaps having some others review it could help?

As for lessening risk, I started looking into requesting logstash access so I could help monitor new Revision issues.

I'm not sure it works like that. When the issues are hitting production logstash, they're already in production...

Yes, but it would allow noticing them sooner (e.g. while only deployed to group0) and on the beta cluster (I think, right?). Anyway, it was just an idea.

Not necessarily; probably not much quicker than Brennen has been solving them. As the train rolls out to more wikis, more edge cases and issues are found. This often happens; it's fine on group0 and group1, but breaks when it hits enwiki, etc.

Hey @thcipriani, yeah... somehow the issues stacked up on this train.

I've also been thinking about why that happened and can't find anything specific to blame - all the individual patches were as small and confined as always, and I didn't even think I should give a heads-up on the blockers task about any of them, since they all looked like easy, straightforward replacements. Nor was there an especially large number of patches. I apologize for the disturbance.

As for moving forward, I can promise to do deeper reviews, plus I'm planning to bring more reviewers into this cleanup, but I don't think that would necessarily make us completely safe. Adding a test covering every place the code is changed would be ideal, but that would likely expand the scope of the task at hand beyond a reasonable size.

I guess one possible approach would be to have a manual testing day on group0 - we would compile the list of patches going out, look into whether we can test them manually on test.wiki, and do that on Tuesday.

Yes, but it would allow noticing them sooner (e.g. while only deployed to group0) and on the beta cluster (I think, right?). Anyway, it was just an idea.

Also https://wikitech.wikimedia.org/wiki/Logstash#Beta_Cluster_Logstash

Yeah, sorry about all of the UBNs :(

Thanks for all the work you've been doing! It seems like it's in an area of code that's tricky to work with — kudos for all your work and responsiveness, it's greatly appreciated.

The biggest upcoming patch is probably https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/593272/ - hopefully it'll go out with the next train; perhaps having some others review it could help?

That's a good idea, extra review never hurts :)

Hey @thcipriani, yeah... somehow the issues stacked up on this train.

I've also been thinking about why that happened and can't find anything specific to blame - all the individual patches were as small and confined as always, and I didn't even think I should give a heads-up on the blockers task about any of them, since they all looked like easy, straightforward replacements. Nor was there an especially large number of patches. I apologize for the disturbance.

No worries, this is our method of progressively de-risking: we roll out to successively larger groups, and finding bugs during rollout is part of the process. I want to make sure you all have everything you need to deploy these changes confidently.

As for moving forward, I can promise to do deeper reviews, plus I'm planning to bring more reviewers into this cleanup, but I don't think that would necessarily make us completely safe. Adding a test covering every place the code is changed would be ideal, but that would likely expand the scope of the task at hand beyond a reasonable size.

Fair.

I guess one possible approach would be to have a manual testing day on group0 - we would compile the list of patches going out, look into whether we can test them manually on test.wiki, and do that on Tuesday.

We can sync up with you all before rolling to group0 -- if there are some smoke tests we could run there to gain some confidence, that'd be great! Thanks for thinking this through with me :)

After the mess I made last time, I thought it would be helpful to list my patches against core that are going out with this train. So far:

Changes backported to wmf.31

Changes going live with wmf.32

I believe that this is all of the patches, but I may have missed some. This does not include patches against extensions, just core.

Hmm. Not sure if it will cause issues, but RevisionItemBase::getId currently returns string despite being documented to return int. Discovered while adding tests in T252076: Cover RevisionList/RevisionItem classes with tests; patch for that also fixes the method to properly return int. Would it be possible to include this before the branch is cut?
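
For context, here is a minimal, hypothetical PHP sketch of the kind of mismatch described above. This is not the actual patch from T252076, and the class name and field name are invented for illustration: the ID is read from a database row, where values arrive as strings, so an explicit cast is needed for the method to match its documented int return type.

  <?php
  // Hypothetical, simplified example - not the actual MediaWiki code or patch.
  class ExampleRevisionItem {
      /** @var stdClass Database row (field name invented for this sketch) */
      private $row;

      public function __construct( stdClass $row ) {
          $this->row = $row;
      }

      /** @return string Name of the field holding the ID (assumed) */
      protected function getIdField() {
          return 'rev_id';
      }

      /** @return int The ID, cast so the value matches the documented type */
      public function getId() {
          $idField = $this->getIdField();
          return (int)$this->row->$idField;
      }
  }

  // Database layers typically hand back string values, e.g. "12345".
  $item = new ExampleRevisionItem( (object)[ 'rev_id' => '12345' ] );
  var_dump( $item->getId() ); // int(12345) rather than string(5) "12345"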

@DannyS712 thank you for the summary ;)

https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/594807/ is worth adding, but I don't have any confidence in +2'ing anything in mediawiki/core nowadays.

Mentioned in SAL (#wikimedia-operations) [2020-05-12T12:08:13Z] <hashar> Cutting branch 1.35.0-wmf.32 # T249964

Mentioned in SAL (#wikimedia-operations) [2020-05-12T14:38:03Z] <hashar> 1.35.0-wmf.32 is on test wikis. Will be pushed to group0 later today during the American window (19:00 - 21:00 UTC) # T249964

Haven't seen any new log messages.

@DannyS712 : there is nothing related to Revision :]]]

I have just found out that Monday the 25th is a no-deploy day due to a holiday in the US. I will thus push 1.35.0-wmf.32 to all wikis on Tuesday the 26th at:

  • 13:00 UTC
  • 06:00 PDT
  • 15:00 CEST

No new errors occurring so far \o/

We still got hit by the INSERT rate doubling (T247028), but that is not related to the code being deployed; the issue predates it.

I am keeping this blocker task open in the meantime; I will call it a success if we don't have to roll back due to T247028.

There are barely any new errors.

There is a regression with Vector that causes some features to be missing in the language bar, but it has been decided that it is not worth blocking on (T252800).

The database INSERT rate surged again, but it is not related to the code being deployed (T252800). At least there was no database lag this time.