
1.38.0-wmf.12 deployment blockers
Closed, Resolved | Public | 5 Estimated Story Points | Release

Details

Backup Train Conductor
brennen
Release Version
1.38.0-wmf.12
Release Date
Dec 6 2021, 12:00 AM

2021 week 49 | 1.38-wmf.12 Changes | wmf/1.38.0-wmf.12

This MediaWiki Train Deployment is scheduled for the week of Monday, December 6th:

  • Monday, December 6th: Backports only.
  • Tuesday, December 7th: Branch wmf.12 and deploy to Group 0 Wikis.
  • Wednesday, December 8th: Deploy wmf.12 to Group 1 Wikis.
  • Thursday, December 9th: Deploy wmf.12 to all Wikis.
  • Friday: No deployments on Fridays.
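
Each of those promotion steps ("Group 0", "Group 1", "all Wikis") boils down to updating which MediaWiki branch a set of wikis points at in wikiversions.json in operations/mediawiki-config; that is what the "rebuilt and synchronized wikiversions files" entries further down this task refer to. Below is a minimal sketch in Python of the data change for the group0 step, assuming the repository's standard wikiversions.json and dblists/ layout; the real promotion is done with scap and the mediawiki-config tooling, not this script.

```
# Sketch only: point every wiki in the group0 dblist at the new branch.
# Assumes it is run from an operations/mediawiki-config checkout.
import json

NEW_BRANCH = "php-1.38.0-wmf.12"

with open("dblists/group0.dblist") as f:
    group0 = [line.strip() for line in f if line.strip() and not line.startswith("#")]

with open("wikiversions.json") as f:
    wikiversions = json.load(f)

for dbname in group0:  # e.g. testwiki, test2wiki, mediawikiwiki, ...
    wikiversions[dbname] = NEW_BRANCH

with open("wikiversions.json", "w") as f:
    json.dump(wikiversions, f, indent=4, sort_keys=True)
```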

How this works

  • Any serious bugs affecting wmf.12 should be added as subtasks beneath this one.
  • Any open subtask(s) block the train from moving forward. This means no further deployments until the blockers are resolved.
  • If something is serious enough to warrant a rollback then you should bring it to the attention of deployers on the #wikimedia-operations IRC channel.
  • If you have a risky change in this week's train, add a comment to this task using the Risky patch template.
  • For more info about deployment blockers, see Holding the train.

Related Links

Other Deployments

Previous: 1.38.0-wmf.11
Next: 1.38.0-wmf.13

Related Objects

Event Timeline


T297066 seems to be a UI regression in WikibaseMediaInfo; I’ll let the SDC developers or train conductors decide whether it should block the train. (It would only affect Commons, i.e. group1.)

Edit: This was resolved in time for the wmf.12 branch cut; no action necessary.

Just so this doesn't get lost in the comment above: if the train is rolled back after deploy, we will need to purge RESTBase content (see T296425) to ensure VE edits on metawiki and mediawiki on pages with <translate> tags don't break.

This is admittedly an edge case, since VE editing on pages with <translate> isn't well supported currently, so editors are unlikely to use VE on those pages. If a train is going to be rolled back and then rolled forward relatively quickly, there is probably some breathing room to leave this broken for a while. But if the train is going to stay rolled back for a long time, then it is worth purging RESTBase storage via T296425.

Thanks for this note! It seems likely that we will roll back at least once during this train, as (a) we didn't run the train last week and (b) there are so many risky patches on this train.

Is there any way to hide this incompatible behavior until the rollout of this train is complete (e.g., via a feature flag or similar)?

I'll note that this train will have a large number of patches in the CentralAuth extension. The extension in general is risky (old and complicated codebase, little to no tests, tasked with critical things like authentication), so if you see or hear about any strange behaviour, please feel free to roll back / revert things at a low threshold. Thanks!

Is there any way to hide this incompatible behavior until the rollout of this train is complete (e.g., via a feature flag or similar)?

I'll chat with others on the team about this and see what seems feasible.

We had that as a fallback for next week, for scenarios where the rollback was caused by a bug in our code (the plan was to revert our code, roll the train forward, and try again next week). But, yes, there could be rollbacks for reasons not related to our code.

I also dropped a note on metawiki to alert translate admins to keep an eye out.

The volume of VE edits on metawiki is quite low. That seems to be true for mediawiki.org VE edits as well.

So, given all that, at this time I feel comfortable that we can handle this by keeping an eye on VE edits and reverting / fixing them appropriately, and relying on a RESTBase content purge only if we really need it - for example, if we find that Parsoid's support for translate itself is broken, which might require a rollback, a Parsoid revert, and a train roll-forward.

I looked at wmf-config to make sure there weren't other affected wikis; Commons is the other major wiki that uses translate tags. I looked at the namespaces that commonly receive VE edits (File, Category, User), and there are only 69 pages with translate tags in those namespaces.
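
(If anyone wants to re-check a number like that later, one rough way is CirrusSearch's insource: operator via the API. The Python sketch below is one possible query, not necessarily how the 69-page figure above was obtained, and the plain insource match is tokenized, so the count is only approximate.)

```
# Approximate count of pages containing "<translate>" in the User (2), File (6)
# and Category (14) namespaces on Commons, via the search API. Sketch only.
import requests

API = "https://commons.wikimedia.org/w/api.php"
params = {
    "action": "query",
    "list": "search",
    "srsearch": 'insource:"<translate>"',
    "srnamespace": "2|6|14",
    "srlimit": 1,            # we only want the total, not the result list
    "srinfo": "totalhits",
    "format": "json",
    "formatversion": "2",
}
data = requests.get(API, params=params, timeout=30).json()
print(data["query"]["searchinfo"]["totalhits"])
```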

Overall, as @Arlolra noted elsewhere, it would have been cleaner and better to have rolled out HTML->WT support for Parsoid HTML version 2.4.0, so that we had forward compatibility for that version, *before* the Parsoid HTML version bump to 2.4.0 in this train.

But, all things considered, between metawiki, mediawiki, and commons, as noted in these comments, the intersection of

  • pages with translate tags on them
  • pages being edited with VE

is small enough that it feels acceptable to move ahead without needing to rely on RESTBase purges if the train rolls back.

All that said, if train operators prefer that we mitigate that for your sanity, I will happily work with my team to make those fixes.

From what you've said the number of impacted pages seems small, but I don't quite grok the impact on those pages: would editing be broken on those pages while we're in a rolled-back state, and would they be fixed when we roll forward? Or does this cause some kind of corruption?

If it's possible to mitigate this, that'd be ideal. As an alternative, monitoring the rollout and having someone on hand for problems would work as well.

From what you've said the number of impacted pages seems small, but I don't quite grok the impact on those pages: would editing be broken on those pages while we're in a rolled-back state, and would they be fixed when we roll forward? Or does this cause some kind of corruption?

VE edits on those pages can cause page corruption *if* the page has been reparsed (or purged) between the train rollout and the train rollback.

If it's possible to mitigate this, that'd be ideal. As an alternative, monitoring the rollout and having someone on hand for problems would work as well.

Yes, someone will be around.

Note that the blocker is in a weird state; it seems there are several issues there. I think the DB part is fixed, but the memory part is not. It needs better investigation.

If it's possible to mitigate this, that'd be ideal.

We are working to take Parsoid out of the risky-patches set and should have a backport to vendor before the train rolls out. We decided to split the 2.3.0 -> 2.4.0 bump across two trains. That will eliminate the need for RESTBase purges altogether, without needing to keep an eye on individual edits on those three wikis.

The Parsoid "backport" is https://gerrit.wikimedia.org/r/c/mediawiki/vendor/+/744800. I assume that since the train hasn't rolled yet it's safe to merge that onto wmf.12, but I recall ops needing to specially sync stuff to make everything work correctly. Again: what we've done is turn off Translate annotations in Parsoid 0.15.0-a12 in order to make the Parsoid deploy "not risky" this week. We'll wait to turn on our risky feature, either next week or next year.

EDIT: Got confirmation from @dancy that it was safe to merge on the wmf.12 branch because the train hadn't been checked out yet; also updated the docs at https://wikitech.wikimedia.org/wiki/Parsoid#If_the_train_branch_has_already_been_cut to record that we should always get explicit clearance in #wikimedia-operations before merging a backport "after the cut" like this.

Note that the blocker is in a weird state; it seems there are several issues there. I think the DB part is fixed, but the memory part is not. It needs better investigation.

I have no idea how to investigate this... I propose to roll forward and to look out for the memory issue showing up again.

Hm... I made LinkCache cache some more fields from the page table. Most of that cache is only in-process, so it should not persist. Some of them are, however, written to WANObjectCache. But that shouldn't affect memory usage on app servers, right?

We are working to take Parsoid out of the risky-patches set and should have a backport to vendor before the train rolls out. We decided to split the 2.3.0 -> 2.4.0 bump across two trains. That will eliminate the need for RESTBase purges altogether, without needing to keep an eye on individual edits on those three wikis.

Thanks for raising the concern and for the extra care you're taking. I appreciate it <3

Change 744878 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/mediawiki-config@master] testwikis wikis to 1.38.0-wmf.12 refs T293953

https://gerrit.wikimedia.org/r/744878

Change 744878 merged by jenkins-bot:

[operations/mediawiki-config@master] testwikis wikis to 1.38.0-wmf.12 refs T293953

https://gerrit.wikimedia.org/r/744878

Mentioned in SAL (#wikimedia-operations) [2021-12-07T21:23:43Z] <dancy@deploy1002> Started scap: testwikis wikis to 1.38.0-wmf.12 refs T293953

brennen moved this task from Backlog to Doing on the User-brennen board.
brennen subscribed.

Mentioned in SAL (#wikimedia-operations) [2021-12-07T22:07:58Z] <dancy@deploy1002> Finished scap: testwikis wikis to 1.38.0-wmf.12 refs T293953 (duration: 44m 14s)

Change 744886 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/mediawiki-config@master] group0 wikis to 1.38.0-wmf.12 refs T293953

https://gerrit.wikimedia.org/r/744886

Change 744886 merged by jenkins-bot:

[operations/mediawiki-config@master] group0 wikis to 1.38.0-wmf.12 refs T293953

https://gerrit.wikimedia.org/r/744886

Mentioned in SAL (#wikimedia-operations) [2021-12-07T22:15:24Z] <dancy@deploy1002> rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.12 refs T293953

Change 745313 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/mediawiki-config@master] group1 wikis to 1.38.0-wmf.12 refs T293953

https://gerrit.wikimedia.org/r/745313

Change 745313 merged by jenkins-bot:

[operations/mediawiki-config@master] group1 wikis to 1.38.0-wmf.12 refs T293953

https://gerrit.wikimedia.org/r/745313

Mentioned in SAL (#wikimedia-operations) [2021-12-08T20:16:25Z] <dancy@deploy1002> rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.12 refs T293953

Mentioned in SAL (#wikimedia-operations) [2021-12-08T20:17:30Z] <dancy@deploy1002> Synchronized php: group1 wikis to 1.38.0-wmf.12 refs T293953 (duration: 01m 05s)

Change 745315 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/mediawiki-config@master] group1 wikis to 1.38.0-wmf.9 refs T293953

https://gerrit.wikimedia.org/r/745315

Change 745315 merged by jenkins-bot:

[operations/mediawiki-config@master] group1 wikis to 1.38.0-wmf.9 refs T293953

https://gerrit.wikimedia.org/r/745315

Mentioned in SAL (#wikimedia-operations) [2021-12-08T20:21:27Z] <dancy@deploy1002> rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.9 refs T293953

Mentioned in SAL (#wikimedia-operations) [2021-12-08T20:22:31Z] <dancy@deploy1002> Synchronized php: group1 wikis to 1.38.0-wmf.9 refs T293953 (duration: 01m 04s)

Change 745331 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/mediawiki-config@master] group1 wikis to 1.38.0-wmf.12 refs T293953

https://gerrit.wikimedia.org/r/745331

Change 745331 merged by jenkins-bot:

[operations/mediawiki-config@master] group1 wikis to 1.38.0-wmf.12 refs T293953

https://gerrit.wikimedia.org/r/745331

Mentioned in SAL (#wikimedia-operations) [2021-12-08T21:41:52Z] <dancy@deploy1002> rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.12 refs T293953

Mentioned in SAL (#wikimedia-operations) [2021-12-08T21:42:56Z] <dancy@deploy1002> Synchronized php: group1 wikis to 1.38.0-wmf.12 refs T293953 (duration: 01m 04s)

Hi everybody,

There is an alert for Eventgate Analytics External event validation errors for the schema mediawiki.mediasearch_interaction that lines up with the deployment timings:

https://sal.toolforge.org/log/xYr_m30B8Fs0LHO53sGB
Grafana dashboard

Logstash shows the same error over and over: '.search_result_page_id' should be integer
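
That message is a standard JSON Schema type failure: the event carries a value for search_result_page_id that is not an integer (most likely a string). A minimal illustration with the Python jsonschema library - the schema fragment below is an assumption for illustration, not the actual mediawiki/mediasearch_interaction schema:

```
# Minimal illustration of the validation failure above, using jsonschema.
# The schema fragment is assumed; the real schema has many more fields.
from jsonschema import Draft7Validator

schema_fragment = {
    "type": "object",
    "properties": {
        "search_result_page_id": {"type": "integer"},
    },
}

bad_event = {"search_result_page_id": "12345"}  # string where an integer is expected

for err in Draft7Validator(schema_fragment).iter_errors(bad_event):
    print(list(err.absolute_path), err.message)
    # ['search_result_page_id'] '12345' is not of type 'integer'
```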

Thanks for the report. I filed a separate task for that (T297400) and set it as a train blocker.

Change 745594 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/mediawiki-config@master] group2 wikis to 1.38.0-wmf.12 refs T293953

https://gerrit.wikimedia.org/r/745594

Change 745594 merged by jenkins-bot:

[operations/mediawiki-config@master] group2 wikis to 1.38.0-wmf.12 refs T293953

https://gerrit.wikimedia.org/r/745594

Mentioned in SAL (#wikimedia-operations) [2021-12-09T20:04:19Z] <dancy@deploy1002> rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.12 refs T293953

Change 745955 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/mediawiki-config@master] all wikis to 1.38.0-wmf.9 refs T293953

https://gerrit.wikimedia.org/r/745955

Change 745955 merged by jenkins-bot:

[operations/mediawiki-config@master] all wikis to 1.38.0-wmf.9 refs T293953

https://gerrit.wikimedia.org/r/745955

Mentioned in SAL (#wikimedia-operations) [2021-12-10T22:09:48Z] <dancy@deploy1002> rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.9 refs T293953

dancy added a subscriber: RLazarus.

The train has been rolled back to wmf.9 due to appserver and parsoid server memory growth. @RLazarus and others are investigating in IRC #wikimedia-operations.

Mentioned in SAL (#wikimedia-operations) [2021-12-13T18:25:56Z] <dancy> dancy@deploy1002 rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.12 T293953

There's a possible regression at https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Edit_from_%22What_links_here%22; maybe someone familiar with recent changes to editing stuff might have an idea whether that's a real regression or not.

Filed as T297744. I'm not sure if this is necessarily a blocker. It's inconvenient but probably not a huge issue.

Agreed, I don't think we should roll back wmf.12 or hold up wmf.13 on this issue, but fixing it would be good.