
1.36.0-wmf.11 deployment blockers
Closed, Resolved · Public · Release

Description

As of October 5th, 9:30 UTC, the train is blocked due to a user authentication issue:

Public task T264370: User authentication security issue (Oct 1)
Restricted task {T264369}

Details

Release Version
1.36.0-wmf.11
Release Date
Sep 28 2020, 12:00 AM
Backup Conductor
hashar

2020 week 40 1.36-wmf.11 Changes wmf/1.36.0-wmf.11

This MediaWiki Train Deployment is scheduled for the week of Monday, September 28th:

  • Monday, September 28th: Backports only.
  • Tuesday, September 29th: Branch wmf.11 and deploy to Group 0 Wikis.
  • Wednesday, September 30th: Deploy wmf.11 to Group 1 Wikis.
  • Thursday, October 1st: Deploy wmf.11 to all Wikis.
  • Friday: No deployments on Fridays.

How this works

  • Any serious bugs affecting wmf.11 should be added as subtasks beneath this one.
  • Any open subtask(s) block the train from moving forward. This means no further deployments until the blockers are resolved.
  • If something is serious enough to warrant a rollback then you should bring it to the attention of deployers on the #wikimedia-operations IRC channel.
  • For more info about deployment blockers, see Holding the train.

Related Links

Other Deployments

Previous: 1.36.0-wmf.10
Next: 1.36.0-wmf.12

Event Timeline

Restricted Application added a subscriber: Aklapper. Sep 17 2020, 7:29 PM

Risky Patch! 🚂🔥

wikidatawiki
testwikidatawiki
commonswiki
testcommonswiki
thcipriani triaged this task as Medium priority.
thcipriani updated Backup Conductor, added: hashar.

Thank you @Lucas_Werkmeister_WMDE for the risky patch details! This will be very helpful and it is very much appreciated!

Change 630710 had a related patch set uploaded (by DannyS712; owner: trainbranchbot):
[mediawiki/core@wmf/1.36.0-wmf.11] Branch commit for wmf/1.36.0-wmf.11

https://gerrit.wikimedia.org/r/630710

hashar added a subscriber: hashar. Sep 29 2020, 8:58 AM

Change 630710 merged by jenkins-bot:
[mediawiki/core@wmf/1.36.0-wmf.11] Branch commit for wmf/1.36.0-wmf.11

https://gerrit.wikimedia.org/r/630710

Mentioned in SAL (#wikimedia-operations) [2020-09-29T19:29:06Z] <twentyafterfour> Checked out mediawiki 1.36.0-wmf.11 on deploy1001 see T263177

Mentioned in SAL (#wikimedia-operations) [2020-09-29T19:35:28Z] <twentyafterfour@deploy1001> Started scap: testwikis to 1.36.0-wmf.11 refs T263177

Mentioned in SAL (#wikimedia-operations) [2020-09-29T20:45:11Z] <twentyafterfour@deploy1001> Finished scap: testwikis to 1.36.0-wmf.11 refs T263177 (duration: 69m 57s)

@Lucas_Werkmeister_WMDE it seems that after this change, this url[1] indeed becomes uncacheable.

But unfortunately one of the three that should remain cacheable is now uncacheable[2]:

curl -i "https://test.wikidata.org/wiki/Special:EntityData/Q11.ttl?flavor=dump&revision=520357"

HTTP/2 200
date: Tue, 29 Sep 2020 02:35:38 GMT
server: mw2331.codfw.wmnet
x-content-type-options: nosniff
p3p: CP="See https://test.wikidata.org/wiki/Special:CentralAutoLogin/P3P for more info."
access-control-allow-origin: *
x-frame-options: DENY
vary: Accept-Encoding
last-modified: Mon, 28 Oct 2019 15:37:09 GMT
content-type: text/turtle; charset=UTF-8
age: 66846
x-cache: cp1083 hit, cp1085 miss
x-cache-status: hit-local
server-timing: cache;desc="hit-local"

....

cache-control: private, s-maxage=0, max-age=0, must-revalidate

  1. https://test.wikidata.org/wiki/Special:EntityData/Q11.json?revision=520357&nocache=12345678
  2. https://test.wikidata.org/wiki/Special:EntityData/Q11.ttl?flavor=dump&revision=520357

I suppose this means I shall revert the patch :(

Change 630797 had a related patch set uploaded (by 20after4; owner: 20after4):
[mediawiki/extensions/Wikibase@master] Revert "More consistent handling of cached URLs"

https://gerrit.wikimedia.org/r/630797

@Lucas_Werkmeister_WMDE: I'm rolling back because the revert has a merge conflict. I'm not sure what to do about it so I'll leave it to you to resolve.

mmodell added a comment. Edited Tue, Sep 29, 9:45 PM

The train is now blocked but there is no task for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/621757 so I can't add anything as a blocker.... :-/

edit: looks like it is actually {T260349} but just wasn't referenced in the gerrit change.

mmodell added a subtask: Restricted Task. Tue, Sep 29, 9:49 PM

Another risky patch is rMW08c27e0e4d8b: Remove unwanted parse step (in theory it is super simple, but experience says even super simple changes to the page save / update code can result in surprises). It should push page rendering to happen after page saving, and thus take the saved state into account more completely (e.g. when a tracking category on a Commons image page depends on the contents of the associated structured data). The change is probably only relevant for pages with some kind of MCR data (i.e. only Commons).

Test plan is checking page save for Flow pages, Commons file pages and pages which incorporate Wikidata data. @Etonkovidova has graciously offered to help. If there are issues, it should be fine to just revert it; maybe purge affected pages.

@mmodell sorry, I wasn’t sure who would verify the risky change or what the next steps would be. I don’t think it’s actually necessary to roll back the train for group0: if some requests on Test Wikidata aren’t cached anymore, that should be fine, since that wiki shouldn’t see a lot of requests anyway. The important thing is that this patch, which is apparently buggy, doesn’t reach real Wikidata in group1. So if you want to test wmf.11 some more, it should be fine to move group0 ahead again, as long as group1 and group2 stay on wmf.10 for now. (I’ll also look into why the patch is broken, or try to resolve the conflicts on revert.)

Wait a second.

age: 66846
x-cache: cp1083 hit, cp1085 miss
x-cache-status: hit-local
server-timing: cache;desc="hit-local"

cache-control: private, s-maxage=0, max-age=0, must-revalidate

Doesn’t that mean that the cache was hit? If I understand correctly, the cache-control: private is set by Varnish, overriding the header sent by MediaWiki – but if Varnish itself had a cache hit, that should mean that MediaWiki correctly marked the response as cacheable. So I think this is all working as intended, actually? (I can’t test it now without the change being deployed, though.)
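The reasoning above can be expressed as a small check on the response headers. This is a minimal Python sketch, assuming that an `x-cache-status` value starting with `hit` or a positive `age` means the edge cache served a stored copy; the function name `served_from_edge_cache` is hypothetical, and the header semantics are taken from this thread rather than from the actual Varnish configuration.

```python
def served_from_edge_cache(headers):
    """Return True if the response headers suggest an edge cache hit."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}

    # Assumption: x-cache-status values beginning with "hit" (such as
    # "hit-local" in the response quoted above) mean the edge cache answered.
    status = normalized.get("x-cache-status", "").lower()
    if status.startswith("hit"):
        return True

    # A positive Age means the response spent that many seconds in a cache.
    try:
        return int(normalized.get("age", "0")) > 0
    except ValueError:
        return False


# Headers copied from the curl output quoted in this thread.
quoted = {
    "age": "66846",
    "x-cache": "cp1083 hit, cp1085 miss",
    "x-cache-status": "hit-local",
    "cache-control": "private, s-maxage=0, max-age=0, must-revalidate",
}
print(served_from_edge_cache(quoted))
```

For the quoted response the check reports a hit even though `cache-control` says `private`, which matches the point made above: the client-facing `cache-control` value alone does not show whether MediaWiki marked the response as cacheable.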

@Lucas_Werkmeister_WMDE interesting. I was kinda confused about that as well but I didn't want to take any risks. I'll go ahead and roll forward to group0 and we'll see where things stand then.

Tgr added a comment. Wed, Sep 30, 5:37 PM

Yeah that's a cached response. You can check the live one by SSH-ing into a debug host and running something like curl -I -H 'Host: test.wikidata.org' -H 'X-Forwarded-Proto: https' 'http://mwdebug2001.codfw.wmnet/wiki/Special:EntityData/Q11.ttl?flavor=dump&revision=520357'.
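The same uncached request can also be prepared from Python. This is a minimal sketch using only the standard library; the helper name `build_debug_request` is hypothetical, the host and path are copied from the curl command above, and the debug host is presumably only reachable from inside the production network, so the sketch builds the request without sending it.

```python
import urllib.request


def build_debug_request(debug_host, wiki_host, path):
    """Build a HEAD request aimed at a debug host, mirroring `curl -I`."""
    return urllib.request.Request(
        f"http://{debug_host}{path}",
        headers={
            # Tell the app server which wiki is being requested.
            "Host": wiki_host,
            # Same header as in the curl command: present the request as HTTPS.
            "X-Forwarded-Proto": "https",
        },
        method="HEAD",
    )


request = build_debug_request(
    "mwdebug2001.codfw.wmnet",
    "test.wikidata.org",
    "/wiki/Special:EntityData/Q11.ttl?flavor=dump&revision=520357",
)
print(request.get_method(), request.full_url)
```

From a host that can reach the debug server, passing `request` to `urllib.request.urlopen` and reading the `headers` attribute of the result would show the `cache-control` value MediaWiki itself sends, before any edge cache rewrites it.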

Train rolled all the way back and completely undeployed wmf.11 because of the massive number of errors that were occurring even after rolling back group1.

Pchelolo added a subscriber: Pchelolo. Edited Thu, Oct 1, 6:05 PM

For the ParserCache blocker, this is the chain of backports that moves wmf.11 to the exact same state as wmf.10 and master for the ParserCache. They are to be merged before deployment:

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/631240/2
https://gerrit.wikimedia.org/r/c/mediawiki/core/+/631473/2

mmodell removed a subtask: Restricted Task. Thu, Oct 1, 7:25 PM

Mentioned in SAL (#wikimedia-operations) [2020-10-01T20:08:13Z] <twentyafterfour@deploy1001> Synchronized php-1.36.0-wmf.11/includes/parser/: sync ParserCache patches to unblock the train T264257 T263177 (duration: 00m 59s)

Change 631527 had a related patch set uploaded (by 20after4; owner: 20after4):
[operations/mediawiki-config@master] group1 wikis to 1.36.0-wmf.11 refs T263177

https://gerrit.wikimedia.org/r/631527

Change 631527 merged by jenkins-bot:
[operations/mediawiki-config@master] group1 wikis to 1.36.0-wmf.11 refs T263177

https://gerrit.wikimedia.org/r/631527

Mentioned in SAL (#wikimedia-operations) [2020-10-01T20:47:18Z] <twentyafterfour@deploy1001> rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11 refs T263177

Mentioned in SAL (#wikimedia-operations) [2020-10-01T20:48:25Z] <twentyafterfour@deploy1001> Synchronized php: group1 wikis to 1.36.0-wmf.11 refs T263177 (duration: 01m 06s)

Since all wikis are on wmf.10 currently, does this mean that wmf.12 (T263178) won’t be deployed next week? I’m asking this because next week’s Tech News still lists the wmf.12 train as if nothing has happened, but that should probably be reconsidered.

hashar added a subtask: Restricted Task. Mon, Oct 5, 9:29 AM
hashar updated the task description.
hashar changed the status of subtask T264370: User authentication security issue (Oct 1) from Open to Stalled. Mon, Oct 5, 9:33 AM
hashar added a comment. Mon, Oct 5, 6:35 PM

Since all wikis are on wmf.10 currently, does this mean that wmf.12 (T263178) won’t be deployed next week? I’m asking this because next week’s Tech News still lists the wmf.12 train as if nothing has happened, but that should probably be reconsidered.

We do not know yet. Investigations are ongoing due to T264370, which might or might not be related to the attempted deployment of 1.36.0-wmf.11 last week. So far we do not even know whether it is a problem with that version or with the infrastructure itself.

I guess we will do wmf.11 first, but for wmf.12 I am not sure when we will deploy it and can't really make a plan until T264370 is fixed :-\

dduvall claimed this task. Tue, Oct 13, 4:15 PM
dduvall added a subscriber: mmodell.

Mentioned in SAL (#wikimedia-operations) [2020-10-13T17:30:37Z] <marxarelli> 1.36.0-wmf.11 promoted to group0. no new errors (T263177). preparing to promote to group1

Mentioned in SAL (#wikimedia-operations) [2020-10-13T18:21:07Z] <marxarelli> 1.36.0-wmf.11 promoted to group1. no new errors (T263177). promoting to all wikis

dduvall closed this task as Resolved. Tue, Oct 13, 7:57 PM
dduvall removed a subtask: Restricted Task.

Re-deployed for all wikis.