Page MenuHomePhabricator

Transcluding Special:Prefixindex can force the default skin
Closed, ResolvedPublicBUG REPORT

Description

A recent discussion at the technical village pump has brought up issues with Vector 2022 showing up on some, apparently somewhat arbitrary, pages.

Steps to reproduce:

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I'll hazard a guess that this is related to doing special page transclusions. For example, https://en.wikipedia.org/wiki/Wikipedia:Featured_article_candidates/Pasqua_Rosée/archive1 transcludes {{Special:Prefixindex/Wikipedia:Featured article candidates/Pasqua Rosée/}} and {{Special:Prefixindex/Wikipedia:Featured article review/Pasqua Rosée/}} via the https://en.wikipedia.org/wiki/Template:Featured_article_tools template.

I don't know why this has broken now yet, but I remember that special page transclusion has caused similar issues in the past.

yup, you got ithttps://test.wikipedia.org/wiki/User:TheresNoTime/sandbox3 has just

{{Special:Prefixindex/User:TheresNoTime/}}

and the issue is now reliably reproducible

If we can get this reproducible on localhost, might want to git bisect core and vector-2022 repos to figure out where the bug is.

Jdlrobson renamed this task from Vector 2022 force-deploying on arbitrary pages to Using Special:Prefixindex can force the default skin.May 11 2023, 5:52 PM

Since no one could reproduce locally, but it's reliably reproducible at https://en.wikipedia.beta.wmflabs.org/wiki/Prefixindex, I decided to bisect it there on the beta cluster. Reverting core or Vector to wmf.7 did not fix the issue, so I decided to "bisect" the list of extensions – revert half of them to wmf.7, test if that fixes it, revert half back to wmf.8, test again, etc. That was pretty boring, but it was a success – GrowthExperiments was somehow the culprit. Then I bisected with Git normally in that extension, and found that the bug was introduced by https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/903733.

The GrowthExperiments commit doesn't look like the root cause, it just reveals the issue by doing $context->getSkin() in the onSpecialPageBeforeExecute hook. This evidently runs when processing a special page transclusion, and ends up caching the skin (which is the wiki default skin at that point) in that context object. I'm not sure how that skin ends up "leaking" into the global context, but it must be somehow related to T290706.

FYIO, I don't know any details, but somebody complained today about an opposite case — opening some pages forces vector legacy when vector 22 is the default skin.

Change 919247 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Reset the cached skin in RequestContext::setUser()

https://gerrit.wikimedia.org/r/919247

I reproduced the issue locally by creating a page that included Special:Prefixindex. I applied the following config:

$wgHooks['SpecialPageBeforeExecute'][] = function ( SpecialPage $special, $subPage ) {
	RequestContext::getMain()->getSkin();
};
$wgParserCacheType = CACHE_NONE;

It's necessary for the skin to be uninitialised at the time of the hook call, and I had to disable FlaggedRevs to achieve that. With this test setup, the page was delivered with the default skin instead of the user preference skin.

I then tested the fix linked above.

Nardog renamed this task from Using Special:Prefixindex can force the default skin to Transcluding Special:Prefixindex can force the default skin.May 12 2023, 12:46 AM

Change 919247 merged by jenkins-bot:

[mediawiki/core@master] Reset the cached skin in RequestContext::setUser()

https://gerrit.wikimedia.org/r/919247

Change 919178 had a related patch set uploaded (by Hashar; author: Tim Starling):

[mediawiki/core@wmf/1.41.0-wmf.8] Reset the cached skin in RequestContext::setUser()

https://gerrit.wikimedia.org/r/919178

Change 919178 merged by jenkins-bot:

[mediawiki/core@wmf/1.41.0-wmf.8] Reset the cached skin in RequestContext::setUser()

https://gerrit.wikimedia.org/r/919178

Mentioned in SAL (#wikimedia-operations) [2023-05-12T08:52:30Z] <hashar@deploy1002> Started scap: Backport for [[gerrit:919178|Reset the cached skin in RequestContext::setUser() (T336504)]]

Mentioned in SAL (#wikimedia-operations) [2023-05-12T08:54:00Z] <hashar@deploy1002> hashar: Backport for [[gerrit:919178|Reset the cached skin in RequestContext::setUser() (T336504)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet

hashar assigned this task to tstarling.
hashar subscribed.

Very well done. Will run the train and promote all wikis to 1.41.0-wmf.8

Mentioned in SAL (#wikimedia-operations) [2023-05-12T09:08:58Z] <hashar@deploy1002> Finished scap: Backport for [[gerrit:919178|Reset the cached skin in RequestContext::setUser() (T336504)]] (duration: 16m 27s)

Apparently the issue has resurfaced.

Jdlrobson lowered the priority of this task from Unbreak Now! to Needs Triage.Feb 23 2024, 3:41 AM
Jdlrobson added a subscriber: RoySmith.

Also, like last time, a null edit to the page is at least occasionally clearing the problem.

According to this report, the problem also exists on [[Wikipedia:WikiProject Canoeing and Kayaking]]. I was able to reproduce it, but only once in perhaps a dozen tries. As noted by @Xaosflux in that thread, the Special:Prefixindex transclusion happens at the bottom of the page in a navbox.

In most of the examples above which are reproducable at a greater frequency, the transclusion is right at the top, to generate the back-link header. This leads me to suspect a race condition in how the javascript loads. That might also explain why a hard reload gets you back to the non-V22 skin; flushing the cache will change the javascript timing.

Another interesting data point from the VPT thread. These three all render in V2022:

But these two render in my configured skin:

They all include Special:Prefix in the rendered HTML, but in various contexts.

The initalize of the OOUIHTMLForm triggers this, which also happen in the include modus of Special:PrefixIndex, b08b6a7785e5ff3f152fdaf8f7b7cc13023212b8 was a first try to fix this, but got reverted. Cross-linking T352592

Reports of this keep showing up, so it appears to be a widespread problem. Would it be possible to roll back Thursday's deployment until a permanent fix for this can be developed?

@RoySmith - the Thursday issue we're trying to fix is not related to this issue. We encountered a DB replication issues that affect only Beta Cluster wikis - not production one.

But there could be something else in the last train that caused this issue to be more prominent.

brennen triaged this task as Unbreak Now! priority.Feb 26 2024, 4:43 PM
brennen added a subscriber: dduvall.

Reports of this keep showing up, so it appears to be a widespread problem. Would it be possible to roll back Thursday's deployment until a permanent fix for this can be developed?

As a first step, let's consider this a blocker for this week's T354438: 1.42.0-wmf.20 deployment blockers. It is possible to roll back the train on a Monday, but it's usually a bad time.

cc: @dduvall

T299308: Add Hide translations to Special:PrefixIndex went out with wmf.19 so it could be related as it shuffled code around a bit. Looking at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/961226 the HTMLForm::factory gets called before check of transclusion now.

I can reproduce this. The special page calls OutputPage::setupOOUI() which calls RequestContext::getSkinName(). Resetting the skin name would help, although it would leave the wrong OOUI theme in place.

Change 1006601 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] In RequestContext::setUser() also reset $this->skinName

https://gerrit.wikimedia.org/r/1006601

Change 1006602 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] SpecialPageFactory::capturePath: Stop saving and restoring global state

https://gerrit.wikimedia.org/r/1006602

Discussion of the second patch will be on T290706. The first patch should be enough to resolve the UBN.

Change 1006601 merged by jenkins-bot:

[mediawiki/core@master] In RequestContext::setUser() also reset $this->skinName

https://gerrit.wikimedia.org/r/1006601

Change 1006313 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@wmf/1.42.0-wmf.19] In RequestContext::setUser() also reset $this->skinName

https://gerrit.wikimedia.org/r/1006313

Change 1006313 merged by jenkins-bot:

[mediawiki/core@wmf/1.42.0-wmf.19] In RequestContext::setUser() also reset $this->skinName

https://gerrit.wikimedia.org/r/1006313

Mentioned in SAL (#wikimedia-operations) [2024-02-27T09:53:06Z] <jnuche@deploy2002> Started scap: Backport for [[gerrit:1006313|In RequestContext::setUser() also reset $this->skinName (T336504)]]

Mentioned in SAL (#wikimedia-operations) [2024-02-27T09:54:46Z] <jnuche@deploy2002> jnuche and tstarling: Backport for [[gerrit:1006313|In RequestContext::setUser() also reset $this->skinName (T336504)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-02-27T10:03:19Z] <jnuche@deploy2002> Finished scap: Backport for [[gerrit:1006313|In RequestContext::setUser() also reset $this->skinName (T336504)]] (duration: 10m 12s)

Change #1006602 abandoned by Tim Starling:

[mediawiki/core@master] SpecialPageFactory::capturePath: Stop saving and restoring global state

Reason:

https://gerrit.wikimedia.org/r/1006602