Transcluding Special:Prefixindex can force the default skin
Closed, Resolved · Public · Bug Report

Description

A recent discussion at the technical village pump has brought up issues with Vector 2022 showing up on some seemingly arbitrary pages.

Steps to reproduce:

Event Timeline

Sorry, to clarify further: this happens to me while using the Vector 2010 skin, and Kusma has also reported it on MonoBook.

It also appears to only be on subpages so far, though whether that pattern is coincidence or not is unclear.

Izno changed the subtype of this task from "Task" to "Bug Report". Thu, May 11, 3:33 PM
taavi triaged this task as Unbreak Now! priority. Thu, May 11, 3:34 PM
taavi added a subscriber: taavi.

(assuming this is caused by today's train, so marking as a train blocker)

Reloading the page several times seems to help. Could this be a larger issue, something like "logged-in users are being served cached pages"?

A purge and then a refresh on pages in V22 seems to fix things. I'm not sure whether a refresh alone would, but I ran out of articles to test this with.

Well, except for one, which unfortunately I have to link here: an SPI page that behaves differently from all the rest, at least for me. This page seems more stuck in V22 than any other. Other pages with the V22 bug stay fine for at least a few hours after a purge and a refresh, but on this one a purge and a refresh only make it display normally temporarily; it reverts to V22 as soon as you navigate away from the page.

(If this is somewhat incomprehensible, I wrote it while pretty much delirious with tiredness. I can reword it later if things aren't fixed by then.)

Train notes after some slack discussion:

  • Confirming reproduction at the linked pages.
  • Via mw.config.get('wgHostname'), served from mw2268, mw2272, mw2384 - all in codfw for the ones I've seen so far.
  • Will roll 1.41.0-wmf.8 back to group1; if it goes away, presumably it's a code change in this train. If not, should look elsewhere.

Mentioned in SAL (#wikimedia-operations) [2023-05-11T16:37:51Z] <brennen> train 1.41.0-wmf.8 (T330214): rolling back to group1 to test for T336504 presence/absence on enwiki

I would also add: purging pages shouldn't help, unless we broke something fundamental in how the parser cache works.

Our previous assumption that this was only happening in codfw (as far as we were able to reproduce the bug) has just proved wrong: I randomly got V22 on one of the reported pages while having V10 set in my preferences.

This could very well be a red herring, but just in case it's not: repeatedly trying to replicate the issue on https://test.wikipedia.org/wiki/User:TheresNoTime/sandbox2 does not work (it stays in V10), whereas copying the content of https://en.wikipedia.org/wiki/User:JPxG/sandbox to https://test.wikipedia.org/wiki/User:TheresNoTime/sandbox reproduces the issue immediately (V10 on first load, V22 after action=purge, back to V10 on page refresh).

OK, so I got a couple of massive transient error spikes while rolling back, and the deploy process seemed to take at least 3x as long as I'd expect, but group2 is now at wmf.7. Can anyone reproduce this on enwiki now? So far I am unable to do so.

Can't repro on https://en.wikipedia.org/wiki/User:JPxG/sandbox any more, and prior to the rollback it was fairly consistent.

I'll hazard a guess that this is related to doing special page transclusions. For example, https://en.wikipedia.org/wiki/Wikipedia:Featured_article_candidates/Pasqua_Rosée/archive1 transcludes {{Special:Prefixindex/Wikipedia:Featured article candidates/Pasqua Rosée/}} and {{Special:Prefixindex/Wikipedia:Featured article review/Pasqua Rosée/}} via the https://en.wikipedia.org/wiki/Template:Featured_article_tools template.

I don't know yet why this has broken now, but I remember that special page transclusion has caused similar issues in the past.

Yup, you got it. https://test.wikipedia.org/wiki/User:TheresNoTime/sandbox3 contains only

{{Special:Prefixindex/User:TheresNoTime/}}

and the issue is now reliably reproducible.

If we can get this reproducible on localhost, might want to git bisect core and vector-2022 repos to figure out where the bug is.

Jdlrobson renamed this task from Vector 2022 force-deploying on arbitrary pages to Using Special:Prefixindex can force the default skin. Thu, May 11, 5:52 PM

Since no one could reproduce locally, but it's reliably reproducible at https://en.wikipedia.beta.wmflabs.org/wiki/Prefixindex, I decided to bisect it there on the beta cluster. Reverting core or Vector to wmf.7 did not fix the issue, so I decided to "bisect" the list of extensions – revert half of them to wmf.7, test if that fixes it, revert half back to wmf.8, test again, etc. That was pretty boring, but it was a success – GrowthExperiments was somehow the culprit. Then I bisected with Git normally in that extension, and found that the bug was introduced by https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/903733.
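The extension-halving search described above is a binary search for a single culprit. A minimal PHP sketch, assuming exactly one culprit extension and a deterministic reproduction: `findCulprit()` and the `$bugPresent` callback are hypothetical stand-ins for "keep this subset at wmf.8, revert the rest to wmf.7, and check the test page", not any real deployment tooling.

```php
<?php
/**
 * Hypothetical sketch: locate one culprit extension by repeated halving.
 * $bugPresent(array $atNewVersion): bool stands in for "deploy only these
 * extensions at the new version and check whether the bug reproduces".
 * Assumes exactly one culprit and a deterministic reproduction.
 */
function findCulprit(array $extensions, callable $bugPresent): ?string
{
    $candidates = array_values($extensions);
    // Nothing to search, or the bug doesn't reproduce even with everything new.
    if ($candidates === [] || !$bugPresent($candidates)) {
        return null;
    }
    while (count($candidates) > 1) {
        // Keep the first half at the new version; the rest are reverted.
        $half = array_slice($candidates, 0, intdiv(count($candidates), 2));
        if ($bugPresent($half)) {
            $candidates = $half;
        } else {
            // Bug gone, so the culprit must be in the other half.
            $candidates = array_slice($candidates, count($half));
        }
    }
    return $candidates[0];
}

// Illustrative extension list; the callback fakes the check on the test page.
$extensions = ['CentralAuth', 'Echo', 'FlaggedRevs', 'GrowthExperiments', 'MobileFrontend'];
echo findCulprit(
    $extensions,
    fn(array $atNewVersion): bool => in_array('GrowthExperiments', $atNewVersion, true)
), "\n"; // GrowthExperiments
```

Each round costs one deploy-and-test cycle, so n extensions need roughly log2(n) + 1 rounds. Once the extension is known, `git bisect` inside that repository narrows it down to a single commit.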

The GrowthExperiments commit doesn't look like the root cause, it just reveals the issue by doing $context->getSkin() in the onSpecialPageBeforeExecute hook. This evidently runs when processing a special page transclusion, and ends up caching the skin (which is the wiki default skin at that point) in that context object. I'm not sure how that skin ends up "leaking" into the global context, but it must be somehow related to T290706.

FYI, I don't know any details, but somebody complained today about the opposite case: opening some pages forces Vector legacy when Vector 2022 is the default skin.

Change 919247 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Reset the cached skin in RequestContext::setUser()

https://gerrit.wikimedia.org/r/919247

I reproduced the issue locally by creating a page that included Special:Prefixindex. I applied the following config:

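// Test setup: touch the main context's skin while a special page executes,
// mimicking the effect of the GrowthExperiments change.
$wgHooks['SpecialPageBeforeExecute'][] = function ( SpecialPage $special, $subPage ) {
	// Forces the skin to be created and cached on the main context.
	RequestContext::getMain()->getSkin();
};
// Disable the parser cache so every view re-runs the transclusion.
$wgParserCacheType = CACHE_NONE;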

It's necessary for the skin to be uninitialised at the time of the hook call, and I had to disable FlaggedRevs to achieve that. With this test setup, the page was delivered with the default skin instead of the user preference skin.

I then tested the fix linked above.
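The failure mode and the fix named in the patch title ("Reset the cached skin in RequestContext::setUser()") can be modelled with a toy context that lazily caches its skin. This is a heavily simplified, hypothetical PHP 8 sketch, not MediaWiki's RequestContext: `ToyContext` and `skinForPageView()` are illustrative names. It assumes that during the transclusion the context's user is temporarily swapped for one without a skin preference and later restored via setUser(); that swap is an assumption of the model.

```php
<?php
/**
 * Hypothetical toy model of a context that lazily creates and caches a skin.
 * Not MediaWiki's RequestContext; names are illustrative only.
 */
final class ToyContext
{
    private ?string $skin = null;

    public function __construct(
        private string $userSkinPref,     // '' means "no skin preference"
        private string $defaultSkin,
        private bool $resetSkinOnSetUser  // true models the fix
    ) {}

    public function setUser(string $skinPref): void
    {
        $this->userSkinPref = $skinPref;
        if ($this->resetSkinOnSetUser) {
            // The fix: a different user may want a different skin, so drop the cache.
            $this->skin = null;
        }
    }

    public function getSkin(): string
    {
        // Lazy initialisation: whoever calls first decides which skin is cached.
        $this->skin ??= ($this->userSkinPref !== '' ? $this->userSkinPref : $this->defaultSkin);
        return $this->skin;
    }
}

/**
 * One page view by a user who prefers 'vector' (legacy) on a wiki whose
 * default is 'vector-2022'. Assumption: the transclusion swaps in a user
 * without a preference, and a SpecialPageBeforeExecute-style hook calls getSkin().
 */
function skinForPageView(bool $withFix): string
{
    $ctx = new ToyContext('vector', 'vector-2022', $withFix);

    $ctx->setUser('');       // transclusion starts: user without a skin preference
    $ctx->getSkin();         // hook touches the skin; the default gets cached
    $ctx->setUser('vector'); // transclusion ends: real user restored

    return $ctx->getSkin();  // skin used to render the page
}

echo skinForPageView(false), "\n"; // vector-2022 (the bug)
echo skinForPageView(true), "\n";  // vector
```

In this model the bug only appears when the skin is still uninitialised at the time of the hook call: had something called getSkin() earlier for the real user, the user's own skin would already be cached. That may be why FlaggedRevs had to be disabled to reproduce the issue.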

Nardog renamed this task from Using Special:Prefixindex can force the default skin to Transcluding Special:Prefixindex can force the default skin. Fri, May 12, 12:46 AM

Change 919247 merged by jenkins-bot:

[mediawiki/core@master] Reset the cached skin in RequestContext::setUser()

https://gerrit.wikimedia.org/r/919247

Change 919178 had a related patch set uploaded (by Hashar; author: Tim Starling):

[mediawiki/core@wmf/1.41.0-wmf.8] Reset the cached skin in RequestContext::setUser()

https://gerrit.wikimedia.org/r/919178

Change 919178 merged by jenkins-bot:

[mediawiki/core@wmf/1.41.0-wmf.8] Reset the cached skin in RequestContext::setUser()

https://gerrit.wikimedia.org/r/919178

Mentioned in SAL (#wikimedia-operations) [2023-05-12T08:52:30Z] <hashar@deploy1002> Started scap: Backport for [[gerrit:919178|Reset the cached skin in RequestContext::setUser() (T336504)]]

Mentioned in SAL (#wikimedia-operations) [2023-05-12T08:54:00Z] <hashar@deploy1002> hashar: Backport for [[gerrit:919178|Reset the cached skin in RequestContext::setUser() (T336504)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet

hashar assigned this task to tstarling.
hashar added a subscriber: hashar.

Very well done. Will run the train and promote all wikis to 1.41.0-wmf.8.

Mentioned in SAL (#wikimedia-operations) [2023-05-12T09:08:58Z] <hashar@deploy1002> Finished scap: Backport for [[gerrit:919178|Reset the cached skin in RequestContext::setUser() (T336504)]] (duration: 16m 27s)