Skin continuously switches to Vector 2022 despite (global) preferences
Closed, Resolved | Public | 3 Estimated Story Points | BUG REPORT

Description

image.png (508×1 px, 88 KB)

What happens?:
My skin continuously switches to Vector 2022 despite my (global) preferences. I tried turning on safemode, but that doesn't help. Also, mw.config.get('skin') returns 'vector', as seen in this screenshot. I'm not sure how to reproduce this.

What should have happened instead?:
Everything should stay as is.

Event Timeline

Change 769903 had a related patch set uploaded (by Func; author: Func):

[mediawiki/skins/Vector@master] Hook: Don't set user option when handling the GetOptions hook

https://gerrit.wikimedia.org/r/769903

Change 755813 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/skins/Vector@master] End migration mode

https://gerrit.wikimedia.org/r/755813

Hi CommTech
How do we decide default values for global preferences?

On English Wikipedia the default skin is Vector, but on French Wikipedia the default skin is Vector 2022. I believe GlobalPreferences is dropping rows which match the default leading to some inconsistencies across wikis.

Is GlobalPreferences dropping database rows whose value matches the default for that wiki?

Change 755813 merged by jenkins-bot:

[mediawiki/skins/Vector@master] End migration mode

https://gerrit.wikimedia.org/r/755813

Change 769903 abandoned by Func:

[mediawiki/skins/Vector@master] Hook: Don't set user option when handling the GetOptions hook

Reason:

https://gerrit.wikimedia.org/r/769903

Acknowledging this question. I haven't had a chance to investigate, but I'll note GlobalPreferences has undergone significant changes over the past few months. Pinging @Func as they did the bulk of the work. It sounds like another bug, not too dissimilar from the bugs we have been attempting to fix. Perhaps this is what https://gerrit.wikimedia.org/r/753952/ is for?

I can't reproduce this. Which option in the form is selected after you submit?

Is GlobalPreferences dropping database rows whose value matches the default for that wiki?

No, it wouldn't. And I believe many preferences have different default values across wikis, but there have been no similar reports until now.

I think we need more information about T285216, which mentions that it sometimes fails to find a global ID for the user. When that happens, it cannot load the global preferences that would override the local ones or the site's defaults.
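To make the hypothesis above concrete, here is a minimal sketch of the lookup order being described: a global value overrides the local one, which overrides the wiki's default, and a failed global ID lookup silently skips the first layer. This is not GlobalPreferences code; `resolvePreference` and the shape of its arguments are made up for illustration.

```javascript
// Hypothetical illustration only; not the GlobalPreferences implementation.
// globalPrefs is null when no global ID could be found for the user.
function resolvePreference( key, globalPrefs, localPrefs, siteDefaults ) {
	if ( globalPrefs !== null && key in globalPrefs ) {
		return globalPrefs[ key ];
	}
	if ( key in localPrefs ) {
		return localPrefs[ key ];
	}
	return siteDefaults[ key ];
}

const frwikiDefaults = { skin: 'vector-2022' };
const userGlobalPrefs = { skin: 'vector' };

// Global lookup works: the global choice wins over the wiki default.
console.log( resolvePreference( 'skin', userGlobalPrefs, {}, frwikiDefaults ) );
// Global lookup fails (no global ID): the wiki default shows through.
console.log( resolvePreference( 'skin', null, {}, frwikiDefaults ) );
```

If this were the cause, the wrong skin would appear only on requests where the global lookup fails, which would fit the intermittent nature of the reports.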

This looks like it's working again for me. @NguoiDungKhongDinhDanh are you still seeing this issue?

I ran into this bug today. I went to Global Preferences and tried to set Vector Legacy as my global default skin. Initially, the skins were ordered as follows:

[√] Vector (2022) (default | Preview | Custom CSS | Custom JavaScript)
Vector legacy (2010) (Preview | Custom CSS | Custom JavaScript)
MinervaNeue (Preview | Custom CSS | Custom JavaScript)
MonoBook (Preview | Custom CSS | Custom JavaScript)
Timeless (Preview | Custom CSS | Custom JavaScript)

Then I selected "Vector legacy" and clicked Save, and it reloaded the page still in Vector 2022. The menu now looked like this, with Vector 2022 still selected but the top two rows switched:

Vector legacy (2010) (Preview | Custom CSS | Custom JavaScript)
[√] Vector (2022) (default | Preview | Custom CSS | Custom JavaScript)
MinervaNeue (Preview | Custom CSS | Custom JavaScript)
MonoBook (Preview | Custom CSS | Custom JavaScript)
Timeless (Preview | Custom CSS | Custom JavaScript)

Good observation! Current skin would be popped to top, but somehow the selection is overridden and the appearance too. What did your local preference page look like?

Ok, I see. Change 755813 didn't make it into 1.39.0-wmf.4 and hasn't been backported to it.
@Jdlrobson Do you want to backport it, or wait for the next train?

@Jdlrobson Not as common as it was a month ago, but it still happens.

Looking at just one of the pilot wikis (French Wikipedia) and its logged-in user base, this problem seems to be happening over a dozen times a minute, every minute, or about 400 times an hour.

https://grafana.wikimedia.org/d/OHBhRhy7k/vector-bug-t302627?orgId=1

Screenshot 2022-03-29 at 17.40.51.png (838×1 px, 132 KB)

I'm not sure whether this is related to GlobalPreferences. If it is, it would be an issue different from the one hinted at here, as there are afaik no missing or altered database rows occurring during the page views where this bug starts, stops, re-appears, or anywhere in-between.

With WikimediaDebug active and verbosely logging, I can reproduce this within a minute by navigating to a number of different pages from Special:RecentChanges.

In addition to there being no relevant database activity, I also note that mw.config.get('skin') is set to vector, indicating that the server at one point knew it was not meant to serve Vector 22 (per change 759531). And mw.user.options.get('skin') is also vector, indicating that, whatever involvement GlobalPreferences may have, once all is said and done the value reflected is vector and not vector-2022. Yet, the skin template proceeds as Vector 22.
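For anyone trying to reproduce this, the two client-side values mentioned above can be read together from the browser console. The helper below is a small sketch, not part of any extension; `describeSkinState` is a hypothetical name, and it takes the `mw` object as a parameter only so it can be exercised outside a wiki page.

```javascript
// Collects the two values discussed above so they can be compared with the
// layout that was actually rendered. Pass the global `mw` object in a browser.
function describeSkinState( mwLike ) {
	const configSkin = mwLike.config.get( 'skin' );
	const optionSkin = mwLike.user.options.get( 'skin' );
	return {
		configSkin: configSkin,
		optionSkin: optionSkin,
		valuesAgree: configSkin === optionSkin
	};
}

// In the browser console on an affected page:
//   describeSkinState( mw );
```

When both values are `vector` but the page is laid out as Vector 2022, the mismatch is on the server-side rendering path rather than in the stored preference.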

The following is two consecutive page views. Again, no database updates or inserts queries observed during this browsing session. The behaviour appears random.

Screenshot 2022-03-29 at 01.41.57.png (1×2 px, 569 KB)
Screenshot 2022-03-29 at 01.41.15.png (1×2 px, 589 KB)

Even when purging (with the action=purge form itself so far showing in the correct skin), the very response to the purge form submission often renders in the wrong skin. The response metadata and latency confirm that it was freshly parsed, which more-or-less rules out race conditions or cache bugs from other concurrent page views.

Screenshot 2022-03-29 at 02.25.18.png (675×972 px, 153 KB)

@Krinkle Was your test done before or after the wmf.5 branch rollout?

Change 774839 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/skins/Vector@wmf/1.39.0-wmf.4] End migration mode

https://gerrit.wikimedia.org/r/774839

@Jdlrobson What about the 1.38 release branch? I think my simple patch should be enough there, or do you want to backport that big change?

There was never official 3rd party support for skin versioned Vector. We can backport a change to skin.json to set VectorSkinMigrationMode and VectorShowSkinPreferences to false as a cautionary measure if we're concerned.

Indicating that, whatever involvement GlobalPreferences may have, once all is said and done the value reflected is vector and not vector-2022. Yet, the skin template proceeds as Vector 22.

@Krinkle I believe what's happening is you have the hidden preference VectorSkinVersion set to 1 in either global preferences or local preferences.
The issue should be rectified by running:
new mw.Api().saveOptions( { 'VectorSkinVersion': 2 } )
Can you confirm?
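A slightly more defensive variant of the one-liner above is sketched here: it checks the stored value first and only writes when it is the legacy one. It assumes the hidden VectorSkinVersion preference is readable through `mw.user.options`, which this thread does not confirm; `resetVectorSkinVersion` is a hypothetical helper name.

```javascript
// Sketch: only write the option when the stored value is the legacy one.
// Assumption: the hidden VectorSkinVersion preference is exposed via
// mw.user.options. Pass the global `mw` object in a browser.
function resetVectorSkinVersion( mwLike ) {
	const current = String( mwLike.user.options.get( 'VectorSkinVersion' ) );
	if ( current !== '1' ) {
		// Nothing to fix (or the value is not visible client-side).
		return Promise.resolve( false );
	}
	return new mwLike.Api().saveOptions( { VectorSkinVersion: 2 } )
		.then( function () {
			return true;
		} );
}

// In the browser console:
//   resetVectorSkinVersion( mw ).then( console.log );
```

The resolved value reports whether a write was attempted, which makes it easier to tell "already 2" apart from "was 1 and has been changed".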

Per @Func, I presume this test was done before the wmf.5 branch rollout. If it was done after, then we have a different problem from the one I have been trying to fix, so please let me know urgently.

This didn't backport cleanly to wmf5 due to some canary issues. I'm talking to @Catrope about trying again.

There was never official 3rd party support for skin versioned Vector. We can backport a change to skin.json to set VectorSkinMigrationMode and VectorShowSkinPreferences to false as a cautionary measure if we're concerned.

I think the Miraheze community cares about this, ccing @RhinosF1 for attention.

@Jdlrobson: please make sure the release is in an appropriate state for third parties.

Thanks for the link. That's going to be fun with our setup.

wmf5 is everywhere now. @Krinkle how are you generating that graph? I couldn't find any indication of how in codesearch. Assuming it's still relevant, it looks like there are still 2-6 failed expectations a minute.

While looking at T305232 I realized that the remaining SkinVersionLookup code is very problematic and is the problem we are now seeing.

I'm pretty sure GlobalPreferences is working as expected but we can confirm when T305232 is fixed and deployed.

The linked Grafana dashboard has a legend with relevant information and links.

Okay gadget, thanks for clarifying. Then yes, I believe what you are logging is consistent with the behaviour I am seeing in T305232.

LGoto set the point value for this task to 3 (Apr 4 2022, 5:46 PM).

Change 776226 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/skins/Vector@master] Drop the LatestSkinVersionRequirement

https://gerrit.wikimedia.org/r/776226

Change 776226 merged by jenkins-bot:

[mediawiki/skins/Vector@master] Drop the LatestSkinVersionRequirement

https://gerrit.wikimedia.org/r/776226

I am also encountering this issue. Nowadays it seems less frequent, but it may still happen sometimes. (edit: a fix has just been deployed, it simply hasn't reached frwiki yet)

It seems to happen much more frequently on specific pages. For example, on the French wiki Projet:Modèle and Discussion Projet:Modèle.

By the way, at the time of writing, on these specific pages I am on legacy Vector as expected, but there are display bugs:

Clipboard01.png (44×561 px, 1 KB)

(The Preferences and Contributions links are messed up, and the Logout link is missing, even from the HTML. Note that the Sandbox (Brouillon in French) and Beta links are simply hidden because I'm hiding them using personal CSS.)

Also, I usually fix this issue by doing a force refresh (shift + F5). A regular refresh (F5) isn't sufficient.

@Od1n can you help us QA this by seeing if you encounter the issue on mediawiki.org?
I'm hoping to backport the change to French Wikipedia and other wikis once I can confirm that (otherwise it will be fixed on Thursday earliest).

Indeed, I was also encountering this issue on mediawiki.org. I don't remember on what pages exactly, but I'm pretty sure it was occurring on ResourceLoader/Core modules.

Currently on mediawiki.org, my settings had been changed back to Vector 2022 (I don't remember if I did the change, or if it happened without intervention), so I switched to Vector classic again, and I'm not seeing the issue so far.

MarkAHershberger subscribed.

Bugs with global preferences shouldn't block the 1.38 release, since global prefs are not an essential feature for most 3rd party wikis.

Looking at just one of the pilot wikis (French Wikipedia) and its logged-in user base, this problem seems to be happening over a dozen times a minute, every minute, or about 400 times an hour.

https://grafana.wikimedia.org/d/OHBhRhy7k/vector-bug-t302627?orgId=1

Following up here, I was a bit confused why this code was logging violations long after it should no longer have been possible for this to occur. A closer look suggests that there was a bug in the instrumentation, in that this code was running for non-Vector skins, e.g. Timeless and MonoBook (I was able to see events in my network tab while using MonoBook).

I'm waiting on caches to catch up before verifying.

Is that a bug in the instrumentation, or does it indicate that new Vector is also showing up unexpectedly when Timeless or MonoBook should be shown?

The other skins counted as vector19 for the purposes of this instrumentation. E.g. as expected or unexpected vector19. I could have named it expected/unexpected_other. I don't see a way in which it logically could increment unexpected_vector22 incorrectly since it's explicitly testing for that expectation and layout.

I've reverted the edit and instead removed the "other"/vector19 counters. Also, I synced it to the same fr-language wikis as before, so that the update takes effect on more than mw.org. It won't make a difference to the graph, however, as I only plotted this metric, not the others.

I found the following issues with the instrumentation code so far:

Scenario 1:

Expected: Nothing logged.
Actual: Logs unexpected_vector22

Scenario 2:

Expected: Nothing logged

Perhaps we can drop the query string check? That doesn't seem helpful right now

Expected: Nothing logged.
Actual: Logs unexpected_vector22

vector-22 isn't a valid parameter value. This counts the same as useskin=helloworld. I don't think that's where the metrics are coming from, but sure, let's add validation. Done.

  • Set user skin to "monobook" on French Wikipedia
  • Expected: Nothing logged

And it doesn't.
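The checks discussed above (explicitly testing expectation against layout, ignoring non-Vector skins, and validating the useskin parameter) could be expressed roughly as follows. This is a hypothetical reconstruction, since the instrumentation script itself is not shown in this thread. `classifyPageview`, its parameters, and the list of skin keys are assumptions; only the counter name `unexpected_vector22` comes from the discussion.

```javascript
// Hypothetical reconstruction of the checks discussed above.
const KNOWN_SKINS = [ 'vector', 'vector-2022', 'minerva', 'monobook', 'timeless' ];

// preferredSkin: the user's saved skin preference.
// useskinParam: value of ?useskin= in the URL, or null if absent.
// renderedVector22: whether the page was actually laid out as Vector 2022.
// Returns the counter to increment, or null to log nothing.
function classifyPageview( preferredSkin, useskinParam, renderedVector22 ) {
	let expectedSkin = preferredSkin;
	if ( useskinParam !== null ) {
		if ( !KNOWN_SKINS.includes( useskinParam ) ) {
			// e.g. useskin=vector-22 or useskin=helloworld: no clear
			// expectation can be derived, so log nothing.
			return null;
		}
		expectedSkin = useskinParam;
	}
	// Only a request that should have produced legacy Vector can be a hit.
	if ( expectedSkin === 'vector' && renderedVector22 ) {
		return 'unexpected_vector22';
	}
	return null;
}
```

With this shape, a MonoBook or Timeless user never increments the counter, and an invalid useskin value is dropped before any expectation is formed.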

Thanks for the updates to the scripts @Krinkle!

I've been looking closely at referers [1] to the beacon endpoint, and I'm not seeing any unusual patterns in where they are coming from. I am noticing all of the errors are coming from fr.wikipedia.org at this point. Nothing seems to be coming from frwikiquote or mediawiki.org. The majority are coming from the Windows platform.

I am likely going to need further information to debug this. Just not sure exactly what yet... at this point it's more than likely due to some issue inside core or GlobalPreferences.

[1] SELECT uri_host, namespace_id, referer FROM wmf.webrequest WHERE day = 11 AND month = 4 AND year = 2022 AND uri_path LIKE '%beacon%' AND uri_query LIKE '%vector_bug_T302627.unexpected_vector22%'

I find that unlikely given we've never seen issues like this before with any skin.

One thing that might help would be to walk through the numerous unsafe service classes that Vector registers, which are using the service container to hold global singletons that vary on state other than site configuration (e.g. web request, user). This is not what service wiring is for, and means if any of these services are invoked at the "wrong" time or in the "wrong" context (e.g. non-pageview or job), they produce and subsequently retain incorrect results that may be very difficult to trace. Some details at T218555 and in the DI docs (docs/Injection.md, ServiceWiring.php, MediaWikiServices.php; all mention similar principles and what it is/not for).
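The failure mode described above can be shown with a toy container. The sketch below is written in JavaScript for brevity and does not mirror MediaWiki's PHP service wiring or Vector's classes; `ServiceContainer`, `UnsafeSkinLookup`, `SafeSkinLookup`, and `currentUser` are all hypothetical names chosen for illustration.

```javascript
// Hypothetical names throughout; this mimics the pattern, not Vector's code.
class ServiceContainer {
	constructor() {
		this.factories = new Map();
		this.instances = new Map();
	}
	define( name, factory ) {
		this.factories.set( name, factory );
	}
	get( name ) {
		// Each service is built once and then reused.
		if ( !this.instances.has( name ) ) {
			this.instances.set( name, this.factories.get( name )( this ) );
		}
		return this.instances.get( name );
	}
}

// Stand-in for ambient request state (current user, web request).
let currentUser = { skin: 'vector-2022' }; // e.g. an early hook runs first

const services = new ServiceContainer();

// Unsafe: the factory reads per-request state and bakes it into the singleton.
services.define( 'UnsafeSkinLookup', function () {
	const skinAtConstruction = currentUser.skin;
	return { getSkin: function () { return skinAtConstruction; } };
} );

// Safer: the service holds no request state and takes the user per call.
services.define( 'SafeSkinLookup', function () {
	return { getSkin: function ( user ) { return user.skin; } };
} );

services.get( 'UnsafeSkinLookup' ); // instantiated at the "wrong" time
currentUser = { skin: 'vector' };   // the real user context becomes available

console.log( services.get( 'UnsafeSkinLookup' ).getSkin() );            // vector-2022
console.log( services.get( 'SafeSkinLookup' ).getSkin( currentUser ) ); // vector
```

The unsafe service keeps answering with whatever state existed when it was first built, which is hard to trace because the caller that triggered the early construction may be far from the code that later reads the stale result.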

We could of course try to catch every possible caller and apply a local workaround or careful inline condition (like with the wgUser-isSafe previously), but with the number of different situations in which this issue has been happening (T288113, T300278, T303265, T305232, T305262) a more holistic look at the problem may be appropriate.

It was a good hunch that removing the complexity of the migration mode might luckily remove the only instances of such untimely calls and turn the inevitable poisoning of the service wiring back into only a problem in theory and principle. But alas, it appears there remain about as many as there were before.

See also T242835#6064656, which observes FeatureManager as one such service in violation of this principle, and T288113#7345667 which flags it as a possible suspect.

At the request of several team members, who were getting confused by this thread, I've opened a new ticket capturing what we've learned (T305966) and continuing the investigation.