Page MenuHomePhabricator

Vector sidebar missing on test.wikipedia.org and weird footer
Closed, ResolvedPublic

Description

image.png (1×1 px, 185 KB)

I also saw this issue on translatewiki.net, but it got magically resolved somehow while we debugged it.

Event Timeline

Jdforrester-WMF triaged this task as Unbreak Now! priority.Mar 18 2020, 6:29 PM

The sidebar (apart from the logo) seems to be literally missing from the page html (unless it was deleted by JavaScript...).

Change 581053 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/Vector@master] Change master template to force cache invalidation of partials

https://gerrit.wikimedia.org/r/581053

Also forgot to mention, there was nothing in translatewiki.net error logs after last deployment, when this issue started, other than T248003: Wikimedia\\Rdbms\\DBTransactionError(code: 0): A database transaction round is pending. that would have given a hint what could cause this. It just went away by itself after deploying couple of different versions of Vector (which did not help), doing var_dump style debugging and restarting php-fpm (not sure which of those or any made the issue go away).

Cannot reproduce locally or on https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page
My best guess is this could relate to T113095 as previously template partials were not invalidating. Can I confirm the patches in T113095 are present on test.wikipedia ?

My best guess is there is still some kind of problem relating to partials (either the fix didn't work or there are some problems with compatibility with existing cache).
If that's the case my change https://gerrit.wikimedia.org/r/#/c/mediawiki/skins/Vector/+/579025/ touched the sidebar but not the master template and would produce this kind of result.

I suspect https://gerrit.wikimedia.org/r/#/c/mediawiki/skins/Vector/+/581053 Change master template to force cache invalidation of partials will therefore fix it. Can somebody confirm?

Change 581054 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/Vector@wmf/1.35.0-wmf.24] Change master template to force cache invalidation of partials

https://gerrit.wikimedia.org/r/581054

My best guess is this could relate to T113095 as previously template partials were not invalidating. Can I confirm the patches in T113095 are present on test.wikipedia ?

This wouldn't be a bad guess.

https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/575229/ and https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/575230/ are both present on the wmf.23 branch. Although the wmf.23 branch wasn't deployed to all wikis until today (see T233871#5979805), I'd still expect to see the issue everywhere if the bug was isolated to those two changes.

If that's the case my change https://gerrit.wikimedia.org/r/#/c/mediawiki/skins/Vector/+/579025/ touched the sidebar but not the master template and would produce this kind of result.

Possibly. I'll investigate.

Cannot reproduce locally or on https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page
My best guess is this could relate to T113095 as previously template partials were not invalidating. Can I confirm the patches in T113095 are present on test.wikipedia ?

Assuming all those patches refered to T113095 and given test.wikipedia.org is running 1.35.0-wmf.24, from mediawiki/core I have:

$ git log --oneline --grep=T113095 origin/wmf/1.35.0-wmf.24 
abf2463056 Rename TemplateParserTest.php -> TemplateParserIntegrationTest.php
419bbcd8f9 TemplateParser: Test server-local cache interaction
52f50cd657 TemplateParser: Invalidate cache if partial changes
5607158ba7 TemplateParser: Make cache value include metadata
56abd837ba TemplateParser: Document and normalize exceptions

Which seems to match the related gerrit patches listed by Phabricator on the task. So I don't think we have left behind any patch when I have cut the branch on Tuesday.

My best guess is there is still some kind of problem relating to partials (either the fix didn't work or there are some problems with compatibility with existing cache).
If that's the case my change https://gerrit.wikimedia.org/r/#/c/mediawiki/skins/Vector/+/579025/ touched the sidebar but not the master template and would produce this kind of result.

I guess we should have Vector to invalidate the cache in such a case. Not sure though why it would cause the sidebar to entirely disappear? Intuitively I would maybe have expected an older version of the sidebar.

I suspect https://gerrit.wikimedia.org/r/#/c/mediawiki/skins/Vector/+/581053 Change master template to force cache invalidation of partials will therefore fix it. Can somebody confirm?

That can probably be easily tested by cherry picking that fix to mwdebug1001 and confirm it fixes it? Then I guess we can get it merged / cherry picked and deployed.

I guess we should have Vector to invalidate the cache in such a case. Not sure though why it would cause the sidebar to entirely disappear? Intuitively I would maybe have expected an older version of the sidebar.

The data passed to the template changed. So the older template would be referencing data that no longer exists leaving an empty output.

Change 581054 merged by jenkins-bot:
[mediawiki/skins/Vector@wmf/1.35.0-wmf.24] Change master template to force cache invalidation of partials

https://gerrit.wikimedia.org/r/581054

I guess we should have Vector to invalidate the cache in such a case. Not sure though why it would cause the sidebar to entirely disappear? Intuitively I would maybe have expected an older version of the sidebar.

The data passed to the template changed. So the older template would be referencing data that no longer exists leaving an empty output.

I would assume that emits some sort of warning for undefined template variable, right? I'm not seeing that in the logs for testwiki, but I'm also not sure what that would look like for Mustache.

In any event, something to make sure we have an error for in the future so that future template problems that make significant parts of the page disappear can be caught by automated Logstash alerts instead of relying on user reports.

I would assume that emits some sort of warning for undefined template variable, right? I'm not seeing that in the logs for testwiki, but I'm also not sure what that would look like for Mustache.

No. Template variables do not need to be defined.

I can simulate the end result if you need to debug by doing the following:

#core
git checkout 4d4d4b61abc891c2d0de6a91b809444f93144759 

#Vector
git checkout af4cbc2f57cfe7353998827a8776617e5c1c205e

# Visit Vector in browser.

#Vector
git checkout origin/master

# Visit Vector in browser

In any event, something to make sure we have an error for in the future so that future template problems that make significant parts of the page disappear can be caught by automated Logstash alerts instead of relying on user reports.

Many templates rely on values that can be null so I'm not quite sure how that would work in practice or if it's even possible in lightncandy.

Mentioned in SAL (#wikimedia-operations) [2020-03-18T21:56:04Z] <Krinkle> krinkle@mw1385: scap pull # clean up AdHoc debugging for T248010

Change 581107 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[mediawiki/core@master] TemplateParser: Include template dir in cache key

https://gerrit.wikimedia.org/r/581107

Change 581053 abandoned by Jdlrobson:
Change master template to force cache invalidation of partials

Reason:
https://gerrit.wikimedia.org/r/581107

https://gerrit.wikimedia.org/r/581053

Change 581114 had a related patch set uploaded (by Krinkle; owner: Phuedx):
[mediawiki/core@wmf/1.35.0-wmf.23] TemplateParser: Include template dir in cache key

https://gerrit.wikimedia.org/r/581114

Change 581115 had a related patch set uploaded (by Krinkle; owner: Phuedx):
[mediawiki/core@wmf/1.35.0-wmf.24] TemplateParser: Include template dir in cache key

https://gerrit.wikimedia.org/r/581115

Change 581107 merged by jenkins-bot:
[mediawiki/core@master] TemplateParser: Include template dir in cache key

https://gerrit.wikimedia.org/r/581107

Change 581114 merged by jenkins-bot:
[mediawiki/core@wmf/1.35.0-wmf.23] TemplateParser: Include template dir in cache key

https://gerrit.wikimedia.org/r/581114

Change 581115 merged by jenkins-bot:
[mediawiki/core@wmf/1.35.0-wmf.24] TemplateParser: Include template dir in cache key

https://gerrit.wikimedia.org/r/581115

Mentioned in SAL (#wikimedia-operations) [2020-03-18T23:31:11Z] <twentyafterfour@deploy1001> Synchronized php-1.35.0-wmf.23/includes/TemplateParser.php: sync https://gerrit.wikimedia.org/r/c/mediawiki/core/+/581114/ refs T248010 (duration: 01m 07s)

Mentioned in SAL (#wikimedia-operations) [2020-03-19T00:31:12Z] <twentyafterfour@deploy1001> Synchronized php-1.35.0-wmf.24/skins/Vector/includes/templates/index.mustache: deploy https://gerrit.wikimedia.org/r/c/mediawiki/skins/Vector/+/581116 which reverts https://gerrit.wikimedia.org/r/c/mediawiki/skins/Vector/+/581054 refs T248010 (duration: 01m 07s)

Looks like the weird footer and table styles comment was not noticed, but it seems to be handled as part of T247566: Broken section edit links styles on Vector

And I just noticed that due to that oversight, we now have brokens styles all over mediawiki.org and other group 1 wikies :(

Thanks all for taking care of this and getting it SWATTed. Special thanks to @Nikerabbit for raising the alarm.
(And thanks for the reminder that we need UI regression testing! T107588)