Main pages of several Beta Cluster wikis redirect to other production wikis (MessageCache keyspace is same for all wikis causing conflicts)
Closed, Resolved · Public

Description

urbanecm@notebook  ~
$ curl -I https://en.wikipedia.beta.wmflabs.org/
HTTP/2 301
date: Fri, 06 Mar 2020 13:09:38 GMT
server: deployment-mediawiki-07.deployment-prep.eqiad.wmflabs
x-powered-by: PHP/7.2.26-1+0~20191218.33+debian9~1.gbpb5a340+wmf1
x-content-type-options: nosniff
p3p: CP="See https://en.wikipedia.beta.wmflabs.org/wiki/Special:CentralAutoLogin/P3P for more info."
vary: Accept-Encoding,X-Forwarded-Proto,Cookie,Authorization
expires: Fri, 06 Mar 2020 13:09:38 GMT
cache-control: private, must-revalidate, max-age=0
last-modified: Fri, 06 Mar 2020 13:09:38 GMT
location: https://www.wikidata.org/wiki/Main_Page
backend-timing: D=57375 t=1583500178875805
content-type: text/html; charset=utf-8
x-ats-timestamp: 1583500178
x-varnish: 47708406
age: 0
x-cache: deployment-cache-text05 miss, deployment-cache-text05 pass
x-cache-status: pass
server-timing: cache;desc="pass"
set-cookie: WMF-Last-Access=06-Mar-2020;Path=/;HttpOnly;secure;Expires=Tue, 07 Apr 2020 12:00:00 GMT
set-cookie: WMF-Last-Access-Global=06-Mar-2020;Path=/;Domain=.wikipedia.beta.wmflabs.org;HttpOnly;secure;Expires=Tue, 07 Apr 2020 12:00:00 GMT
x-client-ip: XXXXX
set-cookie: GeoIP=XXXXXX:15.80:v4; Path=/; secure; Domain=.beta.wmflabs.org

urbanecm@notebook  ~
$

Event Timeline

Urbanecm triaged this task as Unbreak Now! priority. Mar 6 2020, 1:12 PM

The whole Beta cluster? Well, not entirely… one small indomitable page still holds out—

Jokes aside – our Data Bridge test page still works. I think it’s just the English Beta Wikipedia main page that redirects to Wikidata for some reason. (Note that the “main page” link in the sidebar for the data bridge test page also points to Wikidata.)

Not only enwp, http://commons.wikimedia.beta.wmflabs.org/ is also broken.

en.wikisource.beta.wmflabs.org fails too. Pywikibot tests are all failing now due to this issue

Curious – the other wiki I tested was https://de.wikipedia.beta.wmflabs.org/, and that one’s fine.

Random guess: English version of “mainpage” message points to Wikidata for some reason, that’s why the German wiki is unaffected?

It's interesting that Title::newMainPage(), run from shell.php, knows where the main page is.

urbanecm@deployment-deploy01:~$ mwscript shell.php enwiki
Psy Shell v0.9.12 (PHP 7.2.26-1+0~20191218.33+debian9~1.gbpb5a340+wmf1 — cli) by Justin Hileman
>>> Title::newMainPage()->getFullText()
=> "Main Page"
>>>

Loading https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page directly works.

But https://wikidata.beta.wmflabs.org/ is also fine, and Beta Wikidata is also in English, so that’s a point against my theory.

Weeeeeird

So it looks like it has the correct page name („Hauptseite“ is German for “main page”), but the wrong domain?

Per https://wikidata.beta.wmflabs.org/wiki/Special:SiteMatrix:

"en" and "simple" content wikis + commons, deployment, meta and test: https://www.wikidata.org/wiki/Main_Page
"de" wiktionary: https://en.wikipedia.org/wiki/Hauptseite

nothing else fails currently
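
The check behind these reports can be sketched as a small helper that compares the host a request was sent to with the host in the redirect's Location header. This is only an illustrative sketch: the function name is made up for this example, and the URLs are the ones quoted in the curl and shell output in this task.

```python
from urllib.parse import urljoin, urlsplit


def redirect_leaves_host(request_url: str, location: str) -> bool:
    """Return True if the Location header points at a different host."""
    # Resolve relative Location values (e.g. "/wiki/Main_Page") first.
    target = urljoin(request_url, location)
    return urlsplit(target).hostname != urlsplit(request_url).hostname


# Observed on the broken wiki: the redirect goes to production Wikidata.
broken = redirect_leaves_host(
    "https://en.wikipedia.beta.wmflabs.org/",
    "https://www.wikidata.org/wiki/Main_Page",
)

# What Title::newMainPage()->getFullURL() reports in shell.php.
healthy = redirect_leaves_host(
    "https://en.wikipedia.beta.wmflabs.org/",
    "https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page",
)

print(broken, healthy)
```

Running the same comparison over every SiteMatrix entry would give the list of affected wikis without opening each one by hand.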

It also has the right URLs everywhere, as far as I can tell:

>>> $title = Title::newMainPage()
=> Title {#2653
     +mTextform: "Main Page",
     +mUrlform: "Main_Page",
     +mDbkeyform: "Main_Page",
     +mNamespace: 0,
     +mInterwiki: "",
     +mFragment: "",
     +mArticleID: -1,
     +mRestrictions: [],
     +mCascadeRestriction: null,
     +mCascadingRestrictions: null,
     +mCascadeSources: null,
     +mRestrictionsLoaded: false,
     +prefixedText: null,
     +mTitleProtection: null,
     +mDefaultNamespace: 0,
     +mRedirect: null,
   }
>>> $title->getFullURL()
=> "https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page"
>>> $title->getFullUrlForRedirect()
=> "https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page"
>>> $title->getLocalUrl()
=> "/wiki/Main_Page"
>>> $title->getLinkURL()
=> "/wiki/Main_Page"
>>> $title->getCanonicalURL()
=> "https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page"
>>> $title->getInternalURL()
=> "https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page"

And yet this seems to be an error within MediaWiki: an internal request from a beta appserver fails too. I followed the docs at https://wikitech.wikimedia.org/wiki/Debugging_in_production#Debugging_a_web_request.

urbanecm@deployment-mediawiki-07:~$ curl -I -H 'Host: en.wikipedia.beta.wmflabs.org' "http://$(hostname -i)"
HTTP/1.1 301 Moved Permanently
Date: Fri, 06 Mar 2020 13:33:08 GMT
Server: deployment-mediawiki-07.deployment-prep.eqiad.wmflabs
X-Powered-By: PHP/7.2.26-1+0~20191218.33+debian9~1.gbpb5a340+wmf1
X-Content-Type-Options: nosniff
P3P: CP="See https://en.wikipedia.beta.wmflabs.org/wiki/Special:CentralAutoLogin/P3P for more info."
Vary: Accept-Encoding,X-Forwarded-Proto,Cookie,Authorization
Expires: Fri, 06 Mar 2020 13:33:09 GMT
Cache-Control: private, must-revalidate, max-age=0
Last-Modified: Fri, 06 Mar 2020 13:33:09 GMT
Location: https://www.wikidata.org/wiki/Main_Page
Backend-Timing: D=26043 t=1583501588984933
Content-Type: text/html; charset=utf-8

Mentioned in SAL (#wikimedia-releng) [2020-03-06T13:50:32Z] <Urbanecm> Live debugging T247078 on deployment-mediawiki-07

Lucas_Werkmeister_WMDE renamed this task from Whole beta cluster is redirected to https://www.wikidata.org/wiki/Main_Page to Main pages of several Beta Cluster wikis redirect to other production wikis. Mar 6 2020, 1:52 PM

I was live-debugging this with enwiki beta. A few notes:

  • As of includes/MediaWiki.php:123, $ret->getInterwiki() equals wikidata. That explains what happens with enwiki.
  • As of includes/MediaWiki.php:116, $ret is null.

That leads me to conclude that Title::newMainPage() somehow returns something with wikidata. I can't reproduce that in the shell, though.

At includes/Title.php:668, $msg is 'Wikidata:Main page'. I can't reproduce that in the shell either.

What is changing the message at run time?

Tried to add $entry = null after includes/cache/MessageCache.php:1085 to stop all rogue cache keys from interfering. That fixed the immediate issue. Somehow, it seems the cache now has the correct value for enwiki. However, enwikisource is still broken. Leaving it in that state, so someone else can continue the investigation.

Mentioned in SAL (#wikimedia-releng) [2020-03-06T14:37:33Z] <Urbanecm> Live debugging T247078 on deployment-mediawiki-07 ended

So is it mediawiki or beta cluster infrastructure? It's apparently cache related?

Specific lines being referred to:

MediaWiki::parseTitle
[116]	// Use the main page as default title if nothing else has been provided
[117]	if ( $ret === null
[118]		&& strval( $title ) === ''
[119]		&& !$request->getCheck( 'curid' )
[120]		&& $action !== 'delete'
[121]	) {
[122]		$ret = Title::newMainPage();
[123]	}
Title::newMainPage
[664]	public static function newMainPage( MessageLocalizer $localizer = null ) {
[665]		if ( $localizer ) {
[666]			$msg = $localizer->msg( 'mainpage' );
[667]		} else {
[668]			$msg = wfMessage( 'mainpage' );
[669]		}
[670]
[671]		$title = self::newFromText( $msg->inContentLanguage()->text() );
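
To illustrate why a 'mainpage' message of 'Wikidata:Main page' ends in a redirect to production Wikidata, here is a simplified model of the title resolution. It is a sketch under assumptions, not MediaWiki's actual title parser: the interwiki table, the function name and the `$1` path pattern are hypothetical stand-ins, and capitalization rules are not modelled.

```python
from urllib.parse import urlsplit

# Hypothetical interwiki table (prefix -> URL pattern). Only the Wikidata
# target is taken from the redirect observed in this task.
INTERWIKI = {"wikidata": "https://www.wikidata.org/wiki/$1"}


def main_page_url(mainpage_text: str, local_article_path: str) -> str:
    """Resolve the text of the 'mainpage' message to a URL (simplified)."""
    prefix, colon, rest = mainpage_text.partition(":")
    pattern = INTERWIKI.get(prefix.strip().lower()) if colon else None
    if pattern is None:
        # No known interwiki prefix: the whole text is a local page name.
        pattern, rest = local_article_path, mainpage_text
    return pattern.replace("$1", rest.strip().replace(" ", "_"))


LOCAL = "https://en.wikipedia.beta.wmflabs.org/wiki/$1"

good = main_page_url("Main Page", LOCAL)           # value seen in shell.php
bad = main_page_url("Wikidata:Main page", LOCAL)   # value seen at Title.php:668

print(urlsplit(good).hostname)
print(urlsplit(bad).hostname)
```

On Wikidata itself "Wikidata:" is presumably a local namespace, which would explain why the same message text is harmless there but is treated as an interwiki link on the other wikis.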

MessageCache::getMsgFromNamespace
[1079]	public function getMsgFromNamespace( $title, $code ) {
[1080]		// Load all MediaWiki page definitions into cache. Note that individual keys
[1081]		// already loaded into cache during this request remain in the cache, which
[1082]		// includes the value of hook-defined messages.
[1083]		$this->load( $code );
[1084]
[1085]		$entry = $this->cache->getField( $code, $title );
[1086]
[1087]		if ( $entry !== null ) {
[1088]			// Message page exists as an override of a software messages
[1089-1102]			[...]
[1103]		} else {
[1104]			// Message page either does not exist or does not override a software message
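
The effect of forcing `$entry = null` can be modelled roughly as follows. This is a loose sketch and not the elided MediaWiki code: the function, its `ignore_overrides` flag and the dictionaries are invented for illustration. It only mirrors the two branches named in the comments above, where a cached entry overrides the software message and a missing entry falls back to the default.

```python
from typing import Dict, Optional

# Default text of 'mainpage', as returned in shell.php earlier in this task.
SOFTWARE_DEFAULTS: Dict[str, str] = {"mainpage": "Main Page"}


def get_message(key: str, cache: Dict[str, str], ignore_overrides: bool = False) -> str:
    """Return the cached override if present, else the software default."""
    # ignore_overrides models the temporary "$entry = null" debugging hack.
    entry: Optional[str] = None if ignore_overrides else cache.get(key)
    if entry is not None:
        return entry                   # message page overrides the default
    return SOFTWARE_DEFAULTS[key]      # no override: use the software message


# A cache holding another wiki's value, as observed at Title.php:668.
poisoned_cache = {"mainpage": "Wikidata:Main page"}

print(get_message("mainpage", poisoned_cache))
print(get_message("mainpage", poisoned_cache, ignore_overrides=True))
```

Skipping the cached entry hides the wrong value but does not explain how it got into the cache, which fits the observation that other wikis stayed broken.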

@dom_walden you noticed this yesterday, right?

I don't totally remember. A few days ago, I think...

Current status: en, simple, commons, deployment, meta, and test all show their respective main pages correctly, but with the <title> “Wikinews, the free news source”. dewiktionary still redirects to prod enwiki Hauptseite. Haven’t tested the other sitematrix entries.

The beta cluster seems to get brokener and brokener. On en beta, the main page displays correctly and the “main page” link in the sidebar correctly links to the main page, but the logo links to production wikidata:Main Page, and https://en.wikipedia.beta.wmflabs.org/ (without /wiki/Main_Page) redirects there as well. Additionally, there seem to be cache-related problems (see T235208#5955695).

This was the first issue described in this task. It was partially fixed by @Urbanecm, but it is broken again.

No, in the original state of this task the main page sidebar link was broken as well, at least as I remember it.

I see. The questions are:

  • Is it just internal corruption in the Beta Cluster that can be solved by some sort of manual rebuild?
  • Is it a bug in the Beta Cluster update mechanism that can be solved by finding and fixing it?
  • Is it a MediaWiki bug that can be solved by finding and fixing it?

Additionally:

  • What caused this issue in the first place, when did it happen, and can it be reverted?

Is the sidebar (and sometimes the UI language) switching to Simple English unrelated?

why is betacommons in simple English.png (754×924 px, 123 KB)

why is betacommons in simple English2.png (731×910 px, 94 KB)

T247695

Hm.. It seems like a different issue, but the root cause might be the same for language switching and home page redirect issues.

So it looks like this is what is happening:

  • $wgMessageCacheType is set to CACHE_ACCEL on web but CACHE_NONE from CLI. (Note that there is a separate server cache and cluster cache; on the Beta Cluster, both are set to APC. I didn't really test srvCache, as I had it disabled in my testing, so I'm not sure whether it works properly. The cluster cache is definitely not working properly.)
  • When the message cache is instantiated, the keyspace parameter is unset, so it becomes 'local'. The expected behaviour is that it is wfWikiID().
  • All the English-language projects are using the same APC cache key when they are not supposed to. As a result, when the cache is rebuilt, the last project to be rebuilt (Wikidata, as alphabetically last) overrides the cache keys of all the other English projects.

My suggested fix: ObjectCache::getInstance() should always set keyspace to wfWikiID() unless specifically overridden.
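
The collision described above can be reproduced with a toy model of a cache shared by several wikis. This is a minimal sketch under assumptions: the `keyspace:messages:lang` key layout and all function names are hypothetical, a plain dictionary stands in for APC, and the two message values are the ones quoted earlier in this task.

```python
from typing import Callable, Dict

# Text of the 'mainpage' message per wiki, as quoted in this task.
MAINPAGE = {"enwiki": "Main Page", "wikidatawiki": "Wikidata:Main page"}


def cache_key(keyspace: str, lang: str) -> str:
    # Hypothetical key layout; only the role of the keyspace matters here.
    return f"{keyspace}:messages:{lang}"


def rebuild_all(keyspace_for: Callable[[str], str]) -> Dict[str, Dict[str, str]]:
    """Rebuild the English message cache of every wiki, in alphabetical order."""
    store: Dict[str, Dict[str, str]] = {}  # stands in for the shared APC store
    for wiki in sorted(MAINPAGE):
        store[cache_key(keyspace_for(wiki), "en")] = {"mainpage": MAINPAGE[wiki]}
    return store


def mainpage(store: Dict[str, Dict[str, str]], keyspace_for: Callable[[str], str], wiki: str) -> str:
    return store[cache_key(keyspace_for(wiki), "en")]["mainpage"]


def shared(wiki: str) -> str:
    return "local"   # keyspace unset: every wiki falls back to 'local'


def per_wiki(wiki: str) -> str:
    return wiki      # suggested fix: keyspace is the wiki ID


print(mainpage(rebuild_all(shared), shared, "enwiki"))
print(mainpage(rebuild_all(per_wiki), per_wiki, "enwiki"))
```

With the shared keyspace, the wiki rebuilt last wins and enwiki reads Wikidata's value; with a per-wiki keyspace each wiki reads back its own message.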

Bawolff renamed this task from Main pages of several Beta Cluster wikis redirect to other production wikis to Main pages of several Beta Cluster wikis redirect to other production wikis (MessageCache keyspace is same for all wikis causing conflicts). Mar 17 2020, 8:46 PM
Jdforrester-WMF assigned this task to Krinkle.
Jdforrester-WMF subscribed.

This appears to now be fixed.