Page MenuHomePhabricator

Fatal on some Special:MyLanguage urls: MWException "Can't determine talk page associated with interwiki link"
Closed, ResolvedPublic

Description

Error

Request URL: https://www.mediawiki.org/wiki/Special:MyLanguage/Wikipedia:Copyrights
Request ID: XSYCOgpAAEYAAB1609YAAAAG

message
MWException: Can't determine talk page associated with interwiki link
trace
#0 /srv/mediawiki/php-1.34.0-wmf.13/includes/Title.php(1549): NamespaceInfo->getTalkPage(Title)
#1 /srv/mediawiki/php-1.34.0-wmf.13/includes/skins/Skin.php(282): Title->getTalkPage()
#2 /srv/mediawiki/php-1.34.0-wmf.13/includes/skins/Skin.php(164): Skin->preloadExistence()
#3 /srv/mediawiki/php-1.34.0-wmf.13/skins/Vector/includes/SkinVector.php(68): Skin->initPage(OutputPage)
#4 /srv/mediawiki/php-1.34.0-wmf.13/includes/skins/SkinTemplate.php(216): SkinVector->initPage(OutputPage)
#5 /srv/mediawiki/php-1.34.0-wmf.13/includes/OutputPage.php(2580): SkinTemplate->outputPage()
#6 /srv/mediawiki/php-1.34.0-wmf.13/includes/MediaWiki.php(891): OutputPage->output(boolean)
#7 /srv/mediawiki/php-1.34.0-wmf.13/includes/MediaWiki.php(903): MediaWiki->{closure}()
#8 /srv/mediawiki/php-1.34.0-wmf.13/includes/MediaWiki.php(515): MediaWiki->main()
#9 /srv/mediawiki/php-1.34.0-wmf.13/index.php(42): MediaWiki->run()
#10 /srv/mediawiki/w/index.php(3): require(string)

Impact

Raised rates of HTTP 500 and MediaWiki exceptions.

Use sees:

Notes

New in 1.34-wmf.13. The same url returns non-fatal on wikis running the previous deployment version, e.g. https://meta.wikimedia.org/wiki/Special:MyLanguage/Wikipedia:Copyrights shows the red-link page for a page that could be named "Wikipedia:Copyrights".

Although it is also broken because this url either redirect or display a form on the special page. Instead, it is becoming an illegal way to view a page at a strange url. In particular, because it not a valid title to create. Because viewing the page normally would redirect to en.wikipedia.org. This should probably display a (localised) error message instead on an HTTP 40xx response, but returning an uncached HTTP 50x response is a regression.

This is very likely caused by the Title/NamespaceInfo refactoring, as such tagging Core Platform and CC-ing @daniel.

Event Timeline

Krinkle created this task.Jul 10 2019, 6:15 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 10 2019, 6:15 PM
Jdforrester-WMF triaged this task as Unbreak Now! priority.Jul 10 2019, 8:51 PM
Restricted Application added a subscriber: Liuxinyu970226. · View Herald TranscriptJul 10 2019, 8:51 PM

A few thoughts from a quick look - this is the result of two or three cascading failures.

  1. The last one is from the Skin layer, which seems fine. The Skin system doesn't need to support rendering articles with invalid titles because those can never exist, and the Skin layer is protected from such titles by MediaWiki.php and ViewAction.php (by rewriting these to Special:Badtitle instead).
  2. The one up from that, is that somehow the RedirectSpecialPage (as used by SpecialMyLanguage), is sometimes failing to emit a redirect. Instead, sometimes doing something that should be impossible, which is that it is invoking ViewAction (how??) with the result of the "get redirect destination" logic. Thus resulting in the impossible thing of "interwiki: Title" actually running through the Skin system.
  3. (unrelated) Another problem somewhere in the middle of all this is the weird behaviour in Title::getSubpage() as used by SpecialMyLanguage.php, which is returning a completely unrelated result when you start with an interwiki title. E.g. Title::newFromText("meta:Sandbox")->getSubpage('x') == "Sandbox/x". It lost the interwiki prefix and thus causes a caller to sometimes act very different than intended. For example <https://www.mediawiki.org/wiki/Special:MyLanguage/meta:Developer account?uselang=nl> results in a redirect to https://www.mediawiki.org/wiki/Developer_account/nl because that title exists locally. Whereas https://www.mediawiki.org/wiki/Special:MyLanguage/meta:Developer%20account?uselang=nl-nds results in the same fatal exception on 1.34-wmf.13 as the one in the task description.
  4. Lastly, issue that the most immediate cause logically, is that SpecialMyLanguage probably just needs to reject anything that isExternal and fallback the same way it does already for invalid title, which is to forward to the Main Page instead.

Resolving the last point might be the simplest fix for this issue, but point 2 and 3 are also problems waiting to happen.

Thanks for the analysis, @Krinkle! Looks like the Title/NamespaceInfo refactoring is flushing out more places were we inadvertently use invalid titles or make unwarranted assumptions about titles.

@Jdforrester-WMF you tagged this as UBN - is this so severe that it can't wait a week to be fixed? From the description, it seems like we are now seeing a 500 in a rare situation that previously did something unexpected/undefined. To me, that sounds like "high" prio, not UBN. Am I missing something?

Thanks for the analysis, @Krinkle! Looks like the Title/NamespaceInfo refactoring is flushing out more places were we inadvertently use invalid titles or make unwarranted assumptions about titles.
@Jdforrester-WMF you tagged this as UBN - is this so severe that it can't wait a week to be fixed? From the description, it seems like we are now seeing a 500 in a rare situation that previously did something unexpected/undefined. To me, that sounds like "high" prio, not UBN. Am I missing something?

The reason that it's UBN is because it's a train blocker and they all are UBN. Whether this should be considered as train blocker or not. I don't know. (Just case it's not clear why it's UBN)

The reason that it's UBN is because it's a train blocker and they all are UBN. Whether this should be considered as train blocker or not. I don't know. (Just case it's not clear why it's UBN)

Oh, right, of course. @Krinkle, why do you consider this a train blocker? I mean, it should be fixed for sure, but does it need to block the train? As far as I can see, a 500 can now be triggered by providing an invalid title in a URL parameter - not great, but we are not losing any useful functionality, and the error itself should be rare if nobody is actively trying to trigger it. Am I missing something?

Krinkle added a comment.EditedJul 11 2019, 2:22 PM

From the description, it seems like we are now seeing a 500 in a rare situation that previously did something unexpected [..]

I file new exceptions and errors that are high-confident regressions are train blockers by default, until or unless decided otherwise by those that investigate (e.g. you), or the product owners affected by the issue.

In this case:

  • We know the problem affects a few urls, but that's based on only a few hours of group0 traffic. We have yet to learn whether this will affect more features on more wikis for more users, or not.
  • This exposes yet another endpoint at a public GET url that consistently bypasses caches, results in a MW fatal exception, and an HTTP 50x response. With every little bug that adds to this pile, we further decrease our ability to bigger regressions as it means we keep increasing our threshold. It also means we further increase the chances of false alarms and late-night pages that alert SRE and/or recent deployers to a potential on-going Varnish or MediaWiki outage.

This last point applies regardless of whether we care about the feature in question, because all other teams pay for it by having less ability to discover problems they would care about. Blocking the train is my only ability to ensure this does not happen. But if you are confident it will get fixed in the last 7 days, I don't see a reason to block the train today.

But if you are confident it will get fixed in the last 7 days, I don't see a reason to block the train today.

If it has not happened in a week, complain to me and I'll fix it right away.

To give some context: CPT is trying to establish a process for handling these kind of requests, which also involves triaging them. One thing that is still unclear is if and when it's ok for us to re-triage incoming UBNs. I think the promise would be that we'd handle incoming "high" prio tickets within a week, ideally befor the next deployment branch. While UBN should be fixed within the day. That's at least my goal, we are still in the process of sorting this out.

greg added a subscriber: greg.Jul 11 2019, 10:42 PM

But if you are confident it will get fixed in the last 7 days, I don't see a reason to block the train today.

If it has not happened in a week, complain to me and I'll fix it right away.

Moving this to block next week's train then :)

Now that it is rolled out more widely, seeing the exception for various other inputs as well. Including normal-looking URLs that I expect to have previously worked fine and don't appear to have been broken at all or otherwise odd.

Such as:

Such as:

Right. That does sound like UBN. I'll look into it.

Analysis:

meta:MediaWiki Stakeholders' Group is an interwiki redirect:

#REDIRECT [[mw:MediaWiki_Stakeholders%27_Group]]

Special:MyLanguage will first resolve the redirect, and then it will check whether the target language is the content language. If that is the case, all logic involving subpages is skipped, and the title is returned as the target title. If this title has an interwiki prefix, from a redirect or because it was specified that way in the url, this is now passed back to the caller as the target title.

The caller is MediaWiki::performRequest(), which does have handling for interwiki redirects, but is already past that branch. The return value from RedirectSpecialPage::getRedirect() goes straight into $wgTitle, and thus gets processed by the Skin. The Skin then tries to determine the associated talk page.

The problem may not have been obvious in the past, but it sure doesn't make sense... MediaWiki::performRequest() should somehow safeguard against this, and Special:MyLanguage should not try to follow interwiki redirects.

Change 522401 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] SpecialRedirectPage: handle interwiki redirects.

https://gerrit.wikimedia.org/r/522401

Change 522402 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Title::getSubpage should not lose the interwiki prefix.

https://gerrit.wikimedia.org/r/522402

Change 522401 merged by jenkins-bot:
[mediawiki/core@master] RedirectSpecialPage: handle interwiki redirects.

https://gerrit.wikimedia.org/r/522401

The merged patch, 522401, should address the critical problem. Should probably be backported and go on SWAT.

522402 should also be merged to ensure consistent behavior of interwiki prefixes with Special:MyLanguage, but that doesn't need to be UBN, and doesn't need to block the train either, I think.

Change 522558 had a related patch set uploaded (by Jforrester; owner: Daniel Kinzler):
[mediawiki/core@wmf/1.34.0-wmf.13] RedirectSpecialPage: handle interwiki redirects.

https://gerrit.wikimedia.org/r/522558

Change 522608 had a related patch set uploaded (by Krinkle; owner: Daniel Kinzler):
[mediawiki/core@wmf/1.34.0-wmf.13] Title: Title::getSubpage should not lose the interwiki prefix

https://gerrit.wikimedia.org/r/522608

Change 522402 merged by jenkins-bot:
[mediawiki/core@master] Title: Title::getSubpage should not lose the interwiki prefix

https://gerrit.wikimedia.org/r/522402

Change 522558 merged by jenkins-bot:
[mediawiki/core@wmf/1.34.0-wmf.13] RedirectSpecialPage: handle interwiki redirects.

https://gerrit.wikimedia.org/r/522558

@daniel Unfortunately, this didn't get through SWAT, we now get a 200 but also a blank page. @Krinkle also spotted another issues that he will add.

@tstarling or @Anomie would you have time to look at this?

Change 522558 merged by jenkins-bot:
[mediawiki/core@wmf/1.34.0-wmf.13] RedirectSpecialPage: handle interwiki redirects.
https://gerrit.wikimedia.org/r/522558

This was reverted by https://gerrit.wikimedia.org/r/c/mediawiki/core/+/523251.

@daniel Unfortunately, this didn't get through SWAT, we now get a 200 but also a blank page. @Krinkle also spotted another issues that he will add.

That issue was logstash doesn't show any MW logs, see P8748 (first time this issue was mentioned AFAICS) for relevant conversation from #wikimedia-operations.

Unfortunately, this didn't get through SWAT, we now get a 200 but also a blank page. @Krinkle also spotted another issues that he will add.

The other issue being that Logstash isn't working for MediaWiki (T228089), which is also why we couldn't determine what exception caused the blank page.

Change 522608 merged by jenkins-bot:
[mediawiki/core@wmf/1.34.0-wmf.13] Title: Title::getSubpage should not lose the interwiki prefix

https://gerrit.wikimedia.org/r/522608

Change 522558 merged by jenkins-bot:
[mediawiki/core@wmf/1.34.0-wmf.13] RedirectSpecialPage: handle interwiki redirects.
https://gerrit.wikimedia.org/r/522558

In testing this patch on mwdebug1002 I found the same as @WDoranWMF did. An entirely blank response with HTTP 200 status code. And even with Logstash back up, no error/exception/fatal logged.

With debug logging enabled, I find:

Start request GET /wiki/Special:MyLanguage/Wikipedia:Copyrights
Wikimedia\Rdbms\LoadMonitor (…)
Wikimedia\Rdbms\ChronologyProtector (…)
Wikimedia\Rdbms\LoadBalancer (…)
…
MediaWiki\Interwiki\ClassicInterwikiLookup::getInterwikiCacheEntry( wikipedia )
ParserFactory: using preprocessor: Preprocessor_Hash
Unstubbing $wgLang on call of $wgLang::_unstub from ParserOptions->__construct
MediaWiki::preOutputCommit: primary transaction round committed
MediaWiki::preOutputCommit: pre-send deferred updates completed
MediaWiki::preOutputCommit: session changes committed
OutputPage::sendCacheControl: private caching; Mon, 15 Jul 2019 20:26:38 GMT **
Request ended normally
Saving all sessions on shutdown

Nothing out of the ordinary, except that it ended surprisingly soon.

Looking at the patch again (https://gerrit.wikimedia.org/r/522401) in master:

  • It changes the title parsing and normalisation code that handles RedirectSpecialPage, to silently rewrite the destination title as Special:GoToInterwiki if it's an interwiki title (Title:isExternal).
  • The code then does a bunch of set up after which it continues onto the normal branches toward the bottom of MediaWiki::performRequest.
  • $title->isSpecialPage() will be true, with $title being Special:GoToInterwiki/wikipedia:Copyrights.
  • SpecialPageFactory::executePath() -> SpecialPage::run -> SpecialGoToInterwiki::execute()
  • We then reach this interesting code path:
SpecialGoToInterwiki.php
		if ( !$target->isExternal() || $target->isLocal() ) {
			// Either a normal page, or a local interwiki. Just redirect.
			$this->getOutput()->redirect( $url, '301' );
		} else {
			$this->getOutput()->addWikiMsg( 'gotointerwiki-external', $url, $target->getFullText() );
		}

In other words, if this is a non-interwiki title or a local in-farm interwiki link, then we redirect normally, over HTTP. In case of truly external links (not in-farm interwikis), we render an interstitial page. This exists for security reasons and is indeed the primary reason for the existence of this GoToInterwiki mechanism (without this mechanism an HTTP redirect is MediaWiki's default behaviour. The special page exists to stop this behaviour for untrusted destinations. For context, see 14beae88b / T109140 / T122209).

The issue however is that in the common case of SpecialGoToInterwiki deciding to redirect over HTTP. We now have a problem, because we have already gone passed the code in MediaWiki.php that deals with HTTP redirects, so the information gets ignored, and it doesn't do anything else, hence an empty page.

If we test instead a case of an unstrusted prefix, such as "google:" then we find the patch actually does fix it kind of. E.g. https://www.mediawiki.org/wiki/Special:MyLanguage/google:Wikipedia:Copyrights fatals today in production, but renders as https://www.mediawiki.org/wiki/Special:GoToInterwiki/google:Wikipedia:Copyrights with this patch applied.

As written, the code is still broken (to some extent worse because we now have silent failure and invalid/unpredictable behaviour in master). So this will need additional research and a new patch to be written.

The secondary patch that cleans up Title::getSubpage seems harmless indeed and I'll roll that out momentarily.

Mentioned in SAL (#wikimedia-operations) [2019-07-15T20:59:41Z] <krinkle@deploy1001> Synchronized php-1.34.0-wmf.13/includes/Title.php: T227700 / T227700: getSubpage should not lose the interwiki prefix (duration: 00m 52s)

Mentioned in SAL (#wikimedia-operations) [2019-07-15T20:59:41Z] <krinkle@deploy1001> Synchronized php-1.34.0-wmf.13/includes/Title.php: T227700 / T227700: getSubpage should not lose the interwiki prefix (duration: 00m 52s)

What is the status of this ticket? Is the fix merged and ready for 1.34-14, this week's train? As-is, it seems train can't move beyond cutting the branch.

What is the status of this ticket? Is the fix merged and ready for 1.34-14, this week's train? As-is, it seems train can't move beyond cutting the branch.

Patches for High and UBN tickets are usually backported and deployed during a SWAT window that happens three times a workday (except Fridays). We tried to deploy the fix for this ticket yesterday, however, the fix didn't fix the bug (instead of a fatal error, a blank page was displayed, with no fatal/exception/error [even in Logstash]) and the deployment was aborted, but the fix was left in master. This is still under investigation.

Indeed. A new patch is still needed, and we should consider possibly reverting the patch that did land in master, from master and the new branch (https://gerrit.wikimedia.org/r/522401).

Given this already affected production for a week, and CPT is working on it with high priority, I propose we move forward so long as we keep it the same as it was last week (so with the above patch, which would cause additional issues, to be backed out).

Change 523693 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] Revert "RedirectSpecialPage: handle interwiki redirects."

https://gerrit.wikimedia.org/r/523693

Change 523695 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@wmf/1.34.0-wmf.14] Revert "RedirectSpecialPage: handle interwiki redirects."

https://gerrit.wikimedia.org/r/523695

Change 523693 merged by jenkins-bot:
[mediawiki/core@master] Revert "RedirectSpecialPage: handle interwiki redirects."

https://gerrit.wikimedia.org/r/523693

Change 523695 merged by jenkins-bot:
[mediawiki/core@wmf/1.34.0-wmf.14] Revert "RedirectSpecialPage: handle interwiki redirects."

https://gerrit.wikimedia.org/r/523695

Tgr added a subscriber: Tgr.EditedJul 16 2019, 1:12 PM

In other words, if this is a non-interwiki title or a local in-farm interwiki link, then we redirect normally, over HTTP. In case of truly external links (not in-farm interwikis), we render an interstitial page.

Sorry, I overlooked that. In this case it would defeat the point as we do need the interstitial for security reasons even on local interwiki redirects if they contain something personal (e.g. Special:MyPage/unique-token and the user's page is a local interwiki redirect).

One option would be to add some parameter to Special:GoToInterwiki to force it to never redirect. Another would be to refuse processing interwiki redirects in RedirectSpecialPage pages which have personally identifiable pages (just show the page as if redirect=no was used, which also works as an interstitial of sorts).

The first seems easier to implement, so I would go with that, at least for the UBN fix.

Change 523713 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/core@wmf/1.34.0-wmf.14] [WIP] Fix and re-apply "RedirectSpecialPage: handle interwiki redirects"

https://gerrit.wikimedia.org/r/523713

Change 523721 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/core@master] [WIP] Fix and re-apply "RedirectSpecialPage: handle interwiki redirects"

https://gerrit.wikimedia.org/r/523721

Change 523713 abandoned by Gergő Tisza:
[WIP] Fix and re-apply "RedirectSpecialPage: handle interwiki redirects"

Reason:
Sorry, wrong branch. See https://gerrit.wikimedia.org/r/c/mediawiki/core/ /523721 .

https://gerrit.wikimedia.org/r/523713

@Tgr Are you actively looking at the issue?

Tgr added a comment.Jul 16 2019, 4:01 PM

I have a patch up (needs tests, will try to add some later today - although testing the MediaWiki class is not fun; it really should be refactored into something manageable).

@Tgr Ok thanks, we'll wait then to see if that resolves the issue. Let me know if you need anything from us.

Change 523721 merged by jenkins-bot:
[mediawiki/core@master] Fix and re-apply "RedirectSpecialPage: handle interwiki redirects"

https://gerrit.wikimedia.org/r/523721

Mentioned in SAL (#wikimedia-operations) [2019-07-24T04:42:07Z] <tstarling@deploy1001> Synchronized php-1.34.0-wmf.15/includes/MediaWiki.php: T227700 (duration: 00m 54s)

Mentioned in SAL (#wikimedia-operations) [2019-07-24T04:45:20Z] <tstarling@deploy1001> Synchronized php-1.34.0-wmf.15/includes/specials/SpecialGoToInterwiki.php: T227700 (duration: 00m 54s)

Mentioned in SAL (#wikimedia-operations) [2019-07-24T04:46:15Z] <tstarling@deploy1001> Synchronized php-1.34.0-wmf.15/includes/MediaWiki.php: T227700 (duration: 00m 54s)

Mentioned in SAL (#wikimedia-operations) [2019-07-24T04:49:10Z] <tstarling@deploy1001> Synchronized php-1.34.0-wmf.14/includes/specials/SpecialGoToInterwiki.php: T227700 (duration: 00m 54s)

Mentioned in SAL (#wikimedia-operations) [2019-07-24T04:50:04Z] <tstarling@deploy1001> Synchronized php-1.34.0-wmf.14/includes/MediaWiki.php: T227700 (duration: 00m 53s)

Mentioned in SAL (#wikimedia-operations) [2019-07-24T04:51:29Z] <tstarling@deploy1001> Synchronized php-1.34.0-wmf.14/includes/specials/SpecialGoToInterwiki.php: T227700 (duration: 00m 54s)

Mentioned in SAL (#wikimedia-operations) [2019-07-24T04:52:24Z] <tstarling@deploy1001> Synchronized php-1.34.0-wmf.14/includes/MediaWiki.php: T227700 (duration: 00m 54s)

tstarling closed this task as Resolved.Jul 24 2019, 4:54 AM
tstarling claimed this task.

Deployed, and confirmed that the exception goes away.

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:05 PM