
wgRelevantPageName is missing on Chinese Wikipedia
Closed, ResolvedPublic

Description

https://www.mediawiki.org/wiki/Talk:Sandbox?action=history has wgRelevantPageName, but https://zh.wikipedia.org/wiki/Wikipedia_talk:Flow_tests?action=history does not.

It's a PHP issue (it's in the HTML source for one, but not the other).

This breaks the history page (at least) due to VE requiring it.

Event Timeline

WTF, there is a whole lot of stuff missing from view-source:https://zh.wikipedia.org/wiki/Wikipedia_talk:Flow_tests?action=history. The entire blob of exported wg variables isn't there.

Change 249022 had a related patch set uploaded (by Catrope):
DesktopArticleTarget.init: Tolerate missing wgRelevantPageName

https://gerrit.wikimedia.org/r/249022

Change 249040 had a related patch set uploaded (by Krinkle):
DesktopArticleTarget.init: Tolerate missing wgRelevantPageName

https://gerrit.wikimedia.org/r/249040

Change 249022 merged by jenkins-bot:
DesktopArticleTarget.init: Tolerate missing wgRelevantPageName

https://gerrit.wikimedia.org/r/249022

Change 249040 merged by jenkins-bot:
DesktopArticleTarget.init: Tolerate missing wgRelevantPageName

https://gerrit.wikimedia.org/r/249040

This appears to happen because we use non-multibyte-aware truncation for summaries (I think this is a revision summary, not a topic summary, but I'm not 100% sure), so we end up truncating the summary halfway through a Unicode codepoint and appending "..." to it. This sometimes produces invalid Unicode sequences.
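The failure mode above is easy to reproduce. A minimal sketch (in Python, since the affected code is PHP; the summary string is made up for illustration) of how byte-oriented truncation corrupts multibyte text:

```python
# Each CJK character is 3 bytes in UTF-8, so any byte offset that is not a
# multiple of 3 lands inside a codepoint.
summary = "历史记录摘要"            # hypothetical revision summary
raw = summary.encode("utf-8")       # 18 bytes

truncated = raw[:7] + b"..."        # naive byte truncation cuts through 记

try:
    truncated.decode("utf-8")
    valid = True
except UnicodeDecodeError:
    valid = False

print(valid)  # False: the truncated bytes are no longer valid UTF-8
```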

This invalid Unicode sequence ends up in the wgFlowData variable that's exported to JavaScript, and causes json_encode() to barf. So json_encode( $this->getJSVars() ) returns false, which turns into an empty string when other things are concatenated to it, so the result is that the entire JSVars blob is just dropped on the floor.
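The silent-disappearance part is the nasty bit: PHP's json_encode() returns false rather than throwing, and false coerces to "" under string concatenation. A rough Python sketch of that behavior (php_style_json_encode is a hypothetical stand-in, not the real function):

```python
import json

def php_style_json_encode(value):
    # Mimic PHP's json_encode(): return False (not an exception) when the
    # input contains invalid UTF-8 bytes.
    try:
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        return json.dumps(value)
    except UnicodeDecodeError:
        return False

bad = "历史".encode("utf-8")[:4]        # invalid UTF-8: cut mid-codepoint
encoded = php_style_json_encode(bad)    # -> False

# false . "..." in PHP behaves like "" . "...", so the exported-variables
# blob silently vanishes from the page HTML:
script = "var wgFlowData = " + ("" if encoded is False else encoded) + ";"
print(script)  # var wgFlowData = ;
```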

The revision that breaks this particular page is sqs6skdk7uu952ab; I'll see if I can fix that one in the DB and also fix the truncation code itself.

...but somehow the API does manage to output JSON for this content: the invalid Unicode gets replaced with \ufffd: https://zh.wikipedia.org/w/api.php?action=flow&submodule=view-topic-history&page=Topic:Sqs6skdav48d3xzn&vthformat=wikitext&format=json

...which happens because ApiResult runs everything through Language::normalize(), which cleans this up.

It looks like the "summary" isn't in the DB at all, but is generated at view time by calling Flow\Parsoid\Utils::htmlToPlaintext(), which calls Language::truncate(). That's the core utility for truncating strings in a Unicode-aware way, so I'd be somewhat surprised if there was a bug in that function.
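For reference, codepoint-aware truncation to a byte budget is not hard to do correctly. A hypothetical sketch (in Python; truncate() here is illustrative, not the actual Language::truncate() implementation) of the invariant such a function must maintain: never cut inside a multibyte sequence.

```python
def truncate(text, max_bytes, ellipsis="..."):
    # Truncate to at most max_bytes bytes of UTF-8, appending an ellipsis,
    # while only ever dropping whole characters.
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text
    budget = max_bytes - len(ellipsis.encode("utf-8"))
    out, used = [], 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        if used + n > budget:
            break
        out.append(ch)
        used += n
    return "".join(out) + ellipsis

result = truncate("历史记录摘要", 10)
print(result)  # whole characters only, e.g. two CJK chars plus "..."
# Round-tripping proves the output is still valid UTF-8:
assert result.encode("utf-8").decode("utf-8") == result
```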

Change 249050 had a related patch set uploaded (by Catrope):
Language::truncate(): don't chop up multibyte characters when input contains newlines

https://gerrit.wikimedia.org/r/249050

Change 249051 had a related patch set uploaded (by Krinkle):
Language::truncate(): don't chop up multibyte characters when input contains newlines

https://gerrit.wikimedia.org/r/249051

Change 249050 merged by jenkins-bot:
Language::truncate(): don't chop up multibyte characters when input contains newlines

https://gerrit.wikimedia.org/r/249050

Change 249051 merged by jenkins-bot:
Language::truncate(): don't chop up multibyte characters when input contains newlines

https://gerrit.wikimedia.org/r/249051

Krinkle subscribed.

Checked in beta - topic titles, summaries with CJK and updates to them displayed correctly in History.

Screenshot attached: Screen Shot 2015-10-30 at 5.23.23 PM.png (232 KB)