Cannot start or resume a translation for articles with spaces or non-ascii characters in the title
Closed, Resolved (Public)

Description

Trying to start a translation for certain articles reloads the dashboard instead of opening the translation editor. Initial exploration suggests the problem occurs with articles that have spaces in their names. For example, Palak paneer cannot be translated: clicking "Start translation" just reloads the page.


This issue was reported in this talk page discussion


See also screencast (from Portuguese Wikipedia):

Event Timeline

I cannot reproduce locally. Looking at the code:

	private function hasValidToken() {
		global $wgContentTranslationTranslateInTarget;
		$request = $this->getRequest();
		if ( $this->getUser()->isAnon() ) {
			// Tokens are valid only for logged in users.
			return false;
		}
		$title = $request->getVal( 'page' );
		if ( $title === null ) {
			return false;
		}
		// PHP mangles spaces so that foo%20bar is converted to foo_bar and that $_COOKIE['foo bar']
		// *does not* work. Go figure. It also mangles periods, so that foo.bar is converted to
		// foo_bar, but that *does* work because MediaWiki's getCookie transparently maps periods to
		// underscores. If there is any further bugs reported about this, please use base64.
		$title = strtr( $title, ' ', '_' );
		$from = $request->getVal( 'from' );
		$to = $request->getVal( 'to' );
		if ( $from === null || $to === null ) {
			return false;
		}
		$cookieName = implode( '_', [ 'cx', $title, $from, $to ] );
		$hasToken = $request->getCookie( $cookieName, '' ) !== null;
		// Since we can only publish to the current wiki, enforce that the target language matches
		// the wiki we are currently on. If not, redirect the user back to dashboard, where he can
		// start again with parameters filled (and redirected to the correct wiki).
		if ( $wgContentTranslationTranslateInTarget ) {
			$tokenIsValid = $to === SiteMapper::getCurrentLanguageCode();
			return $hasToken && $tokenIsValid;
		}
		// For development (single instance) use, there is no need to validate the token, because
		// we don't redirect.
		return $hasToken;
	}

We show the dashboard if this method returns false. I am fairly confident the problematic return is return $hasToken && $tokenIsValid;, and SiteMapper::getCurrentLanguageCode() seems to work as expected from the command line. This leaves either something odd happening with $request->getCookie(), or SiteMapper::getCurrentLanguageCode() misbehaving during web requests.

Actually, since articles without spaces in the name work, the likely cause is that what the comment says about %20 being converted to _ is no longer true in certain environments.

Probable cause is https://www.php.net/ChangeLog-7.php#PHP_7_2

SAL 2021-04-21: 13:39 moritzm: upgrading mw1262-1265,mw1277-1279 to PHP 7.2.34
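
The suspected mismatch can be illustrated with a small model. This is only a sketch under stated assumptions: it assumes the client sends the cookie name with the title percent-encoded, and that the PHP change in question is that cookie names are no longer URL-decoded before the space/period mangling is applied. The functions mangleName(), cookieKeyWithDecoding() and cookieKeyWithoutDecoding() are hypothetical helpers written for this illustration; they are not PHP or MediaWiki APIs, and the mangling rule is simplified.

```php
<?php
// Simplified model (not PHP's actual implementation): spaces and periods in
// incoming names are replaced with underscores.
function mangleName( string $name ): string {
	return strtr( $name, [ ' ' => '_', '.' => '_' ] );
}

// Assumed older behaviour: the cookie name is URL-decoded, then mangled.
function cookieKeyWithDecoding( string $rawName ): string {
	return mangleName( urldecode( $rawName ) );
}

// Assumed newer behaviour: the cookie name is used as sent, without decoding.
function cookieKeyWithoutDecoding( string $rawName ): string {
	return mangleName( $rawName );
}

// Cookie name as an assumed client would send it (title percent-encoded).
$sent = 'cx_' . rawurlencode( 'Palak paneer' ) . '_en_es';

// Cookie name that hasValidToken() builds after its strtr() call.
$expected = implode( '_', [ 'cx', strtr( 'Palak paneer', ' ', '_' ), 'en', 'es' ] );

// With decoding the two names agree; without decoding they do not.
var_dump( cookieKeyWithDecoding( $sent ) === $expected );
var_dump( cookieKeyWithoutDecoding( $sent ) === $expected );
```

Under this model, a title without spaces or other characters that need percent-encoding produces the same key either way, which would be consistent with such articles still working.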

Change 683260 had a related patch set uploaded (by Nikerabbit; author: Nikerabbit):

[mediawiki/extensions/ContentTranslation@master] Fix CX token cookie

https://gerrit.wikimedia.org/r/683260

Nikerabbit renamed this task from Cannot start translation for articles with space in the title to Cannot start or resume a translation for articles with space in the title. Apr 28 2021, 11:44 AM
Nikerabbit added a subscriber: He7d3r.

Change 683134 had a related patch set uploaded (by KartikMistry; author: Nikerabbit):

[mediawiki/extensions/ContentTranslation@wmf/1.37.0-wmf.1] Fix CX token cookie

https://gerrit.wikimedia.org/r/683134

Change 683135 had a related patch set uploaded (by KartikMistry; author: Nikerabbit):

[mediawiki/extensions/ContentTranslation@wmf/1.37.0-wmf.3] Fix CX token cookie

https://gerrit.wikimedia.org/r/683135

Change 683265 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/extensions/ContentTranslation@master] Update the comment about cookie name in CX token validation

https://gerrit.wikimedia.org/r/683265

Change 683260 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] Fix CX token cookie

https://gerrit.wikimedia.org/r/683260

I tested on cx2 just now and it is still not working.
Waiting for the job that deploys this to cx2 to update the code.

https://cx2-testing.wmflabs.org/index.php/Special:Version seems to include this fix. Can you try again?

Nikerabbit renamed this task from Cannot start or resume a translation for articles with space in the title to Cannot start or resume a translation for articles with spaces or special characters in the title. Apr 28 2021, 3:29 PM

It's not working for the article "Administración"
https://es.wikipedia.org/wiki/Administraci%C3%B3n

Nikerabbit renamed this task from Cannot start or resume a translation for articles with spaces or special characters in the title to Cannot start or resume a translation for articles with spaces or non-ascii characters in the title. Apr 29 2021, 7:09 AM

Change 683514 had a related patch set uploaded (by Nikerabbit; author: Nikerabbit):

[mediawiki/extensions/ContentTranslation@master] Another fix for token cookie handling

https://gerrit.wikimedia.org/r/683514

Change 683559 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/extensions/ContentTranslation@master] CX Token: Use base64 encoding to avoid PHP cookie name mangling issues

https://gerrit.wikimedia.org/r/683559
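
As a rough illustration of the base64 idea named in this patch title, the sketch below builds a cookie name whose variable part contains only characters that need neither percent-encoding nor underscore substitution. It is not the code from the patch: base64UrlEncode() and cxTokenCookieName() are hypothetical names, and the URL-safe alphabet with padding stripped is an assumption made here so that '+', '/' and '=' never appear in the name.

```php
<?php
// Hypothetical helper: URL-safe base64 without padding, so the output uses
// only A-Z, a-z, 0-9, '-' and '_'.
function base64UrlEncode( string $data ): string {
	return rtrim( strtr( base64_encode( $data ), '+/', '-_' ), '=' );
}

// Hypothetical cookie name builder; the real patch may differ in detail.
function cxTokenCookieName( string $title, string $from, string $to ): string {
	return 'cx_' . base64UrlEncode( implode( '_', [ $title, $from, $to ] ) );
}

// Titles with a space and with a non-ASCII character both yield safe names.
foreach ( [ 'Palak paneer', 'Administración' ] as $title ) {
	echo cxTokenCookieName( $title, 'en', 'es' ), "\n";
}
```

For this to work, whatever sets the cookie on the client side would have to produce exactly the same encoding from the same UTF-8 bytes of the title.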

Change 683514 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] Another fix for token cookie handling

https://gerrit.wikimedia.org/r/683514

Change 683533 had a related patch set uploaded (by Ladsgroup; author: Nikerabbit):

[mediawiki/extensions/ContentTranslation@wmf/1.37.0-wmf.3] Another fix for token cookie handling

https://gerrit.wikimedia.org/r/683533

Change 683534 had a related patch set uploaded (by Ladsgroup; author: Nikerabbit):

[mediawiki/extensions/ContentTranslation@wmf/1.37.0-wmf.1] Another fix for token cookie handling

https://gerrit.wikimedia.org/r/683534

Change 683135 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@wmf/1.37.0-wmf.3] Fix CX token cookie

https://gerrit.wikimedia.org/r/683135

Change 683134 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@wmf/1.37.0-wmf.1] Fix CX token cookie

https://gerrit.wikimedia.org/r/683134

Mentioned in SAL (#wikimedia-operations) [2021-04-29T10:54:49Z] <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.1/extensions/ContentTranslation/modules/base/mw.cx.SiteMapper.js: Backport: [[gerrit:683134|Fix CX token cookie (T281346)]] (duration: 01m 09s)

Mentioned in SAL (#wikimedia-operations) [2021-04-29T10:56:25Z] <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.3/extensions/ContentTranslation/modules/base/mw.cx.SiteMapper.js: Backport: [[gerrit:683135|Fix CX token cookie (T281346)]] (duration: 01m 08s)

Change 683534 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@wmf/1.37.0-wmf.1] Another fix for token cookie handling

https://gerrit.wikimedia.org/r/683534

Change 683533 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@wmf/1.37.0-wmf.3] Another fix for token cookie handling

https://gerrit.wikimedia.org/r/683533

Mentioned in SAL (#wikimedia-operations) [2021-04-29T11:32:50Z] <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.3/extensions/ContentTranslation/specials/SpecialContentTranslation.php: Backport: [[gerrit:683533|Another fix for token cookie handling (T281346)]] (duration: 01m 08s)

Mentioned in SAL (#wikimedia-operations) [2021-04-29T11:34:34Z] <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.1/extensions/ContentTranslation/specials/SpecialContentTranslation.php: Backport: [[gerrit:683534|Another fix for token cookie handling (T281346)]] (duration: 01m 07s)

The above fixes do fix this for many cases, but there are still some titles which are not working.

Change 683265 abandoned by Santhosh:

[mediawiki/extensions/ContentTranslation@master] Update the comment about cookie name in CX token validation

Reason:

no longer relevant

https://gerrit.wikimedia.org/r/683265

Change 683559 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] CX Token: Use base64 encoding to avoid PHP cookie name mangling issues

https://gerrit.wikimedia.org/r/683559

Nikerabbit lowered the priority of this task from Unbreak Now! to High.
Nikerabbit removed a project: Patch-For-Review.

Tried with spaces and diacritics, and it seems to be working fine now.