Page MenuHomePhabricator

When editing a section on non-Latin content, Chrome redirects the user to Special:BadTitle on save
Closed, ResolvedPublic

Description

User:KOLI@fawiki informed me that after saving any edit, he get redirected to Special:BadTitle instead of the saved page. The edit is saved and can be found in Special:Contributions/History. This doesn't happen using VE (or editing through gadgets for obvious reasons).
This is the picture he sent me:

image.png (182×848 px, 45 KB)

(The translation of the mediawiki message is "Title of the requested page has bad character: Foo")

I can't reproduce it but since it happens for all pages when the user edits, it seems important to me.

Event Timeline

I used to have that also from time to time, I guess it will be nice if you could have a look at the logs of the page and when it is shown to the users, pretty sure you will find something interesting.

Change 496630 had a related patch set uploaded (by Ebrahim; owner: Ebrahim):
[mediawiki/core@master] Fix redirect to bad title on Chrome

https://gerrit.wikimedia.org/r/496630

I've uploaded a possible fix for it here https://gerrit.wikimedia.org/r/496630 but here is someway to reproduce it reliably,

Open https://test.wikipedia.org/w/index.php?title=Talk:%F0%9F%98%83&action=edit&section=1 and click save (a null edit).

If you are using Chrome, you will see Special:BadTitle after saving the page.

Krinkle renamed this task from User get redirects to Special:BadTitle after saving the edit to When editing a section sometimes Special:BadTitle is shown after saving.Mar 14 2019, 11:00 PM

Ladsgroup, you were also unable to see the issue? I see it on EVERY section edit in Persian Wikipedia regardless of being logged in or not and one should be able to see it on editing https://test.wikipedia.org/w/index.php?title=Talk:%F0%9F%98%83&action=edit&section=1 or using this

  1. $ echo "<?php header('Location: https://fa.wikipedia.org/wiki/%D8%A8#م');" > a.php
  2. $ php -S 127.0.0.1:2040 a.php
  3. Opening 127.0.0.1:2040

steps on Chrome at least on Linux and Windows but @MaxSem and @Krinkle expressing they are unable to see the issue. The patch needs work but how to proceed when its existence is in doubt? Extracting log count of users has hit the issue to find out the issue is real?

Is this the same issue as T218310 and T218393 ? If it is please merge them into T216029.

Apparently I don't have the right to mark as duplicate.

Great to know this patch on Chromium is related to this issue, https://chromium-review.googlesource.com/c/chromium/src/+/1366759

continue from https://phabricator.wikimedia.org/T218393

To be as clear as possible (if anyone needs to reproduce the bug).

  1. it must be fresh Chrome browser
  2. it must be URL with non-ASCII characters (that need to be escaped)
  3. it must be an edit of a section with non-ASCII characters in its title.

Then doing any edit of that section, even null edit, getting a redirect with a cocktail in it like %D0%A3%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0%BA#Якорь (half non-ASCII encoded, half left as-is)

Most browsers are able to deal with it as a part of their IDN homograph attack readiness. Now (see the dup link at the beginning) Chrome will restore this part of protection as well. Nevertheless such Location it is plain wrong by RFC - by the common sense as well though. So there is no guarantee of any kind that some browser on WebKit, or any other browser in the future will not crash in the same way.

So this header trick should be never ever used again: even after confirmed fix by Chromium developers. IMHO.

Other related bug

It would be more proper to file it separately - but while the attention is on the "type 25 pollution" bug, - it's time maybe to fix the long lasting "active inactive link" bug on Wikipedia. Thus you can click on it, but the anchor (fragment) part is ignored. I am 99% sure now that is another effect of the same problem.

A sample can be seen at ru-wiki techforum https://ru.wikipedia.org/wiki/Википедия:Форум/Технический

In the content list click any title - it will scroll into view. Now click the title` Запрашиваемое название страницы содержит недопустимые символы: «%D0»` (currently the 3rd from the top) - no scroll, but the address bar gets filled with escaped chars. This is for any browser, not Chrome only.

@Aklapper: Thanks for the link. I believe it has not the best UX but well I see the reason, so.

@Neolexx: The source code is messed up already due to requirements but great for bringing it up. Added that testcase here https://test.wikipedia.org/wiki/Talk:%F0%9F%98%83

gerrit is still dead https://www.isitdownrightnow.com/gerrit.wikimedia.org.html

I guess it means that no one can try or deploy the fix?

Also and just in case I will repeat myself (as old edits get hidden by default). "Type 25 pollution bug" is rather elusive. It is not necessarily shown even while editing pages with any complex encoding. To see it:

  1. it must be fresh Chrome browser
  2. it must be URL with non-ASCII characters
  3. it must be an edit of a section with non-ASCII characters in its title.

As a live demo case I am linking here the discussion on the subject at ru-wiki techforum.
https://ru.wikipedia.org/wiki/Википедия:Форум/Технический#Некорректное_сохранение_страницы
you try edit it - you get Type 25 bug

But the subsection of that discussion is named "human-readable section IDs" (no none-ASCII chars)
https://ru.wikipedia.org/wiki/Википедия:Форум/Технический#"human-readable_section_IDs"
you try to edit it - you will be just fine.

Going back to old fashion diff sharing, I am still not sure where to put urlencode and which one should I use, I only know this can work and I am OK if someone else also want to fix the issue but please fix ASAP now that it hits lot of users. Thanks

diff --git a/includes/parser/Parser.php b/includes/parser/Parser.php
index 546152f6c9..d0ee01aed0 100644
--- a/includes/parser/Parser.php
+++ b/includes/parser/Parser.php
@@ -5984,7 +5984,7 @@ class Parser {
        }
 
        private static function makeAnchor( $sectionName ) {
-               return '#' . Sanitizer::escapeIdForLink( $sectionName );
+               return '#' . wfUrlencode( Sanitizer::escapeIdForLink( $sectionName ) );
        }
 
        private function makeLegacyAnchor( $sectionName ) {

Gerrit is down, folks are working on it, see T218472.

Even if it wasn't down, we don't do non-emergency deployments on weekends. Very few people are around to help if something goes wrong. I realise that this is an annoying problem, but the site is not down, you can even still save edits just fine, and you can also just use a different browser for a few days to avoid the problem.

I guess it means that no one can try or deploy the fix?

See https://wikitech.wikimedia.org/wiki/Deployments for the calendar; both normal and SWAT deploys do not happen on weekends.
Plus I do not share the implied urgency as the edits do get saved and there is no data loss. Patience, please.

Yeah, they do get saved: just nobody knows about it unless MediaWiki savvy users of Wikipedia. And it just happens in the project that the max article edits and newcomers falls at week-ends.

But patience then patience. Me personally added to my https://ru.wikipedia.org/wiki/Участник:Neolexx/common.js that marvelous "Type 25 pollution cleaner" and I'm almost used already to the screen flashed

$(document).ready(function() {
	if (mw.config.get('wgAction') == 'view') {
		var foo = document.getElementById('mw-content-text');
		if ( /^Запрашиваемое название страницы содержит недопустимые символы/.test(foo.textContent) ) {
			var bar = self.location.href.replace(/\%25/g, '%');
			self.location.href = bar;
		}
	}
});

Maybe as a hot week-end fix we could change the Special:BadTitle page content. So it would infrm that the edit is saved. Plus a code like atop suggestion to add to personal common.js For a while.

Maybe as a hot week-end fix we could change the Special:BadTitle page content. So it would infrm that the edit is saved.

That actually sounds like a good idea. Any ru.wp administrator can edit the text of this message at https://ru.wikipedia.org/wiki/MediaWiki:Title-invalid-characters. (Just remember to delete it later when we actually fix this.)

So, we are already special casing Location: headers for Microsoft browsers here, we can add URL-escaping for Chrome in the same place. I'd like to avoid urlencoding the links if possible because that would mean pretty fragments will be broken on MS browsers (sigh, still about 8% of our traffic). Please voice your objections before Monday morning, I'll make a patch then.

"Don't Shoot the Pianist"... I'm not too deep in IETF and WHATWG stuff so just quoting verbatim from the Chromium response (at the bottom after the line).
So as I understand - they have a word on it from specs guys, URI like urlencoded_part#non-ASCII_as-is_part are simply wrong. They will restore the fix in the next update. But the specs do not prevent any browser producer in the future simply reject such URI as malformed and possibly malicious. If my understanding is correct then such risk should be definitely accounted by the Wikimedia team.


On the spec side, the IETF land says the ABNF for Location in RFC 7231 says the Location header should be a URI-reference:
https://tools.ietf.org/html/rfc7231#section-7.1.2
That and its dependencies are defined here:
https://tools.ietf.org/html/rfc3986#appendix-A

Note in particular:

fragment      = *( pchar / "/" / "?" 
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

Thus, according to IETF specs, that location header is invalid. IETF specs, however, don't do a good job of specifying error-reporting. WHATWG specs sometimes do better. They say...

https://fetch.spec.whatwg.org/#http-fetch (step 3)
We should be "extracting header list values given Location", then passing that straight to the URL parser.
https://fetch.spec.whatwg.org/#extract-header-list-values
"extracting header list values" seems to be defined just in turns of the corresponding ABNF and otherwise return failure, so I guess WHATWG thinks we should be rejecting that Location header. But neither Firefox nor old Chrome reports an error there, so I think this is a spec bug.

As far as I understand the issue, I agree with Neolexx regarding the conditions for faillure: chrome, "es:Río Ibáñez" and subsection "Véase también" (engl. for "See also"). No changes need to be made, to commit the unchanged data leads to fail.

The only new fact I can contribute is that the link to the section "Véase también" in the TOC (Table of Content) of the article doesn't work. It is dead.

Thank you for working to resolve the problem.

You may laugh but we chose to not urlencode the fragments precisely because of Chrome - that's how it worked at the time ;)

To really laugh one may enjoy my "Type 25 pollution cleaner" I'm keep using these days
https://ru.wikipedia.org/wiki/Участник:Neolexx/common.js

`if ( /\%25\w\w\%/.test(self.location.href) ) {
self.location.href = self.location.href.replace(/\%25(\w\w)/g, '%$1');
}`

As well the idea - that looked almost like a good idea - at some point at this week-end: to attach this script to the Special:BadTitle page to get them all of there.

After that I am overall philosophically indifferent by now, taking whatever comes in the order it comes. And thank you for fixing it.

Change 497226 had a related patch set uploaded (by MaxSem; owner: MaxSem):
[mediawiki/core@master] Urlencode fragments when redirecting after editing

https://gerrit.wikimedia.org/r/497226

Jdforrester-WMF renamed this task from When editing a section sometimes Special:BadTitle is shown after saving to When editing a section on non-Latin content, Chrome redirects the user to Special:BadTitle on save.Mar 18 2019, 4:15 PM
Jdforrester-WMF raised the priority of this task from High to Unbreak Now!.

Change 497226 merged by jenkins-bot:
[mediawiki/core@master] Urlencode fragments when redirecting after editing

https://gerrit.wikimedia.org/r/497226

Change 497347 had a related patch set uploaded (by MaxSem; owner: MaxSem):
[mediawiki/core@wmf/1.33.0-wmf.21] Urlencode fragments when redirecting after editing

https://gerrit.wikimedia.org/r/497347

Change 497364 had a related patch set uploaded (by MaxSem; owner: MaxSem):
[mediawiki/core@REL1_32] Urlencode fragments when redirecting after editing

https://gerrit.wikimedia.org/r/497364

Change 497347 merged by jenkins-bot:
[mediawiki/core@wmf/1.33.0-wmf.21] Urlencode fragments when redirecting after editing

https://gerrit.wikimedia.org/r/497347

The fix has been deployed and fixes the issue as reported. I'm however not closing the bug because we need to make sure that the issue doesn't surface via some other path and because we need to figure out if we should just move to always urlencode the fragments.

"es:Río Biobío#Véase también" can be edited and commited without problemas. Thanks, really thank you very much.

Jdforrester-WMF lowered the priority of this task from Unbreak Now! to High.Mar 25 2019, 8:10 PM
Jdforrester-WMF subscribed.

Downgrading, at least.

Change 497364 had a related patch set uploaded (by MaxSem; owner: MaxSem):
[mediawiki/core@REL1_32] Urlencode fragments when redirecting after editing

https://gerrit.wikimedia.org/r/497364

Change 499041 had a related patch set uploaded (by MaxSem; owner: MaxSem):
[mediawiki/core@REL1_31] Urlencode fragments when redirecting after editing

https://gerrit.wikimedia.org/r/499041

Change 499041 merged by jenkins-bot:
[mediawiki/core@REL1_31] Urlencode fragments when redirecting after editing

https://gerrit.wikimedia.org/r/499041

Change 497364 merged by jenkins-bot:
[mediawiki/core@REL1_32] Urlencode fragments when redirecting after editing

https://gerrit.wikimedia.org/r/497364

Change 496630 abandoned by Ebrahim:
parser: Change Sanitizer to urlencode escaped section IDs

Reason:
Yep

https://gerrit.wikimedia.org/r/496630

The fix has been deployed and fixes the issue as reported. I'm however not closing the bug because we need to make sure that the issue doesn't surface via some other path and because we need to figure out if we should just move to always urlencode the fragments.

Has anything been done towards this?

I don't think it makes sense to keep this open then. Presumably someone would have noticed by now if this was still a problem in another case. Thank you for fixing it!