
Invalid location header per RFC 7231 in Chrome: Redirects have path percent-escaped, but not fragment ("title contains invalid characters" errors)
Closed, Duplicate · Public


Recently Location headers started to be served incorrectly (partially encoded, partially not). While some browsers work around this, the latest Chrome (most importantly) and some other WebKit branches were not ready for it. As a result, after each edit the redirect leads to "type 25 pollution" of the URL and to an "unacceptable symbol %D0" error page. Like this one (see the parasitic 25s in the URL)

The problem has been reported to Chromium and they will restore the fix (their commit message is enclosed at the bottom). Still, they asked me to mention that it is not a correction but a workaround for some servers' non-standard behavior, this time.

So I'm mentioning it here. If the headers are corrected right now, Chrome users will not have to wait for the Chromium fix to have their work at Wikipedia restored.

The Chromium letter:

Fix handling of half-escaped Location headers.

Non-ASCII characters in URLs are supposed to be percent-escaped. This
includes when those characters are used in Location headers. However, we
tolerate unescaped characters by escaping them ourselves at various
stages in the process. sometimes sends redirects where the path is
percent-escaped, but the fragment is not. Previously, we would fix the
escaping in the fragment, but leave the path as-is. However,
switched it to always uniformly escape everything, so the %s became %25.
This causes us to follow the redirect incorrectly.

Uniformly escaping things makes sense for NetLog, where we want to
unambiguously represent what the server actually sent, but escaping as
part of URL resolution is not intended to be invertible. Restore the old
behavior here and add some regression tests.

Bug: 942073
Change-Id: I009bc0dc9c5c8f836f072fe23ccd824698d550e0
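The two escaping behaviors described in the letter can be illustrated with a minimal Python sketch. This is not Chromium's code: the function names `escape_uniformly` and `escape_non_ascii_only` and the `/wiki/` path prefix are assumptions made up for illustration, and the Location value is modeled on the half-encoded example quoted later in this task.

```python
from urllib.parse import quote, unquote

# Hypothetical half-escaped Location value: the path is percent-escaped,
# the fragment is raw Cyrillic. The "/wiki/" prefix is an assumption.
location = "/wiki/%D0%A3%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0%BA:Neolexx#Якорь"


def escape_uniformly(value: str) -> str:
    """Escape '%' as well as non-ASCII bytes.

    The result is invertible (good for logging what the server sent),
    but an existing escape such as %D0 turns into %25D0.
    """
    return quote(value, safe="/:#")


def escape_non_ascii_only(value: str) -> str:
    """Escape only what is not already ASCII, leaving existing %XX escapes alone."""
    return quote(value, safe="/:#%")


print(escape_uniformly(location))       # path now contains %25D0, %25A3, ...
print(escape_non_ascii_only(location))  # path unchanged, fragment escaped
```

Decoding the uniformly escaped value once gives back the exact bytes the server sent, which is why that variant suits a log. Following it as a redirect target, however, asks the server for a title containing literal `%D0` text, which matches the error described above.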

Event Timeline

Aklapper renamed this task from Location header is invalid per RFC 7231: breaking Chrome browser usage to Invalid location header per RFC 7231 in Chrome: Redirects have path percent-escaped, but not fragment ("title contains invalid characters" errors). Mar 15 2019, 11:35 AM
matmarex added subscribers: MaxSem, matmarex.

Here's an example of the weirdly encoded header (sent when saving a null edit of the section on Олсен,_Джастин)


I'm not very familiar with the relevant code (I'm here because I just saw the other task being merged as a duplicate on IRC and was curious), but I know that @MaxSem worked on this feature (human-readable section IDs).

Possibly Sanitizer::escapeIdForLink() is where the URL-encoding should be added?

I'm honestly puzzled why it takes so long to fix.

Sending Locations like %D0%A3%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0%BA:Neolexx#Якорь (half encoded, half raw Cyrillic) is just plain wrong, both logically and per the RFC. Yes, the majority of browsers are able to cope with it (and with many other things, such as IDN homograph attacks). But this is not a hacker site.
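For comparison, a consistently encoded Location for the same page and anchor could be built as in the Python sketch below. It is only an illustration: `build_location` and the `/wiki/` prefix are hypothetical, and this is not how MediaWiki generates the header.

```python
from urllib.parse import quote


def build_location(title: str, anchor: str) -> str:
    """Percent-encode both the path and the fragment, so the header is pure ASCII.

    Hypothetical helper; the "/wiki/" prefix is an assumption for illustration.
    """
    path = quote(title, safe=":/")   # keep namespace colon and slashes readable
    fragment = quote(anchor, safe="")  # encode every non-unreserved byte
    return f"/wiki/{path}#{fragment}"


header = build_location("Участник:Neolexx", "Якорь")
print(header)
print(header.isascii())
```

With both parts encoded the same way, a client has nothing left to guess: it never needs to decide whether a `%` in the header is already an escape or still has to be escaped.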

Chrome is a widely used browser and ru-wiki is one of the biggest projects; this kind of "type 25 pollution" fix is plain ridiculous, but it is the only one working so far:

So first fix the Location headers, and only afterwards start thinking about which nice but not vital things may have gotten broken.

@Neolexx: This task is closed as a duplicate. See the other task for updates: a patch was proposed only 36 hours ago. That's not "so long". If you'd like to get things fixed faster and want to help, feel free to amend the proposed patch to make it pass jenkins-bot verification.