Page MenuHomePhabricator

url parsing does not recognize mixed case protocols
Closed, ResolvedPublic

Description

But it works if I type it in address bar.


Version: 1.19
Severity: normal

Details

Reference
bz34939

Event Timeline

bzimport raised the priority of this task from to Low.Nov 22 2014, 12:09 AM
bzimport set Reference to bz34939.
bzimport added a subscriber: Unknown Object (MLST).

Original summary:

[HttP://en.WikiPedia.org/Wiki/page_name] is not recognized as a valid external link

We have wfUrlProtocols() / mUrlProtocols to list all possible protocol. Whenever we use them, maybe we should make the regex ignore case with the i modifier.

Please note we probably want to remove all wfUrlProtocols() call from the parser excepting the one in the constructor. See bug 34956

fran wrote:

I've submitted a patch that should make all URL scheme handling case-insensitive.

https://gerrit.wikimedia.org/r/#/c/15224/

Patch merged, we should be good now! Thanks all.

It should be noted that this patch violates RFC 3986, which states that applications should accept mixed case schemes as a matter of robustness, but only output lower-case schemes in the actual output.

Tyler, maybe we should open a new bug (or bug and tracking bug) for RFC 3986 compliance? I don't pretend to know how much effort it would take, but it seems like a separate issue.

(In reply to comment #7)

Tyler, maybe we should open a new bug (or bug and tracking bug) for RFC 3986
compliance? I don't pretend to know how much effort it would take, but it seems
like a separate issue.

Yep, sounds good to me. Thanks for your work on this bug!

(In reply to comment #6)

It should be noted that this patch violates RFC 3986, which states that
applications should accept mixed case schemes as a matter of robustness, but
only output lower-case schemes in the actual output.

Kind of trippy that the RFC would specify that output should be case sensitive while also calling for every input to be case insensitive. That is, no browser is going to ever get confused by href="Http://...". Anyway, if it's a real concern, a separate bug (or separate bugs) should be filed.

(In reply to comment #6)

It should be noted that this patch violates RFC 3986, which states that
applications should accept mixed case schemes as a matter of robustness, but
only output lower-case schemes in the actual output.

Does this mean MediaWiki should show the following links in the same way?

(In reply to comment #9)

(In reply to comment #6)

It should be noted that this patch violates RFC 3986, which states that
applications should accept mixed case schemes as a matter of robustness, but
only output lower-case schemes in the actual output.

Does this mean MediaWiki should show the following links in the same way?

Yes, HTTP, HttP, HTtp, etc. should all be rendered in the canonical form of "http" in the actual links. I'll open a separate bug for this when I have time. Should be an easy fix.

Please fix the spelling error in RELEASE-NOTES-1.20:

  • (bug 34939) made link parsking insensitive ([HttP://]) ^^^^^^^^

(In reply to comment #11)

Please fix the spelling error in RELEASE-NOTES-1.20:

  • (bug 34939) made link parsking insensitive ([HttP://]) ^^^^^^^^

Already fixed on the 12th September

https://gerrit.wikimedia.org/r/#/c/23496/

All I know is that is not in RELEASE-NOTES-1.20 .
Maybe you mean it will eventually show up there.

You are right.

The problem is due to how I check what got changed.

git fetch origin
git diff master..origin/master \
    RELEASE-NOTES-* includes/installer/LocalSettingsGenerator.php|wdiff -d -3 > /tmp/mediawikiDiff$$||:
${PAGER-less} /tmp/mediawikiDiff$$

It somehow shows me old stuff.

But I don't know how to improve it.

Also solved the duplicate bug 27913