Page MenuHomePhabricator

external links being rendered when they only have one slash
Closed, ResolvedPublic

Description

Author: rowan.collins

Description:
Ext. links with forms such as "http:/foo" and "ftp:/foo" (rather than
"http://foo" and "ftp://foo") are currently treated as valid by Parser.php, but
then not styled by the CSS (i.e. they show up light blue in monobook, but
receive no icon). The relevant RFCs appear to define the double-slash as
mandatory (e.g. ftp://ftp.rfc-editor.org/in-notes/rfc1738.txt section 3.1), so
the parser should probably simply pass over these as invalid syntax.

Of course, things like "mailto:" will *never* have a double-slash, so it needs
to be a scheme-specific check. Perhaps it should be treated as part of the
scheme, so the list (configurable, per bug 431) would be something like
$wgValidURISchemes=array('http://', 'ftp://', 'mailto:') etc.


Version: unspecified
Severity: minor
URL: http://meta.wikimedia.org/wiki/Sandbox/787

Details

Reference
bz787

Revisions and Commits

Event Timeline

bzimport raised the priority of this task from to Low.Nov 21 2014, 7:02 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz787.

The parser handle the "http:/foo" things correctly and even build a correct link.
This way reader will at least be able to follow the link.

The fact that the icon is missing is good way to alert the reader that something
is wrong an need some minor tweaking.

Let it as this.

rowan.collins wrote:

I'm sorry, but I disagree. I'm happy to have this marked as a low-severity,
low-priority bug, but a bug it is. The current behaviour does *not* handle such
links "correctly", by any definition of "correctness" we choose.

  • If we want to follow standards, "http:/example.com" should *not* be treated as

a valid URL, and therefore should *not* generate a clickable link; if there's no
link at all, somebody will soon spot that something needs fixing.

  • If we want to ignore the standards, and go for an easy life, we can continue

to create clickable links for such invalid URLs, and hope that browsers
generally treat them sanely. In that case, however, we need to fix the CSS to
use the same rule as the parser, so we don't get these anomalously un-labelled
links. Saying that this gives a clue that something needs fixing is, frankly,
rather silly; either the links are valid, and should be displayed properly, or
they are invalid, and should not be rendered at all.

It's not like it would even be a complex fix, especially if someone were playing
with that code anyway, for bug 431 (like I say, just make the protocols array
contain "http://", "mailto:", etc.).

So I will prefer that the parser stop rendering URL with only one / .
That will be even better to trigger the error ;o)

rowan.collins wrote:

(In reply to comment #3)

So I will prefer that the parser stop rendering URL with only one / .
That will be even better to trigger the error ;o)

I'm not sure if you have misunderstood me, or if I am now misunderstanding you,
but just to clarify: not rendering URLs with only one '/' would in fact be
correct behaviour (in accordance with the definitions of valid URLs in the
relevant RFCs). It would also make the error easier to spot than having the PHP
treat it as OK, but the CSS treat it as 'something unknown', which is perhaps
what you meant. As I say, either way, our code should treat such links
*consistently*.

avarab wrote:

I added a parsertest for this in REL1_4 and HEAD.

parser engine got a trouble cause http:foobar is atm considered
valid probably to handle the mailto:foo@b.ar case. Need to rewrite
some pieces so that the protocols are defined with the //

Eg:
define( 'URL_PROTOCOLS', 'http|https|ftp|irc|gopher|news|mailto' );
define( 'HTTP_PROTOCOLS', 'http|https' );

Should become something like:
define( 'URL_PROTOCOLS',
'http:\/\/|https:\/\/|ftp:\/\/|irc:\/\/|gopher:\/\/|news:\/\/|mailto:' );
define( 'HTTP_PROTOCOLS', 'http:\/\/|https:\/\/' );

rowan.collins wrote:

define( 'URL_PROTOCOLS',
'http:\/\/|https:\/\/|ftp:\/\/|irc:\/\/|gopher:\/\/|news:\/\/|mailto:' );

Or, as suggested in bug 431, make the whole thing configurable, as
$wgUrlProtocols or some such. BTW, do you really need to escape all those
slashes? What would be the side-effect of having the following, much more
pleasant definition?

$wgUrlProtocols='http://|https://|ftp://|irc://|gopher://|news:|mailto:';

[oh, and note that 'news:' doesn't have a '//']

rowan.collins wrote:

(In reply to comment #7)

BTW, do you really need to escape all those
slashes?

Sorry, I was being dim there - '/' would mark the end of the regex; however, PHP
uses the same regex system as Perl, so you can actually choose your own regex
delimiters. Thus my regex fragment would work, as long as any regex built from
used, say, "%regex%" instead of "/regex/".

Sorry, I was being dim there - '/' would mark the end of the regex;

That's why I keep escaping slashes. We could use another delimiter though.

http://twenkill.dyndns.org/wiki/787
http://test.leuksman.com/index.php/787

I fixed the bug. Malformed URLs are no more converted.

Fixed the fix that stopped autonumbering :o)

There is some broken links floating around which need to be fixed now.

epriestley added a commit: Unknown Object (Diffusion Commit).Mar 4 2015, 8:14 AM
epriestley added a commit: Unknown Object (Diffusion Commit).Mar 4 2015, 8:20 AM
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).