Page MenuHomePhabricator

Magic links are inconsistent with common parser rules
Closed, ResolvedPublic

Description

Author: theevilipaddress

Description:
The behavior of line breaks when it comes to magic links, like RFC or PMID, is different from the usual behavior of line breaks in wiki syntax. If there are more than two linebreaks between the lines, they will be put on different lines. However, if they are magic links, then they will still stay magic links, even with line breaks in between. That's quite unlikely the intention, and I would probably suggest to only allow regular spaces as separators of the "RFC"/"PMID" strings and the numbers. See http://test.wikipedia.org/wiki/Magic_links for some examples I've given.


Version: unspecified
Severity: minor

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 11:35 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz29025.
bzimport added a subscriber: Unknown Object (MLST).

Accept only spaces separating magic links to RFC/PMID/ISBN

You are right. The magic links reges use \s which is equivalent to [\t\n\f\r ].
Just using spaces would have been enough. The other characters were added in r15976, and I don't think they were intended to be supported.
The above patch limits magic links to the space character. I would like checking its actual usage before applying, since they had been accepted the last 5 years.

Attached:

Looks good to me.

Failing pages from enwiki-20110405

I have tested the 11120931 revisions of enwiki-20110405-pages-articles against the regex
'!(?: # Start cases

(?:RFC|PMID)[\t\n\f\r]+([0-9]+) |   # m[4]: RFC or PMID, capture number
ISBN[\t\n\f\r]+(\b                  # m[5]: ISBN, capture number
        (?: 97[89] [\ \-]? )?   # optional 13-digit ISBN prefix
        (?: [0-9]  [\ \-]? ){9} # 9 digits with opt. delimiters
        [0-9Xx]                 # check digit
        \b)

)!x'
748 pages would lose a magic linking.

If only 3645484 pages are articles, 748/3645484 = 0.2 per 1000.

I'd like to take a look into these articles, though.

Attached:

EN.WP.ST47 wrote:

Looks to me like the magic links on the test.wikipedia.org page are all now behaving correctly, and only those with spaces and not newlines are linked. Closing this fixed.

EN.WP.ST47 wrote:

Never mind, I completely misinterpreted that test page...

vlakoff wrote:

patch proposal for Bug 28950 and Bug 29025

First patch proposal for Bug 28950 and Bug 29025. Seems to be working great, nevertheless any suggestion would be very welcome.

The benefits of this patch are:

  • (Bug 28950) non-breaking spaces (both literal char and HTML entities) support
  • (Bug 29025) no surprising link creation if several \n's (like "ISBN\n\n1234567890")

The only limitation I am aware of is that \n isn't implemented (yet), so for example "ISBN\n1234567890" doesn't produce a link. But don't forget cases like "ISBN \n123...", "ISBN\n 123..." (<pre> insertion!), "ISBN\n&nbsp;123...", and so on.

\n support is feasible, I don't know if it would be that useful, however I'd like to be as close as possible to "normal" wikicode parsing.

Attached:

vlakoff wrote:

Please see Bug 28950 for an updated patch of mine (and future ones if any).

sumanah wrote:

Changed "reviewed" keyword to "need-review" to indicate that new patch awaits code review.

Change 133651 had a related patch set uploaded (by Cscott):
Don't allow embedded newlines in magic links, but do allow &nbsp;

https://gerrit.wikimedia.org/r/133651

Patch-For-Review

Change 133651 merged by jenkins-bot:
Don't allow embedded newlines in magic links, but do allow &nbsp;

https://gerrit.wikimedia.org/r/133651

Is this task resolved?

Izno closed this task as Resolved.Jan 12 2017, 9:45 PM
Izno added a subscriber: Izno.

Looks like it was resolved in 1.25, according to the linked Diffusion change. I'll mark it so.