
Strip markers exposed in anchor links with ==[[Link|<nowiki>text</nowiki>]]==
Closed, ResolvedPublic

Description

Author: smile

Description:

  1. Page sections whose titles contain a UNIQ key can no longer be referenced directly by permanent/external links.
  2. The API's "prop=sections" output fails for section titles containing a UNIQ key, for example:

<s toclevel="1" level="2" line="UNIQ28cf615b3f10792f-nowiki-0000000B-QINU" number="39" />

where the section title text behind the UNIQ key is lost:

http://de.wikipedia.org/w/api.php?action=parse&page=Wikipedia:Fragen_zur_Wikipedia/Archiv/2009/Woche_10&prop=sections
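
A minimal sketch of how leaked markers could be spotted in such output. The marker pattern is inferred only from the example above, the `<sections>` wrapper element is an assumption about the surrounding XML, and `leaked_sections` is a hypothetical helper, not part of any MediaWiki client library.

```python
import re
import xml.etree.ElementTree as ET

# Marker shape inferred from the example in this report:
# "UNIQ" + hex digits + "-" + tag name + "-" + 8 hex digits + "-QINU".
STRIP_MARKER = re.compile(r"UNIQ[0-9a-f]+-[A-Za-z]+-[0-9A-Fa-f]{8}-QINU")


def leaked_sections(xml_text):
    """Return the 'number' of every <s> element whose title contains a strip marker."""
    root = ET.fromstring(xml_text)
    return [
        s.get("number")
        for s in root.iter("s")
        if STRIP_MARKER.search(s.get("line", ""))
    ]


# Sample input: the first <s> line is copied from this report; the second
# line and the <sections> wrapper are made up for illustration.
sample = """<sections>
<s toclevel="1" level="2" line="UNIQ28cf615b3f10792f-nowiki-0000000B-QINU" number="39" />
<s toclevel="1" level="2" line="Ordinary heading" number="40" />
</sections>"""

print(leaked_sections(sample))
```

With the sample above, only section 39 is reported.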


Version: 1.15.x
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=25417

Details

Reference
bz18295

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:32 PM
bzimport set Reference to bz18295.
bzimport added a subscriber: Unknown Object (MLST).

herd wrote:

UNIQ strings are not a feature; they are a consequence of malformed parsing (basically a big flashing sign that says "BAD WIKICODE HERE"). In this case it seems no valid anchor string can be formed from [[Link|<nowiki>text</nowiki>]]. Changing the topic accordingly. This might be a dupe or WONTFIX, though.

smile wrote:

(In reply to comment #1) OK, thanks. What puzzles me is, why do UNIQ strings have to change at all and/or so frequently? And even when a wiki-page doesn't change? That's a real drag in archived talk-pages, where "nowiki" is used quite frequently in section-titles.

Proposed fix: Swap the order of tag and link unstripping in Parser::formatHeadings()

Easy fix: swap the order of link unstripping and extension tag unstripping in Parser::formatHeadings().
IMHO it is far more likely that a link contains an extension tag (as above); offhand, I can't think of a case where an extension tag would contain a half-parsed link at that stage of parsing.
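
Why the order matters can be shown with a toy model. This is not MediaWiki code: the placeholder format `<!--LINK 0-->`, the marker text, and both helper functions are hypothetical stand-ins for the parser's link holders and strip state.

```python
def unstrip_tags(text, strip_state):
    """Replace every strip marker in text with the content stored for it."""
    for marker, content in strip_state.items():
        text = text.replace(marker, content)
    return text


def replace_links(text, link_holders):
    """Replace every link placeholder in text with the link's display text."""
    for holder, label in link_holders.items():
        text = text.replace(holder, label)
    return text


# Hypothetical parser state for a heading ==[[Link|<nowiki>text</nowiki>]]==:
# the nowiki content sits behind a strip marker, and the link's label
# (which still contains that marker) sits behind a link placeholder.
marker = "\x7fUNIQ-nowiki-00000000-QINU\x7f"
strip_state = {marker: "text"}
link_holders = {"<!--LINK 0-->": marker}
heading = "<!--LINK 0-->"

# Tags first: the marker is still hidden inside the link placeholder, so
# nothing is unstripped, and the marker leaks once the link is expanded.
tags_first = replace_links(unstrip_tags(heading, strip_state), link_holders)

# Links first: expanding the link exposes the marker, which is then unstripped.
links_first = unstrip_tags(replace_links(heading, link_holders), strip_state)

print(repr(tags_first))
print(repr(links_first))
```

In this model only the links-first order yields the plain heading text; the tags-first order leaves the raw marker in the result.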

(In reply to comment #2)

> (In reply to comment #1) OK, thanks. What puzzles me is, why do UNIQ strings
> have to change at all and/or so frequently? And even when a wiki-page doesn't
> change? That's a real drag in archived talk-pages, where "nowiki" is used quite
> frequently in section-titles.

The whole point of UNIQ keys is that they are not predictable, so you can't break parsing by inserting some of them into a page.
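
A rough sketch of the idea, assuming a fresh random prefix is drawn for every parse. The marker layout is copied from the example in the bug description; how MediaWiki actually generates the prefix is not stated in this thread, so `new_marker_prefix` and `make_marker` are purely illustrative.

```python
import secrets


def new_marker_prefix():
    """Draw a fresh, unguessable prefix: 'UNIQ' plus 16 random hex digits."""
    return "UNIQ" + secrets.token_hex(8)


def make_marker(prefix, tag, index):
    """Build a marker shaped like the one in this report."""
    return f"{prefix}-{tag}-{index:08X}-QINU"


# Two parses of the same unchanged page draw different prefixes, so the
# marker for the same <nowiki> tag differs between them.
first_parse = make_marker(new_marker_prefix(), "nowiki", 11)
second_parse = make_marker(new_marker_prefix(), "nowiki", 11)
print(first_parse)
print(second_parse)
```

Because the prefix cannot be guessed in advance, page text cannot contain a string that the parser would mistake for one of its own markers. The flip side is that any marker that leaks into an anchor changes on every parse, which is the behaviour comment #2 complains about.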

Related: bug 25417 (<ref> also exposes strip markers inside a wikilink)

(In reply to comment #3)

> Created attachment 5983
> Proposed fix: Swap the order of tag and link unstripping in
> Parser::formatHeadings()

Paul, could you apply the patch now that you have commit privs?

Looks like this is already fixed, from my testing on Wikipedia.

(In reply to comment #6)

> Looks like this is already fixed, from my testing on Wikipedia.

It's not. Try putting

[[Link|<nowiki />]]

in a page and look for <span class="mw-headline" id=".7FUNIQ...QINU.7F">.

However, I don't think this can be fixed at the moment: as Brion points out in bug 93 comment 30, the parser is currently frozen and being rewritten.
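
The ".7F" at both ends of that id is consistent with the marker being wrapped in DEL characters (0x7F) and the heading text being percent-encoded with "%" replaced by ".". The sketch below assumes that encoding scheme; `legacy_anchor` is a hypothetical helper, not MediaWiki's own function.

```python
from urllib.parse import quote


def legacy_anchor(heading_text):
    """Percent-encode heading text, then swap '%' for '.' (assumed id scheme)."""
    encoded = quote(heading_text.replace(" ", "_"), safe=":")
    return encoded.replace("%", ".")


# A leaked marker wrapped in DEL characters, using the example from this report.
leaked = "\x7fUNIQ28cf615b3f10792f-nowiki-0000000B-QINU\x7f"
print(legacy_anchor(leaked))
```

Under this assumption the leaked marker yields an id that starts with ".7FUNIQ" and ends with "QINU.7F", matching the pattern described above.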

(In reply to comment #7)

> However, I don't think this can be fixed at the moment: as Brion points out in
> bug 93 comment 30, the parser is currently frozen and being rewritten.

http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/54555

Looks like this is fixed. Tested with 1.21wmf8.