bad displaying of a link followed by <nowiki> (seems related to linktrail)
Closed, Declined (Public)

Description

Author: pablo

Description:
On the wa: Wikipedia I changed the linktrail to also
include uppercase A-Z, so that the trail can contain uppercase letters too; it is now:

'linktrail' => '/^([a-zåâêîôûçéèA-ZÅÂÊÎÔÛÇÉÈ]+)(.*)$/sD',

Since then I have seen very strange behaviour.
Try the following wikicode on a page on the wa:
Wikipedia (a "preview" is enough):


*[[aa]]<nowiki>bb</nowiki>
*[[aa]]bb
*[[aa]]<nowiki>BB</nowiki>
*[[aa]]BB
*[[aa]]<nowiki>êê</nowiki>
*[[aa]]êê
*[[aa]]<nowiki>ÊÊ</nowiki>
*[[aa]]ÊÊ

As you can see, on the lines without the nowiki part
the trail is properly included in the clickable link,
for both lowercase and uppercase, and for accented letters as well.
However, the lines with nowiki render very strangely:
instead of "aabb" with "aa" clickable and "bb" not,
the output is
"aaNaodW29-nowiki49a5b674f36af500000001" with
"aaNaodW" clickable and the rest not.
Note that if I put a character that is not in the linktrail between the "]" and the "<",
it displays fine (but if the character is in the Latin-1 Supplement range it
displays wrongly: it actually displays as two U+FFFD, one attached to the clickable
string and one not), see:


*[[aa]]x<nowiki>bb</nowiki> (ASCII char, displays ok)
*[[aa]]ə<nowiki>bb</nowiki> (above U+00FF, displays ok)
*[[aa]]ö<nowiki>BB</nowiki> (between U+007F and U+00FF, displays wrong for that char)

The problem with letters in U+007F-U+00FF doesn't
depend on the nowiki, so maybe there are two problems, see:


*[[aa]]öbb

Now go to the fr: Wikipedia and test the same set of wikicode samples: the
rendering is ok, but a trail of uppercase unaccented letters is ignored (a trail
of uppercase accented letters is recognized). The linktrail on fr: is the
following:

'/^([a-zàâçéèêîôûäëïöüùÇÉÂÊÎÔÛÄËÏÖÜÀÈÙ]+)(.*)$/sD'

The main difference in the wa: linktrail version
is that I added the "A-Z", as it doesn't make sense
to have uppercase accented chars but not the unaccented letters (I also removed
several accented
letters unused in Walloon and added "åÅ"):

'/^([a-zåâêîôûçéèA-ZÅÂÊÎÔÛÇÉÈ]+)(.*)$/sD'

Could the "A-Z" be the reason for the rendering problem?
I don't understand how it could be.

There is clearly a parsing problem just after the closing "]" when a nowiki or a
char in the Latin-1 set
is involved.


Version: 1.6.x
Severity: normal

Details

Reference
bz4119

Event Timeline

bzimport raised the priority of this task from to Lowest. Nov 21 2014, 8:58 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz4119.
bzimport added a subscriber: Unknown Object (MLST).

[[Foobar]]<pre>toto</pre> triggers the error too.

The linktrail matches 'NaodW', which defeats our parsing system :p

One way is to change the NaodW thing, or to make it so the linktrail
regex doesn't match "NaodW29".
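
To illustrate the comment above, here is a minimal Python sketch of the match. It is not the MediaWiki parser code: the two patterns and the placeholder string are copied from this report, while `split_trail`, `GUARDED_LINKTRAIL` and the use of `\Z` to approximate PHP's /D modifier are assumptions made for the example. The guarded variant only shows what "make the regex not match NaodW29" could look like.

```python
import re

# Link-trail patterns copied from the report. PHP's /s maps to re.DOTALL;
# /D (dollar matches only at the very end) is approximated with \Z.
WA_LINKTRAIL = re.compile(r'^([a-zåâêîôûçéèA-ZÅÂÊÎÔÛÇÉÈ]+)(.*)\Z', re.DOTALL)
FR_LINKTRAIL = re.compile(r'^([a-zàâçéèêîôûäëïöüùÇÉÂÊÎÔÛÄËÏÖÜÀÈÙ]+)(.*)\Z', re.DOTALL)

# Hypothetical variant: a negative lookahead refuses to start a trail
# at the placeholder prefix.
GUARDED_LINKTRAIL = re.compile(
    r'^(?!NaodW29)([a-zåâêîôûçéèA-ZÅÂÊÎÔÛÇÉÈ]+)(.*)\Z', re.DOTALL)


def split_trail(pattern, text_after_link):
    """Return (trail, rest) for the text that follows a closing ]]."""
    m = pattern.match(text_after_link)
    if m is None:
        return '', text_after_link
    return m.group(1), m.group(2)


# What the parser sees after [[aa]] once <nowiki>bb</nowiki> has been
# replaced by its placeholder (string taken from the report).
MARKER = 'NaodW29-nowiki49a5b674f36af500000001'

print(split_trail(WA_LINKTRAIL, MARKER))       # the trail swallows 'NaodW'
print(split_trail(FR_LINKTRAIL, MARKER))       # no trail: 'N' is not in the class
print(split_trail(GUARDED_LINKTRAIL, MARKER))  # no trail: lookahead rejects it
print(split_trail(GUARDED_LINKTRAIL, 'BB and more'))  # ordinary trails still work
```

With the wa: pattern the first five characters of the placeholder end up inside the link, so the placeholder can no longer be found and substituted back, which matches the "aaNaodW" output described in the report.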

gangleri wrote:

Hello!

It seems that the (this?) error is (also?) triggered by the *first* character in the
linktrail:

[[Glavna stranica]]{{Predložak:diacritics/hr}}
at [[wikt:hr:user:Gangleri/tests/bugzilla/04119/001]]
*and*
[[Glavna stranica]]{{SUBST:Predložak:diacritics/hr}}
generating
[[Glavna stranica]]ČčĆćDždžĐđŠšŽž
at [[wikt:hr:user:Gangleri/tests/bugzilla/04119/001]]

generates
Glavna stranica��čĆćDždžĐđŠšŽž
"Glavna stranica�" is blue and "�čĆćDždžĐđŠšŽž" is black.

The link is OK: [[wikt:hr:Glavna stranica]].

best regards reinhardt [[user:gangleri]]

The Unicode problems are due to the regex being bogus (missing the /u option, so it's
byte-oriented), which has since been fixed.
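
The byte-oriented behaviour can be reproduced outside PHP. The sketch below is a Python approximation, not the parser code: it builds the wa: character class once over UTF-8 bytes (roughly what a pattern without /u sees) and once over characters (roughly what /u gives). The names `BYTE_TRAIL`, `CHAR_TRAIL` and `split_bytes` are made up for the example.

```python
import re

CLASS_CHARS = 'a-zåâêîôûçéèA-ZÅÂÊÎÔÛÇÉÈ'

# Byte-oriented: every accented letter contributes its individual UTF-8
# bytes to the class, so the lead byte 0xC3 becomes a "trail character".
BYTE_TRAIL = re.compile(
    b'^([' + CLASS_CHARS.encode('utf-8') + b']+)(.*)\\Z', re.DOTALL)

# Character-oriented: the class contains whole code points.
CHAR_TRAIL = re.compile('^([' + CLASS_CHARS + r']+)(.*)\Z', re.DOTALL)


def split_bytes(text):
    """Split the UTF-8 encoding of text into (trail, rest) byte strings."""
    data = text.encode('utf-8')
    m = BYTE_TRAIL.match(data)
    if m is None:
        return b'', data
    return m.group(1), m.group(2)


# 'ö' is 0xC3 0xB6: the first byte is in the class, the second is not,
# so the character is cut in half.
trail, rest = split_bytes('öbb')
print(trail, rest)
print(trail.decode('utf-8', errors='replace'),
      rest.decode('utf-8', errors='replace'))

# 'ə' is 0xC9 0x99: neither byte is in the class, so nothing is split.
print(split_bytes('əbb'))

# With character semantics 'ö' is simply not part of the trail.
print(CHAR_TRAIL.match('öbb'))
```

Each half of the split character is invalid UTF-8 on its own, which is consistent with the two U+FFFD described in the report, one inside the link and one outside.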

As for the nowiki bit: for now I'm just removing the capitals, since they're not
allowed in the linktrail anywhere else.

I'm resolving this as LATER, in case someone wants to try harder to "fix" this
properly, but it won't be an issue as long as capitals stay out of the linktrail as
intended.