Page MenuHomePhabricator

Reference with numbered list triggers Tidy to corrupt reference list
Closed, ResolvedPublic

Description

When a reference has a numbered list inside, it depends on
wether or not it is terminated with a newline, wether or
not the references follwing it are correctly displayed.
See page:
http://meta.wikimedia.org/w/index.php?title=User:Purodha/bug_20050
for a sample. The last two references having numbered indents,
and show the difference.


Version: unspecified
Severity: enhancement
URL: http://meta.wikimedia.org/w/index.php?title=User:Purodha/bug_20050

Details

Reference
bz20050

Event Timeline

bzimport raised the priority of this task from to Low.Nov 21 2014, 10:49 PM
bzimport added a project: Cite.
bzimport set Reference to bz20050.
bzimport added a subscriber: Unknown Object (MLST).

Sorry, unclear wording. Correct is:

References C and D are rendered incorrectly.
Reference B appears to cause this.
Reference A has an extra newline at its end, which Reference B hasn't.

Reference A does not influence how references following it are rendered.

Evereything else is demonstrating that, only rumbered lists are
influential.

http://en.wikipedia.org/wiki/Help:Cite_errors/Testcases7

Nesting an ordered list inside another ordered list. Not terminating with a newline causes the rest of the list to be wrapped in an unordered list. I would not be surprised if this was an interaction with HTML Tidy.

You are right. The <ul> is caused by Tidy. MediaWiki outputs </li></ol> instead of </ol></li>, and Tidy tries to fix it by opening a <ul>.

happy.melon.wiki wrote:

Confirmed as a(nother) bad interaction with Tidy; unTidy'd output is correct.

Guess we can close this and track it at Bug 2542 as a Tidy issue.

Confirmed as a(nother) bad interaction with Tidy; unTidy'd output is correct.

It is not correct, or Tidy wouldn't need to fix us.

This has little to do with Cite. It's mostly the parser being simplistic combined with Tidy making a bad "fix".

As far as Cite is concerned, the ref with the embedded list gets expanded to wikitext something like this:

<LI id="cite_note-0">[[#cite_ref-0|↑]] This is a reference with a numbered list
# line 1
# line 2
# line 3— no newline</LI>

I've capitalized the <li></li> tags to make the next step clearer.

That gets handed to the parser, which doesn't pay any attention to the HTML. doBlockLevels just blindly converts the wiki-list by inserting "<ol><li>" in place of the "#" for the first element, "</li><li>" in place of the "#" for all subsequent elements, and "</li></ol>" after the last list element. The </LI> from Cite winds up inside the last list element:

<LI id="cite_note-0"><a href="#cite_ref-0">↑</a> This is a reference with a numbered list
<ol><li> line 1
</li><li> line 2
</li><li> line 3— no newline</LI>
</li></ol>

Tidy apparently decides that the last </li> is closing the <LI> and inserts a "missing" </ol>, and then the real </ol> is interpreted as closing the <reference>'s list. And then the <li> for the next reference gets a "missing" <ul> inserted before it. A browser given this tag soup might do the same, or it might decide that last </li> is extraneous and then insert the missing </li> when it sees the "<li>" for the next ref.

The reason it works with the newline is because the </LI> is then after the list instead of inside the last element, so when the parser inserts its "</li></ol>" it winds up before the </LI> and everything is correct.

We ''could'' work around this in Cite by changing [[MediaWiki:cite_references_link_one]] and [[MediaWiki:cite_references_link_many]] to have a newline before the </li>, or we could insert a newline at the end of the string passed as $3 to these messages.

Yes, but better is to fix this in cite.php. If a linebreak is all that is needed, I'll see if I can come up with a patch.

PHP is a little out of my league, but somewhere in function referencesFormatEntry in Cite_body.php, a \n needs to be inserted.

Created attachment 9420
Force "\n" at the end of each reference's text

(In reply to comment #11)

PHP is a little out of my league, but somewhere in function
referencesFormatEntry in Cite_body.php, a \n needs to be inserted.

Several places, actually. The attached patch is slightly more complex, in that it forces exactly one newline at the end of the reference body whether that ref is

<ref>foo</ref>

or something ridiculous like

<ref>foo



</ref>

Attached:

sumanah wrote:

Added "patch" and "need-review" keywords to signal that this patch awaits code review. Thanks for the patch, Brad.

Thanks for the patch. I tested it with the reporter's test case mentioned in comment 0 and it worked fine. Applied in r102970.

Confirmed: fixed; deployed in 1.20