Reference inside file caption isn't detected
Closed, Resolved; Public

Description

When a file contains a <ref> inside its caption, the <ref> isn't counted or detected, and it doesn't appear in <references>. This can be seen at the start of the section "Beginning of the stampede" in http://parsoid-lb.eqiad.wikimedia.org/enwiki/Klondike_Gold_Rush?oldid=651957368.

It can also be tested on the command line:

echo "[[File:foo.jpg|A <ref>ref one</ref>]]" | node parse
<p data-parsoid='{"dsr":[0,37,0,0]}'><span class="mw-default-size" typeof="mw:Image" data-parsoid='{"optList":[{"ck":"caption","ak":"A &lt;ref>ref one&lt;/ref>"}],"dsr":[0,37,null,null]}' data-mw='{"caption":"A &lt;ref>ref one&lt;/ref>"}'><a href="./File:Foo.jpg" data-parsoid='{"a":{"href":"./File:Foo.jpg"},"sa":{}}'><img resource="./File:Foo.jpg" src="//upload.wikimedia.org/wikipedia/commons/0/06/Foo.jpg" height="197" width="300" data-parsoid='{"a":{"resource":"./File:Foo.jpg","height":"197","width":"300"},"sa":{"resource":"File:foo.jpg"}}'/></a></span></p>

Event Timeline

marcoil raised the priority of this task from to Needs Triage.
marcoil updated the task description.
marcoil added a project: Parsoid.
marcoil added subscribers: marcoil, Kelson.
ssastry triaged this task as Medium priority. (Mar 24 2015, 4:37 PM)
ssastry set Security to None.
ssastry moved this task from Needs Triage to In Progress on the Parsoid board.

This seems specific to captions of inline images.

[subbu@earth lib] echo '[[File:foo.jpg|thumb|A <ref>ref one</ref>]]' | node parse --normalize
<figure><a href="File:Foo.jpg"><img src="//upload.wikimedia.org/wikipedia/commons/0/06/Foo.jpg" data-file-width="300" data-file-height="197" data-file-type="bitmap" height="144" width="220"/></a><figcaption>A <span id="cite_ref-1"><a href="#cite_note-1">[1]</a></span></figcaption></figure>
<ol>
<li id="cite_note-1"><span><a href="#cite_ref-1">↑</a></span> <span>ref one</span></li>
</ol>

[subbu@earth lib] echo '[[File:foo.jpg|A <ref>ref one</ref>]]' | node parse --normalize

<p><span><a href="File:Foo.jpg"><img src="//upload.wikimedia.org/wikipedia/commons/0/06/Foo.jpg" data-file-width="300" data-file-height="197" data-file-type="bitmap" height="197" width="300"/></a></span></p>
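
The difference between the two outputs above can be checked mechanically. The following is only a minimal sketch in plain Node.js. It assumes that a processed reference always shows up as an element with a `cite_ref-` id plus a matching `cite_note-` list item, as in the thumb output. `countRefs` is a hypothetical helper and not part of Parsoid, and the sample strings are abbreviated copies of the outputs above.

```javascript
// Hypothetical helper, not part of Parsoid: counts processed <ref>s in
// normalized output by looking for the ids seen in the thumb example.
function countRefs(html) {
  const markers = html.match(/id="cite_ref-[^"]*"/g) || [];
  const notes = html.match(/id="cite_note-[^"]*"/g) || [];
  return { markers: markers.length, notes: notes.length };
}

// Abbreviated copies of the two outputs above (image attributes trimmed).
const thumbOutput =
  '<figure><figcaption>A <span id="cite_ref-1"><a href="#cite_note-1">[1]</a></span></figcaption></figure>' +
  '<ol><li id="cite_note-1"><span><a href="#cite_ref-1">↑</a></span> <span>ref one</span></li></ol>';
const inlineOutput =
  '<p><span><a href="File:Foo.jpg"><img src="//upload.wikimedia.org/wikipedia/commons/0/06/Foo.jpg"/></a></span></p>';

console.log(countRefs(thumbOutput));  // one marker and one note
console.log(countRefs(inlineOutput)); // no marker and no note
```

A string match like this is only good enough for a quick regression check; a real test would compare parsed DOM trees.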

Not entirely fixed with the patch for T50958 -- the reference is now parsed, but the ref marker gets buried inside the data-mw property:

echo '[[File:foo.jpg|A <ref>ref one</ref>]]' | tests/parse.js --normalize=parsoid
<p><span class="mw-default-size" typeof="mw:Image" data-mw='{"caption":"A &lt;meta typeof=\"mw:Extension/ref/Marker\" about=\"#mwt2\" data-parsoid=\"{&amp;quot;group&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;name&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;content&amp;quot;:&amp;quot;ref one&amp;quot;,&amp;quot;hasRefInRef&amp;quot;:false}\">"}'>
 <a href="File:Foo.jpg">
  <img resource="./File:Foo.jpg" src="//upload.wikimedia.org/wikipedia/commons/0/06/Foo.jpg" data-file-width="300" data-file-height="197" data-file-type="bitmap" height="197" width="300"/>
 </a>
</span>
</p>

The Cite extension needs to be taught to look inside hidden captions, maybe.
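
As a rough illustration of what "looking inside hidden captions" could mean, here is a minimal sketch in plain Node.js. It assumes the caption is stored as an HTML string under the `caption` key of the image's data-mw JSON, as in the output above. `findHiddenRefMarkers` is a hypothetical name, and this is not how the Cite extension or Parsoid actually implements it.

```javascript
// Hypothetical sketch, not Parsoid code: pull ref markers out of a caption
// stored as an HTML string inside an inline image's data-mw JSON.
function findHiddenRefMarkers(dataMwJson) {
  const dataMw = JSON.parse(dataMwJson);
  const caption = typeof dataMw.caption === 'string' ? dataMw.caption : '';
  // Match the <meta> ref markers that the parser leaves in the caption.
  const markerRe = /<meta\b[^>]*typeof="mw:Extension\/ref\/Marker"[^>]*>/g;
  const markers = caption.match(markerRe) || [];
  // Return each marker's "about" id, or null when the attribute is missing.
  return markers.map((tag) => {
    const about = /about="([^"]*)"/.exec(tag);
    return about ? about[1] : null;
  });
}

// data-mw value modeled on the output above, after attribute entity decoding;
// the marker's data-parsoid attribute is omitted for brevity.
const dataMw = JSON.stringify({
  caption: 'A <meta typeof="mw:Extension/ref/Marker" about="#mwt2">',
});

console.log(findHiddenRefMarkers(dataMw)); // [ '#mwt2' ]
```

A regular expression is used here only to keep the sketch dependency-free; a DOM pass would normally parse the caption into a fragment and walk it instead.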

These are actually 2 separate bugs. The failure on

echo "[[File:Klondikemanual97.jpg|thumb|upright|alt=Cover of a gold seekers manual, published 1897|Prospector's manual, 1897{{sfn|Bramble|1897|p=cover}}]]" | node parse

does not have the same cause as the failure on

echo '[[File:foo.jpg|A <ref>ref one</ref>]]' | node parse

Separate patches coming.

Change 231740 had a related patch set uploaded (by Subramanya Sastry):
T93580: Fix buggy regexp in strip meta tags DOM pass

https://gerrit.wikimedia.org/r/231740

Change 231740 merged by jenkins-bot:
T93580: Fix buggy regexp in strip meta tags DOM pass

https://gerrit.wikimedia.org/r/231740

The bug seems to still impact the article (although new versions of Parsoid were deployed meanwhile): http://parsoid-lb.eqiad.wikimedia.org/enwiki/Klondike_Gold_Rush?oldid=651957368

The <ref>-inside-image piece seems to be fixed. However, there seem to be other, unrelated issues with how the <ref>s are processed: right now, I see an inline <references> section being rendered for images, which is broken. I will open a separate bug report for it.

This is basically T110909 and will be fixed when T110910 is resolved.

The fix above is still incomplete. See below. We still have a crasher.

[subbu@earth api] echo '[[File:foo.jpg|thumb|A <ref>{{echo|a}}</ref>]]' | node parse --wt2wt
[[File:foo.jpg|thumb|A <ref>{{echo|a}}</ref>]]

<references />
$ echo '[[File:foo.jpg|300px|A <ref>{{echo|a}}</ref>]]' | node parse --wt2wt
[error][enwiki/Main Page] no data-mw name for extension in:  <meta typeof="mw:Extension/ref/Marker" ...
...
Stack:
  Object.handle (/home/subbu/work/wmf/parsoid/lib/mediawiki.WikitextSerializer.js:713:17)
  WikitextSerializer.WSP._serializeNode (/home/subbu/work/wmf/parsoid/lib/mediawiki.WikitextSerializer.js:1097:30)
...

Ah, this is the inline image scenario as in T93580#1542130, where the caption is hidden in the data-mw attribute and doesn't get processed. Block image scenarios work properly due to the earlier fixes.

Change 235772 had a related patch set uploaded (by Subramanya Sastry):
T93580: Handle <refs> in inline image captions

https://gerrit.wikimedia.org/r/235772

Change 235772 merged by jenkins-bot:
T93580: Handle <ref>s in inline image captions

https://gerrit.wikimedia.org/r/235772

Besides the <gallery> extension issue I mentioned in T93580#1589747, the rest of the issues are now resolved. New code will be deployed Mon / Wednesday.

Even after the deployment of these last patches, we still have a "cite error" on the page:
http://rest.wikimedia.org/en.wikipedia.org/v1/page/html/Klondike_Gold_Rush

The bug does not seem to be 100% fixed.

Please see T93580#1589747 above. This is no longer about references inside a file caption, but about references inside extensions like gallery that process wikitext, which is T110910.