
Media Viewer joins words in the caption in case of wordwrap markup <br />
Closed, Invalid (Public)

Description

Media Viewer drops <br /> markup in an image caption without inserting any whitespace, which is confusing: the two words on either side of the line break are joined into one nonsensical word.

Example:
https://de.wikipedia.org/wiki/Kaimane#/media/File:U49.jpg

Junger [[Krokodilkaiman]]<br />gesichtet in [[Tortuguero]] (Costa Rica)

becomes:

Junger Krokodilkaimangesichtet in Tortuguero (Costa Rica)

with "Krokodilkaimangesichtet" being a nonsensical word.

I suggest it should say:

Junger Krokodilkaiman — gesichtet in Tortuguero (Costa Rica)

with a dash surrounded by two spaces in the output wherever the wikitext of an image caption contains a <br />.

Event Timeline

ovasileva triaged this task as Medium priority. Nov 1 2016, 2:40 PM
ovasileva added a project: Web-Team-Backlog.

I'm interested in pursuing this task. However, I was inspecting the link given and the code, and I can't figure out where to start. Can anyone give me a general direction of where to start?
Thanks,
MtDu

I just found an example where a dash would not look good, because the part after the line break starts with a left parenthesis:

https://de.wikipedia.org/wiki/Blaubart#/media/File:Barbebleue4.jpg

Blaubarts Tod<small><br />(''Les Contes de Perrault, dessins par Gustave Doré'')</small>

It looks fine in Media Viewer in this case. Media Viewer apparently adds a space after the line break and before the parenthesis, so no characters are joined here (there is no space in the wikitext, but I see one in Media Viewer). An added dash would look strange and unnecessary.

So it seems important to distinguish the contexts in which the <br /> occurs. Something like: if a left parenthesis follows, don't add a dash and two spaces, just output a single space.
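The rule proposed above could be sketched roughly as follows. This is only an illustration of the suggestion, not MediaViewer code: the function name replaceLineBreaks and its separator parameter are hypothetical, and it works on a plain string rather than on the parsed caption.

```javascript
// Hypothetical helper illustrating the proposed rule; not part of MediaViewer.
// The default separator is a dash (U+2014) with a space on each side.
function replaceLineBreaks(caption, separator = ' \u2014 ') {
  // Match <br>, <br/> or <br />, plus any whitespace around it.
  const brPattern = /\s*<br\s*\/?>\s*/gi;
  return caption.replace(brPattern, (match, offset, whole) => {
    // Look at the first character after the line break.
    const next = whole.charAt(offset + match.length);
    if (next === '') {
      return ''; // trailing line break: nothing to separate
    }
    // Before a left parenthesis a plain space reads better than a dash.
    return next === '(' ? ' ' : separator;
  });
}

console.log(replaceLineBreaks('Junger Krokodilkaiman<br />gesichtet in Tortuguero (Costa Rica)'));
console.log(replaceLineBreaks('Blaubarts Tod<br />(Les Contes de Perrault)'));
```

The first call inserts the dash between the two words; the second yields only a single space before the parenthesis. Other contexts where a dash does not fit would each need their own special case.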

I don't know whether there are other cases where the dash wouldn't fit.

A space is a safe option; a dash is not (it is also highly language-specific).

MediaViewer handles this in appendWhitespaceToBlockElements in mmv.HtmlUtils.js, which is presumably not called for the title line.
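To illustrate the "space is safe" approach, here is a minimal sketch that turns line breaks and block-level tags into whitespace before stripping the remaining markup. It is not the actual appendWhitespaceToBlockElements implementation: the function name htmlToPlainText and the tag list are assumptions, and regex-based tag stripping is a simplification that only suits simple caption HTML.

```javascript
// Hypothetical sketch; the tag list is an assumption, not MediaViewer's own.
const BLOCK_TAGS = ['br', 'p', 'div', 'li', 'tr', 'td', 'th', 'blockquote', 'h[1-6]'];

function htmlToPlainText(html) {
  // Opening or closing tags of line-breaking elements become a space.
  const blockPattern = new RegExp('<\\/?(?:' + BLOCK_TAGS.join('|') + ')\\b[^>]*>', 'gi');
  return html
    .replace(blockPattern, ' ') // keep words on both sides of a break apart
    .replace(/<[^>]*>/g, '')    // drop the remaining (inline) tags
    .replace(/\s+/g, ' ')       // collapse runs of whitespace
    .trim();
}

console.log(htmlToPlainText(
  'Junger <a href="/wiki/Krokodilkaiman">Krokodilkaiman</a><br />gesichtet in Tortuguero (Costa Rica)'
));
```

With this approach the caption from the description comes out as "Junger Krokodilkaiman gesichtet in Tortuguero (Costa Rica)", and no language-specific punctuation is needed.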

@Tgr
I have spent some time in the console and grepping to see which function calls which, and I have traced it back to this function, processThumbs. The alt and caption are already wrong there, but I can't trace them back any further. Could you let me know what the next step is, or whether I'm going in the wrong direction? https://dpaste.de/t5Jx#L8,9,10,11,12,13,14
Thanks,
MtDu

I have traced it back to this function, processThumbs. The alt and caption are already the wrong ones there, but it seems I can't trace it back to anything else.

You are looking at the right function, but the highlighted lines just look for the <img> tags. The associated captions are extracted afterwards.

@Tgr,
Ok. When I looked for the alt tags, I think they're extracted from here: https://dpaste.de/bKto#L14 However, when I inspect the element, the alt already contains the wrong text at the point where it is extracted. When I look at the innerText in the $content, I see a return arrow between the two parts of the desired [[Krokodilkaiman]]<br />gesichtet, which may symbolize a <br /> tag, but I don't know whether that is useful. Let me know if I have the right function and how I should proceed.
Thanks,
MtDu

Tgr assigned this task to MtDu.

Oh, sorry, did not realize the text comes from an alt tag. Nothing to fix here then, it's a wikitext error (or arguably an error in the wikitext parser). You can verify in the HTML source of the page that the words are already joined there.