Media Viewer joins words in the caption in case of wordwrap markup <br />
Closed, Invalid · Public

Authored By

	Miss-Sophie
	Oct 27 2016, 8:11 PM

Description

Media Viewer ignores markup in the caption of an image in a confusing way: two separate words are joined into one nonsensical word.

Example:
https://de.wikipedia.org/wiki/Kaimane#/media/File:U49.jpg

Junger [[Krokodilkaiman]]<br />gesichtet in [[Tortuguero]] (Costa Rica)

becomes:

Junger Krokodilkaimangesichtet in Tortuguero (Costa Rica)

with "Krokodilkaimangesichtet" being a nonsensical word.

I suggest it should say:

Junger Krokodilkaiman — gesichtet in Tortuguero (Costa Rica)

with a dash and two spaces as the output wherever there is a <br /> in the source code for the caption of an image.

Event Timeline

Miss-Sophie created this task. · Oct 27 2016, 8:11 PM

Restricted Application added a subscriber: Aklapper. · Oct 27 2016, 8:11 PM

Tgr added a project: good first task. · Oct 27 2016, 8:18 PM

Restricted Application added a subscriber: TerraCodes. · Oct 27 2016, 8:18 PM

ovasileva triaged this task as Medium priority. · Nov 1 2016, 2:40 PM

ovasileva added a project: Web-Team-Backlog.

ovasileva moved this task from Incoming to 2014-15 Q4 on the Web-Team-Backlog board. · Nov 2 2016, 7:47 PM

I'm interested in pursuing this task. However, I was inspecting the link given and the code, and I can't figure out where to start. Can anyone give me a general direction of where to start?
Thanks,
MtDu

I just found an example where a dash would not look good because the part after the wordwrap starts with a left parenthesis. See:

https://de.wikipedia.org/wiki/Blaubart#/media/File:Barbebleue4.jpg

Blaubarts Tod<small><br />(''Les Contes de Perrault, dessins par Gustave Doré'')</small>

It looks fine in Media Viewer in this case. Media Viewer apparently adds a space after the wordwrap before the parenthesis, so there are no nonsensical joined characters here (there is no space in the wikicode, but I see one in Media Viewer). An added dash would look strange and unnecessary.

So it seems important to distinguish the cases where the <br /> occurs. Something like: if a left parenthesis follows, don't add a dash and two spaces, just output a single space.

I don't know whether there are other cases where the dash wouldn't fit.
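For illustration, the rule proposed above could be sketched roughly as follows. This is not MediaViewer code: joinAtLineBreaks is a hypothetical helper, and it assumes the caption has already had its other markup (links, <small>, italics) removed so that only the <br /> tags remain.

```javascript
// Hypothetical helper, not part of MediaViewer: joins the parts of a caption
// that were separated by <br> tags, following the rule proposed in this task.
const BR_TAG = /<br\s*\/?>/i;
const DASH = '\u2014'; // the dash character from the suggested output

function joinAtLineBreaks(caption) {
    // Split at every <br>, <br/> or <br /> and drop empty pieces.
    const parts = caption
        .split(BR_TAG)
        .map((part) => part.trim())
        .filter((part) => part !== '');

    if (parts.length === 0) {
        return '';
    }

    return parts.reduce((joined, part) => {
        // Before a left parenthesis a plain space is enough;
        // otherwise use a dash with a space on each side.
        const separator = part.startsWith('(') ? ' ' : ' ' + DASH + ' ';
        return joined + separator + part;
    });
}

console.log(joinAtLineBreaks('Junger Krokodilkaiman<br />gesichtet in Tortuguero (Costa Rica)'));
console.log(joinAtLineBreaks('Blaubarts Tod<br />(Les Contes de Perrault, dessins par Gustave Doré)'));
```

The sketch only covers the parenthesis case mentioned here; any other exceptions would need their own checks.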

A space is a safe option; a dash is not (it is also highly language-specific).

MediaViewer handles this in appendWhitespaceToBlockElements in mmv.HtmlUtils.js, which is presumably not called for the title line.
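To make the idea concrete, here is a minimal string-based sketch of adding whitespace at line-breaking elements before the markup is stripped. It is not the actual appendWhitespaceToBlockElements implementation; the function names, the regex approach and the list of tags are all assumptions made for illustration.

```javascript
// Illustrative only: a string-based stand-in for the idea of adding whitespace
// at line-breaking elements before markup is stripped. The tag list is an
// assumption, not taken from MediaViewer.
const LINE_BREAKING_TAGS = /<br\s*\/?>|<\/(?:p|div|li|h[1-6])\s*>/gi;

// Naive version: removing the tags alone joins the neighbouring words.
function stripTags(html) {
    return html.replace(/<[^>]*>/g, '');
}

// Safer version: put a space after each line-breaking tag first.
function htmlToPlainText(html) {
    const spaced = html.replace(LINE_BREAKING_TAGS, '$& '); // keep the tag, append a space
    return stripTags(spaced)
        .replace(/\s+/g, ' ') // collapse runs of whitespace
        .trim();
}

const caption = 'Junger <a href="/wiki/Krokodilkaiman">Krokodilkaiman</a><br />' +
    'gesichtet in <a href="/wiki/Tortuguero">Tortuguero</a> (Costa Rica)';

console.log(stripTags(caption));       // words joined at the <br />
console.log(htmlToPlainText(caption)); // words separated by a space
```

A plain space avoids the language-specific punctuation question entirely, which is why it is the safer default.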

@Tgr
I have spent some time in the console and grepping to see which function calls which, and I have traced it back to this function, processThumbs. The alt and caption are already wrong there, but I can't trace them back any further. Could you let me know what the next step is, or whether I'm going in the wrong direction? https://dpaste.de/t5Jx#L8,9,10,11,12,13,14
Thanks,
MtDu

In T149361#2773748, @MtDu wrote:

I have traced it back to this function, processThumbs. The alt and caption are already the wrong ones there, but it seems I can't trace it back to anything else.

You are looking at the right function, but the highlighted lines just look for the <img> tags. The associated captions are extracted afterwards.

@Tgr,
Ok. When I looked for the alt tags, I think they're extracted from here: https://dpaste.de/bKto#L14 However, when they are extracted, the alt already contains the wrong text when I inspect the element. When I look at the innerText in the $content, I see a return arrow between the desired [[Krokodilkaiman]] and gesichtet, which may symbolize a <br /> tag, but I don't know how that is useful. Let me know whether I have the right function and how I should proceed.
Thanks,
MtDu

Oh, sorry, did not realize the text comes from an alt tag. Nothing to fix here then, it's a wikitext error (or arguably an error in the wikitext parser). You can verify in the HTML source of the page that the words are already joined there.
