TimedText markup for bold, italic etc not parsed and displayed as plain text
Closed, Resolved (Public)

Description

In the past, subtitles could have markup applied to them for boldface, italics, size changes and the like, but this no longer works and the markup is displayed as plain text. I first noticed this a couple of weeks ago. This is a problem because many TimedText pages use this markup, which worked when it was added but no longer does.

Steps to reproduce:

1.) Go to this link: https://commons.wikimedia.org/w/index.php?title=File%3APatrioticheskaya_Pesnya_(May_2000).oga
2.) Play the OGG file on the page using the media player.
3.) Note the timed text at around the 0:50 second mark not displaying markup correctly. It should be displaying bold text, italic text, large text, etc.

This bug occurs whether logged in or not, on Microsoft Edge and Google Chrome (in Incognito mode as well). The markup fails to parse across the various Wikimedia projects, from Wikipedia to Wikimedia Commons. It also happens regardless of where the TimedText is stored, whether locally on, say, Wikipedia or on Wikimedia Commons.

This may be related to: T224367

Event Timeline

Hi @Laqueesha, thanks for taking the time to report this and welcome to Wikimedia Phabricator!
Please see https://www.mediawiki.org/wiki/How_to_report_a_bug for required info and provide a public and specific testcase link where the problem can be seen.

This comment was removed by Laqueesha.
Aklapper renamed this task from "TimedText markup not working" to "TimedText markup for bold, italic etc not parsed and displayed as plain text". May 24 2019, 7:13 AM
Laqueesha updated the task description.

We are switching to WebVTT (HTML5) and as such are dropping markup for SRT subtitles. For that reason, there will be a period where this will not be possible.

Also, this file seems to use wiki markup for bolding, which isn't really supported to begin with. Please use SRT or WebVTT markup (<b> for bold, <i> for italics, etc.).

API serves up:

8
00:00:50,000 --> 00:01:00,000
&lt;big&gt;'''''"Be glorious, Russia. My Motherland!"'''''&lt;/big&gt;

Corrected English subs to:

8
00:00:50,000 --> 00:01:00,000
<b><i>"Be glorious, Russia. My Motherland!"</i></b>
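
The manual correction above could be sketched as a small script. This is only an illustration under stated assumptions: the function name `wiki_to_srt_markup` is hypothetical, it handles only the `'''''`, `'''` and `''` quote markup plus the `<big>` tag seen in this cue, and it is not part of TimedMediaHandler or any MediaWiki tool.

```python
import html
import re


def wiki_to_srt_markup(text: str) -> str:
    """Convert wiki-style bold/italic quotes in one cue to <b>/<i> tags.

    Hypothetical helper: covers only the markup seen in this task.
    """
    # The API output above is HTML-escaped, so decode entities first.
    text = html.unescape(text)
    # Assumption: <big> has no SRT/WebVTT counterpart, so it is dropped.
    text = re.sub(r"</?big>", "", text)
    # Longest quote runs first, so ''''' is not misread as ''' plus ''.
    text = re.sub(r"'''''(.+?)'''''", r"<b><i>\1</i></b>", text)
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)
    return text


cue = "&lt;big&gt;'''''\"Be glorious, Russia. My Motherland!\"'''''&lt;/big&gt;"
print(wiki_to_srt_markup(cue))
# prints <b><i>"Be glorious, Russia. My Motherland!"</i></b>
```

Text containing literal apostrophes next to the markup would need more careful handling than these three patterns provide.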

I don't think the SRT renderer of the old player really handles either, honestly (not sure why, I think it did before...)

Ah, due to the changes, we switched from getCaptionsFromMediaWikiSrt to getCaptionsFromSrt. The first just 'takes' some HTML processing; the latter strips all HTML for security reasons.
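
To illustrate the trade-off between passing HTML through and stripping it all, here is a rough sketch of a middle path: escape everything, then re-enable a small allowlist of formatting tags. The name `sanitize_cue` and the `ALLOWED_TAGS` list are assumptions made for this example; this is not the code behind getCaptionsFromSrt or getCaptionsFromMediaWikiSrt.

```python
import html
import re

# Assumption: only these attribute-free formatting tags are considered safe.
ALLOWED_TAGS = ("b", "i", "u")

# Matches an escaped opening or closing tag from the allowlist, e.g. &lt;/b&gt;
_ESCAPED_TAG = re.compile(r"&lt;(/?)(%s)&gt;" % "|".join(ALLOWED_TAGS))


def sanitize_cue(text: str) -> str:
    """Escape all HTML in a cue, then restore only allowlisted tags."""
    escaped = html.escape(text, quote=False)
    return _ESCAPED_TAG.sub(r"<\1\2>", escaped)


print(sanitize_cue("<b>hi</b><script>alert(1)</script>"))
# prints <b>hi</b>&lt;script&gt;alert(1)&lt;/script&gt;
```

Because tags with attributes never match the pattern, something like `<b onclick=...>` stays escaped in this sketch.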

Okay, so what is the solution? Is it that we cannot have wiki links or markup in captions anymore? Or is there a workaround, or changes that simply need to be made, for this to work again?

Well, we definitely can't have wiki links anymore, as that is not part of any web standard that we intend to follow (or that even exists, as a matter of fact). The markup is a different thing. You cannot use wikicode anymore, only official SRT or VTT markup. And the SRT markup specifically is currently caught between a rock and a hard place as we transition between the two technologies. I welcome more contributors. I mean, I've only been trying to make progress on this for 5 years now... 5 years to replace a video player.

Okay, thanks TheDJ. So we can use CTT markup? And so we will need a Wikimedia markup to CTT markup converter then?

The core now validates all SRT and VTT subtitles, due to the great work of @brion. It's just that the old player doesn't understand proper SRT with markup (only our old warped version of it).
The new player should support VTT with markup if your browser supports VTT with markup, and our core will automatically convert the SRT to VTT if needed. But the new player isn't live yet.
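
For readers unfamiliar with the two formats, the SRT to VTT conversion mentioned above comes down to roughly the following. This is a minimal sketch, not the converter the core actually uses: the function name `srt_to_vtt` is hypothetical, and it assumes well-formed SRT input with `HH:MM:SS,mmm` timestamps.

```python
import re

# SRT timing line, e.g. "00:00:50,000 --> 00:01:00,000"
_SRT_TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt: str) -> str:
    """Convert well-formed SRT text to WebVTT (illustrative only)."""
    # WebVTT uses a period instead of a comma before the milliseconds.
    body = _SRT_TIMING.sub(r"\1.\2 --> \3.\4", srt.strip())
    # A WebVTT file must start with the "WEBVTT" header and a blank line.
    return "WEBVTT\n\n" + body + "\n"


srt = '8\n00:00:50,000 --> 00:01:00,000\n<b><i>"Be glorious, Russia. My Motherland!"</i></b>\n'
print(srt_to_vtt(srt))
```

The numeric cue identifier and the <b>/<i> tags carry over unchanged, since both are valid in WebVTT.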

@brion, is all SRT content validated now before being served by the API? I believe so, right? That might make it safe to strip the client-side escaping from the old player, perhaps?

@TheDJ yes, it's now validated (and in many cases slightly manipulated to become conforming) before being served out. As long as the formatting uses <b>...</b> and <i>...</i> rather than wiki-style '''...''' or ''...'' it will work with the upcoming videojs player -- but the old Kaltura player we're still shipping doesn't understand those.

It was previously using a weird hack with wiki parsing that made <b> and <i> work but was pretty .... funky, and broke other things like natural line breaks.

I've got a provisional patch for T222763 which was the main blocker for starting to push out the new player in testing, so we should be able to start running it soon.

Okay, so what is the solution? Is it that we cannot have wiki links or markup in captions anymore? Or is there a workaround, or changes that simply need to be made, for this to work again?

I came here looking for this. We are working on some pedagogical videos, and it would be great if subtitles could have links/wikilinks to articles, so the experience is richer. Is there any way this can be accomplished, or are we at a dead end?

Thanks!

@Theklan: Steps to reproduce a problem would help. Where and how exactly to see a bug, using which media player in which browser? Example in the task description works as expected for me here, using Firefox 97:

Screenshot from 2022-02-15 20-45-25.png (423×600 px, 22 KB)

@Theklan there are no plans for adding links back in subtitles. If anyone wants to write a custom annotation library and set up the infra to support those as separate annotation tracks etc., you can do a lot, but we are not going to shove them back into audio captioning; it's just not what that was made for.

I created T301826: Video track annotations for Wikimedia for eventual proper annotation/clickable area support.

Aklapper triaged this task as Lowest priority. Feb 15 2022, 9:00 PM
Aklapper added a project: Kaltura player.

Setting lowest priority as this only happens with the old player.

@Theklan: Steps to reproduce a problem would help.

Yes, sorry @Aklapper. I was asking about the link feature, not about bold and italic. But @TheDJ created T301826, so I will follow there.

Jdforrester-WMF added a subscriber: Jdforrester-WMF.

This indeed seems to have been resolved with the move to videojs.