TimedText markup for bold, italic etc not parsed and displayed as plain text
Open, Needs Triage · Public

Description

Subtitles used to support markup for boldface, italics, size changes and the like. That markup is no longer parsed and is displayed as plain text instead. I first noticed this a couple of weeks ago. This is a problem because many TimedText pages use this markup, which worked when it was added but no longer does.

Steps to reproduce:

1.) Go to https://commons.wikimedia.org/w/index.php?title=File%3APatrioticheskaya_Pesnya_(May_2000).oga
2.) Play the OGG file on the page using the media player.
3.) Note that the timed text at around the 0:50 mark does not display its markup correctly. It should show bold text, italic text, large text, etc.

This bug occurs whether or not I am logged in, on Microsoft Edge and Google Chrome (including Incognito mode). The markup fails to parse across Wikimedia projects, from Wikipedia to Wikimedia Commons, regardless of whether the TimedText is stored locally (for example on Wikipedia) or on Wikimedia Commons.

This may be related to: T224367

Event Timeline

Restricted Application added a subscriber: Aklapper. · May 24 2019, 3:04 AM
4nn1l2 added a subscriber: 4nn1l2. · May 24 2019, 5:59 AM

Hi @Laqueesha, thanks for taking the time to report this and welcome to Wikimedia Phabricator!
Please see https://www.mediawiki.org/wiki/How_to_report_a_bug for required info and provide a public and specific testcase link where the problem can be seen.

This comment was removed by Laqueesha.
Aklapper renamed this task from TimedText markup not working to TimedText markup for bold, italic etc not parsed and displayed as plain text. · May 24 2019, 7:13 AM
Laqueesha updated the task description. · May 24 2019, 10:42 AM
Laqueesha updated the task description. · May 24 2019, 11:11 PM
Laqueesha updated the task description. · May 26 2019, 11:54 AM
Laqueesha updated the task description. · May 26 2019, 12:02 PM
Laqueesha updated the task description. · May 26 2019, 7:10 PM
Laqueesha updated the task description.
Laqueesha updated the task description. · May 26 2019, 7:29 PM
Laqueesha updated the task description. · May 28 2019, 10:57 PM
Laqueesha updated the task description. · May 29 2019, 9:04 AM
TheDJ added a subscriber: TheDJ. · May 29 2019, 9:53 AM

We are switching to WebVTT (HTML5) and as such are dropping markup for SRT subtitles. For that reason, there will be a period where this will not be possible.

Also, this file seems to use wiki markup for bolding, which isn't really supported to begin with. Please use SRT or WebVTT markup (<b> for bold, <i> for italics, etc.).

TheDJ added a comment. · Edited · May 29 2019, 9:58 AM

API serves up:

8
00:00:50,000 --> 00:01:00,000
&lt;big&gt;'''''"Be glorious, Russia. My Motherland!"'''''&lt;/big&gt;

Corrected English subs to:

8
00:00:50,000 --> 00:01:00,000
<b><i>"Be glorious, Russia. My Motherland!"</i></b>

I don't think the SRT renderer of the old player really handles either, honestly (not sure why, I think it did before).

TheDJ added a comment. · Edited · May 29 2019, 10:04 AM

Ah, due to the changes we switched from getCaptionsFromMediaWikiSrt to getCaptionsFromSrt. The first just 'takes' some HTML processing; the latter strips all HTML for security reasons.
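To illustrate the difference between stripping all HTML and letting a small set of tags through, here is a minimal Python sketch. It is not the code of getCaptionsFromSrt or getCaptionsFromMediaWikiSrt (those live in the player); the function names and the allowlist are assumptions made up for this example, and a regex like this is not a real security sanitizer.

```python
import re

# Hypothetical allowlist: the task only mentions <b> and <i> explicitly.
ALLOWED_TAGS = {"b", "i"}

# Matches an opening or closing tag, capturing the slash and the tag name.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def strip_all_tags(cue_text: str) -> str:
    """Drop every tag and keep only the text (the 'strip all HTML' behaviour)."""
    return TAG_RE.sub("", cue_text)


def keep_allowed_tags(cue_text: str) -> str:
    """Keep allowlisted tags (without attributes) and drop all others."""
    def replace(match: re.Match) -> str:
        closing, name = match.group(1), match.group(2).lower()
        if name in ALLOWED_TAGS:
            return f"<{closing}{name}>"  # any attributes are discarded
        return ""
    return TAG_RE.sub(replace, cue_text)


cue = '<big><b><i>"Be glorious, Russia. My Motherland!"</i></b></big>'
print(strip_all_tags(cue))
print(keep_allowed_tags(cue))
```

With the first function the viewer sees only plain text; with the second, <b> and <i> survive while <big> is dropped.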

Laqueesha updated the task description. · May 30 2019, 9:56 AM
Laqueesha updated the task description. · May 30 2019, 4:00 PM
Laqueesha updated the task description. · May 31 2019, 8:26 AM

Okay, so what is the solution? Is it that we cannot have wiki links or markup in captions anymore? Or is there a workaround, or changes that simply need to be made for this to work again?

TheDJ added a comment. · Jun 11 2019, 9:21 PM

Well, we definitely can't have wiki links anymore, as that is not part of any web standard that we intend to follow (or that even exists, as a matter of fact). The markup is a different thing. You cannot use wikicode anymore, only official SRT or VTT markup. And the SRT markup specifically is currently caught between a rock and a hard place as we transition between the two technologies. I welcome more contributors. I mean, I've only been trying to make progress on this for 5 years now... 5 years to replace a video player.

Okay, thanks TheDJ. So we can use CTT markup? And so we will need a Wikimedia markup to CTT markup converter then?

TheDJ added a subscriber: brion. · Jun 12 2019, 7:51 AM

The core now validates all SRT and VTT subtitles, thanks to the great work of @brion. It's just that the old player doesn't understand proper SRT with markup (only our old warped version of it).
The new player should support VTT with markup if your browser supports VTT with markup, and our core will automatically convert the SRT to VTT if needed. But the new player isn't live yet.

@brion, is all SRT content now validated before being served by the API? I believe so, right? That might make it safe to strip the client-side escaping from the old player, perhaps?
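As a rough idea of what an SRT to VTT conversion involves, here is a minimal Python sketch. It is not the conversion the core performs; srt_to_vtt is a hypothetical name, and the sketch handles only the basics: adding the WEBVTT header and changing the comma in cue timings to a dot. Cue numbers are kept, since WebVTT allows cue identifiers.

```python
import re

# SRT timings use a comma before the milliseconds; WebVTT uses a dot.
TIMING_RE = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT text to WebVTT (hypothetical helper, basics only)."""
    out = ["WEBVTT", ""]  # required header followed by a blank line
    for line in srt_text.strip().splitlines():
        if "-->" in line:
            # Only rewrite timing lines, so commas in cue text are untouched.
            line = TIMING_RE.sub(r"\1.\2", line)
        out.append(line)
    return "\n".join(out) + "\n"


srt = (
    "8\n"
    "00:00:50,000 --> 00:01:00,000\n"
    '<b><i>"Be glorious, Russia. My Motherland!"</i></b>\n'
)
print(srt_to_vtt(srt))
```

The <b> and <i> tags pass through unchanged, as both formats use them for bold and italics.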

brion added a comment. · Jun 12 2019, 3:13 PM

@TheDJ yes, it's now validated (and in many cases slightly manipulated to become conforming) before being served out. As long as the formatting uses <b>...</b> and <i>...</i> rather than wiki-style '''...''' or ''...'' it will work with the upcoming videojs player -- but the old Kaltura player we're still shipping doesn't understand those.

It was previously using a weird hack with wiki parsing that made <b> and <i> work but was pretty .... funky, and broke other things like natural line breaks.

I've got a provisional patch for T222763 which was the main blocker for starting to push out the new player in testing, so we should be able to start running it soon.
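For existing subtitles that still use wiki-style quotes, a one-off conversion to <b> and <i> could look roughly like the Python sketch below. No such converter is mentioned in this task as existing; wiki_quotes_to_tags is a hypothetical name, and the sketch assumes balanced quote runs on a single line. It does not touch <big> or wiki links, which would still need to be removed by hand.

```python
import re


def wiki_quotes_to_tags(text: str) -> str:
    """Replace wiki-style quote markup with <b>/<i> tags (hypothetical helper).

    Longest quote runs are handled first so that ''''' is not
    misread as ''' followed by ''.
    """
    text = re.sub(r"'''''(.+?)'''''", r"<b><i>\1</i></b>", text)  # bold italic
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)             # bold
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)               # italic
    return text


print(wiki_quotes_to_tags("'''''\"Be glorious, Russia. My Motherland!\"'''''"))
```

Applied to the cue at 0:50 (minus its <big> wrapper), this yields the same markup as the manually corrected subtitle.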