Line breaks on subtitle are ignored on TimedText namespace
Closed, Resolved · Public

Description

Line breaks are ignored in the TimedText namespace. I tried <br> and <br />, but neither works. In addition, any text after a <br> is not displayed on the video.

Example:
https://commons.wikimedia.org/w/index.php?title=TimedText:Knowledge_for_Everyone_(no_subtitles).webm.ja.srt&oldid=141488045

Event Timeline

Yukichi99 updated the task description.
Yukichi99 raised the priority of this task from to Needs Triage.
Yukichi99 changed Security from none to None.
Yukichi99 added a subscriber: Yukichi99.
Florian added a subscriber: Florian.

Please don't forget to add projects to your task :)

Florian removed a subscriber: Florian. · Dec 15 2014, 6:20 AM

Hi @Yukichi99. Thanks for taking the time to report this!
This particular problem has already been reported in our bug tracking system as T55926, but please feel free to report any further issues you find. Further handling of the reported issue happens in T55926.

brion reopened this task as Open. · Aug 11 2015, 9:19 PM
brion added a subscriber: brion.

This seems unrelated to T55926 which is about a specific user interface message. This is about subtitles.

Restricted Application added a subscriber: Matanya. · Aug 11 2015, 9:19 PM
brion added a comment. · Aug 11 2015, 9:23 PM

Example:

This text on https://commons.wikimedia.org/wiki/TimedText:Knowledge_for_Everyone_(no_subtitles).webm.ja.srt is explicitly broken in the source over two lines:

00:00:56,818 --> 00:01:03,528
そして、ウィキペディアへの無料のアクセスを実現する嘆願書にサインして、
私達と共に声をあげるよう、お誘いしたいと思っています

The displayed subtitle (at least as tested in Chrome) combines the lines together as one, then re-breaks them in a less convenient place.

brion added a subscriber: TheDJ. · Aug 11 2015, 9:51 PM

So here's what seems to be happening:

  • TMH's TimedText plugin fetches the blah.srt page from the wiki's API with an action=parse
  • so, it receives the HTML of the .srt source text *processed as a wiki page*:
    • double-newlines in source become <p> breaks
    • single-newlines in source are removed
  • mw.TextSource processes the HTML it's received:
    • removes all elements "for security reasons"
    • processes through the now-plaintext file...
    • if there were multiple lines in a single subtitle entry, it would put a <br> between them, but this can't happen because of the processing above
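
The sequence above can be reproduced with a small simulation. This is only a sketch under stated assumptions: `simulate_wiki_parse`, `strip_elements`, and `parse_cues` are hypothetical helpers that model the behaviour described in the list, not the actual MediaWiki parser or the mw.TextSource code, and the sample cue text is made up.

```python
import re

# Matches an SRT timing line such as "00:00:56,818 --> 00:01:03,528".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

SRT_SOURCE = (
    "1\n"
    "00:00:56,818 --> 00:01:03,528\n"
    "first line of the cue,\n"
    "second line of the cue\n"
    "\n"
    "2\n"
    "00:01:04,000 --> 00:01:06,000\n"
    "next cue\n"
)


def simulate_wiki_parse(source: str) -> str:
    """Crude model of the wiki parse step as described in the thread:
    blank lines become <p> breaks, single newlines are dropped."""
    paragraphs = re.split(r"\n\s*\n", source.strip())
    return "\n".join("<p>" + " ".join(p.splitlines()) + "</p>" for p in paragraphs)


def strip_elements(markup: str) -> str:
    """Model of the 'remove all elements' step: keep text, drop tags."""
    text = markup.replace("</p>", "\n\n")
    return re.sub(r"<[^>]+>", "", text)


def parse_cues(text: str) -> list[tuple[str, list[str]]]:
    """Split text into (timing, lines) pairs, one pair per subtitle entry."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        match = TIMING.search(block)
        if match is None:
            continue
        remainder = block[match.end():].strip()
        cues.append((match.group(0), remainder.splitlines()))
    return cues


raw_cues = parse_cues(SRT_SOURCE)
parsed_cues = parse_cues(strip_elements(simulate_wiki_parse(SRT_SOURCE)))

print(raw_cues[0][1])     # two separate lines survive in the raw text
print(parsed_cues[0][1])  # a single merged line after the simulated parse
```

In this model the first cue keeps both lines when parsed from the raw text, but arrives as one merged line after the simulated wiki parse, so there is nothing left to join with a <br>.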

@TheDJ is under the impression that this originally used action=raw and plaintext, and that it was changed to use wiki parsing to enable some formatting. That formatting was later disabled for security reasons, so we seem to be left with an unnecessary parse operation.

Unless templates are used *in actual subtitle text* it should be safe to switch these back to action=raw, which should get the line breaks working again, simplify processing, and improve performance.
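
As a rough illustration of the action=raw approach, the sketch below builds a raw-text URL for a TimedText page and turns the lines of one cue into display HTML. The helper names `raw_url` and `cue_lines_to_html` are hypothetical and are not part of TimedMediaHandler; the index.php base URL is the one from the example link in the task description.

```python
import html
from urllib.parse import urlencode


def raw_url(index_php: str, title: str) -> str:
    """Build an index.php URL that requests the unparsed page source."""
    return index_php + "?" + urlencode({"title": title, "action": "raw"})


def cue_lines_to_html(lines: list[str]) -> str:
    """Escape each line as plain text, then join the lines with <br>.

    Escaping first means markup typed into a subtitle is shown literally
    instead of being interpreted by the browser.
    """
    return "<br>".join(html.escape(line, quote=False) for line in lines)


url = raw_url(
    "https://commons.wikimedia.org/w/index.php",
    "TimedText:Knowledge_for_Everyone_(no_subtitles).webm.ja.srt",
)
print(url)
print(cue_lines_to_html(["– First person's speech", "– Second person's"]))
```

Because the raw source keeps its single newlines, each subtitle entry can be split into lines directly and no HTML stripping step is needed.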

Change 231771 had a related patch set uploaded (by TheDJ):
[WIP] Do away with wikitext parsing in Timed Text

https://gerrit.wikimedia.org/r/231771

Jdforrester-WMF triaged this task as Low priority. · Sep 4 2015, 6:56 PM
Jdforrester-WMF moved this task from Untriaged to Backlog on the Multimedia board. · Sep 4 2015, 7:04 PM
TheDJ moved this task from Backlog to TimedText on the TimedMediaHandler board. · Oct 21 2015, 7:37 PM

Change 231771 abandoned by TheDJ:
[WIP] Do away with wikitext parsing in Timed Text

https://gerrit.wikimedia.org/r/231771

Hi, any update on this? I just spent a painstaking amount of time subtitling a 40-minute video using a website called Amara, which recommended that I limit lines to 42 characters. I broke up each line accordingly, only to find that the video player doesn't preserve line breaks... Not only is this a readability issue, it also makes some lines very confusing, because the subtitle convention when multiple people are speaking is to format it like this:

– First person's speech
– Second person's

and now it just displays like:

– First person's speech – Second person's

making everything jumbled. Other players like VLC do keep line breaks.

TheDJ added a comment. · Jan 8 2018, 12:14 PM

Unfortunately, there is currently no one working on anything related to the video player and its subtitles.

brion added a comment. · Jan 8 2018, 5:58 PM

I'll be picking up some more of these bugs soon -- not sure if there's an easy fix on the current system, but the new frontend should be nicer to the original subtitle formatting. (However it may require manual adjustment of subtitle files that don't render correctly. I'll post some guidance when I'm closer in.)

Change 232214 had a related patch set uploaded (by Brion VIBBER; owner: Brion VIBBER):
[mediawiki/extensions/TimedMediaHandler@master] Subtitles served through API, with WebVTT conversion

https://gerrit.wikimedia.org/r/232214

4nn1l2 added a subscriber: 4nn1l2. · Apr 27 2019, 4:05 PM

Change 232214 merged by jenkins-bot:
[mediawiki/extensions/TimedMediaHandler@master] Subtitles served through API, with WebVTT conversion

https://gerrit.wikimedia.org/r/232214
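
For readers unfamiliar with the conversion named in the patch title, here is a minimal sketch of turning SRT text into WebVTT while keeping the line breaks inside each cue. It is not the code from change 232214: `srt_to_webvtt` is a hypothetical helper, it drops the numeric cue index, and it does not escape `&` or `<` in cue text, which a production converter would need to handle.

```python
import re

# SRT uses a comma before the milliseconds; WebVTT uses a period.
SRT_TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_webvtt(srt: str) -> str:
    """Convert SRT text to WebVTT, keeping each cue's text lines as-is."""
    out = ["WEBVTT", ""]
    normalized = srt.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            m = SRT_TIMING.match(line.strip())
            if m:
                # Rewrite the timing line, then copy the text lines unchanged.
                out.append(f"{m.group(1)}.{m.group(2)} --> {m.group(3)}.{m.group(4)}")
                out.extend(lines[i + 1:])
                out.append("")
                break
    return "\n".join(out)


sample = "1\n00:00:56,818 --> 00:01:03,528\nline one\nline two\n"
print(srt_to_webvtt(sample))
```

Each text line of a cue is copied through on its own line, so a two-line subtitle in the source stays a two-line cue in the output.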

TheDJ closed this task as Resolved. · May 19 2019, 10:31 AM
TheDJ assigned this task to brion.
TheDJ removed a project: Patch-For-Review.
4nn1l2 changed the task status from Duplicate to Resolved.