
Use language code subpages for subtitles to allow Translate extension usage
Open, Lowest, Public

Description

Is there any reason to be forced to use a fake .srt extension? (Seems a bug.)
If we had language code subpages, we could use the translate extension right now: subpages are always clean enough if there's no <language/> or other similar stuff, see the text on https://meta.wikimedia.org/w/index.php?title=Fundraising_2012/Translation/Poongothai_video_%28captions%29/de&action=edit

Note that people do expect subtitles to work with our translation tools, see e.g. https://lwn.net/Articles/527081/

Details

Reference
bz42790

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 22 2014, 12:53 AM
bzimport set Reference to bz42790.

mdale wrote:

We use .srt because that is the format of the timed text. We could imagine future timed text formats like popcorn, or vtt or any other timed text stuff we may want to use in the future. Also the languageCode.srt naming convention matches local srt files, so if you download the timed text by page title name into a folder applications like vlc would know what to do with it.

Would it be possible to make some small adjustments to the translate extension to support the timedText namespace?

--michael

(In reply to comment #1)

We use .srt because that is the format of the timed text. We could imagine
future timed text formats like popcorn, or vtt or any other timed text stuff
we may want to use in the future.

I'm not challenging the use of the srt format. :)

Also the languageCode.srt naming convention
matches local srt files, so if you download the timed text by page title name
into a folder applications like vlc would know what to do with it.

One can't just "save with name" or "save as" and keep the extension, and ?action=raw doesn't keep the extension or page title at all; Firefox, for instance, saves it as index.php.
So this is not a feature we would be removing: it doesn't exist today, and it would have to rely on something else anyway.

mdale wrote:

The basic point is that it's useful to distinguish timed text types. We are using .srt today, but may use .vtt in the future. If the timed text namespace did not have an extension, how would we distinguish types?

(In reply to comment #3)

The basic point is that it's useful to distinguish timed text types. We are
using .srt today, but may use .vtt in the future. If the timed text namespace
did not have an extension, how would we distinguish types?

The namespace doesn't contain the extension. The title does, but does it matter whether it's at the end of the fullpagename? It could just as well be at the end of the basepagename.
By the way, I guess ContentHandler doesn't need the extension in order to know the format, if that is needed?

mdale wrote:

Yes, I mean the page title within the timed text namespace.

ContentHandler may be a way to go, would need to look into it in more detail. But things like <languages/> would not inherently be compatible if timed text had a different type and was not "wikitext"

I suppose it could work that way, i.e. TimedText:FileName.webm.srt/en instead of TimedText:FileName.webm.srt.es.

Who is the author of the Translate extension? If it's not hard for the Translate extension to special-case the timed text pages, that might be easier than moving all the pages and changing all the templates and code in timed text. We have been using .{languageCode}.srt for a few years.
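
To illustrate the renaming being weighed here, a minimal sketch of mapping the current .{languageCode}.srt titles onto language code subpages. The function name and the language code pattern are assumptions made for this example; neither comes from TimedMediaHandler or Translate.

```javascript
// Hypothetical helper, not part of TimedMediaHandler or Translate.
// Maps "TimedText:Name.webm.en.srt" to "TimedText:Name.webm.srt/en".
function legacyToSubpageTitle(title) {
    // Assumption: a language code is 2-3 letters plus optional "-suffix" parts.
    // This is only an approximation of the codes MediaWiki accepts, and a
    // 3-letter media extension (e.g. "ogv") directly before ".srt" would be
    // mistaken for a language code.
    const match = /^(.+)\.([a-z]{2,3}(?:-[a-z0-9]+)*)\.(srt)$/i.exec(title);
    if (match === null) {
        return null; // not in the .{languageCode}.srt form
    }
    const [, base, languageCode, format] = match;
    return `${base}.${format}/${languageCode.toLowerCase()}`;
}
```

A bot or maintenance script doing the move could use such a mapping to decide the target title, and leave a redirect at the old one.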

(In reply to comment #5)

I suppose it could work that way .. i.e TimedText:FileName.webm.srt/en
instead of TimedText:FileName.webm.srt.es ..

Yes.
The Translate extension has existed for several years too, and language code subpages are the standard in MediaWiki (system messages) and also Commons (thousands of templates, hundreds of pages...).
The author is Niklas, in cc.

Again, it's not up to me to say what the best technical way is. Subpages seem less confusing for users, but if that's a huge problem maybe another solution should be found; I don't know.

ContentHandler may be a way to go, would need to look into it in more detail.
But things like <languages/> would not inherently be compatible if timed text
had a different type and was not "wikitext"

Yes, <languages/> must be avoided, but this may be something to be left for the translation administrators to check.
Daniel can perhaps give some suggestion if ContentHandler is actually required or preferable?

So, what is the current status of this?

What’s the way forward: changing Translate to make it work with .{languageCode}.srt, or changing TMH (I guess) to use .srt/{languageCode}?

If you ask me, I’d be more inclined to the second solution: as Nemo said, the /en way has been around forever and the rest of Commons works this way. But I guess that’s not my call.

(Heck, I’m so desperate about this that today I considered enabling Translate on a TimedText page and using crazy redirects in the hope it would work :-þ)

(In reply to comment #7)

(Heck, I’m so desperate about this that I considered today enabling Translate
on a TimedText and use crazy redirects in the hope it would work :-þ)

Redirects? How about transclusion, did you try it? You call it crazy but there's probably nothing else to do, I don't think TMH is receiving any substantial feature development as of now. Commons could set up some bots to handle the sync of the .{languageCode}.srt pages.

Is it just changing the name of the pages? I could try to look at that next week

mdale wrote:

It's a relatively simple change, but {languageCode}.srt is the more standard way to represent file names of subtitles, i.e. if you wanted to download the subtitle file we would have to remap things for the name to make sense on your file system.

How much work would it be to special case the TimedText namespace in the translate extension?

mdale wrote:

Sorry, I realize my comment is sort of a loop of what I previously mentioned on this thread. If consensus is .srt/{languageCode}, let's just do that.

Bawolff, when reviewing / implementing, consider download links such as this one:
https://commons.wikimedia.org/w/index.php?title=TimedText:Fra_Mauro%27s_Map_of_the_World.ogv.en.srt&action=raw&ctype=text%2Fx-srt

The /{languageCode} part should be mapped to a position before the .srt, so that the result is a valid local srt file name if possible.

We could just send content-disposition headers if it really matters.

As it stands, that sort of URL would lead to an index.php filename, I believe.

mdale wrote:

In the context of the player we set:

.attr( {
    'href': source.getSrc(),
    'download': fileName
})

Browsers use the download attribute to trigger a download with the given file name. It would be a small change to parse the title, check for the TimedText namespace, and rearrange things so that the name ends with the .srt extension.

But just something to keep in mind.
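
A minimal sketch of that rearrangement, assuming subtitles had moved to titles of the form TimedText:FileName.webm.srt/{languageCode}. downloadFileName is a hypothetical helper written for this example, not existing player code; its result is what would be passed as the 'download' attribute.

```javascript
// Hypothetical helper: derives a local subtitle file name from a
// subpage-style title. Returns null for anything that is not a
// "TimedText:<media name>.srt/<languageCode>" title.
function downloadFileName(title) {
    const match = /^TimedText:(.+)\.(srt)\/([^/]+)$/.exec(title);
    if (match === null) {
        return null;
    }
    const [, mediaName, format, languageCode] = match;
    // Move the language code in front of the extension so that players
    // such as VLC recognise the file as name.languageCode.srt.
    return `${mediaName}.${languageCode}.${format}`;
}
```

For example, the title TimedText:FileName.webm.srt/en would yield the file name FileName.webm.en.srt.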

(In reply to comment #13)

In the context of the player we set:

.attr( {
    'href': source.getSrc(),
    'download': fileName
})

Which browser use to trigger a download link with given file name. It would
be a small change to parse the title check for timedText namespace, and
re-arrange things so it has .srt extension.

But just something to keep in mind.

Actually, we can't directly make the URL have an extension other than .php, due to security bugs in Safari and some versions of IE, but that's kind of a separate problem.

(In reply to Nemo from comment #8)

(In reply to comment #7)

(Heck, I’m so desperate about this that I considered today enabling Translate
on a TimedText and use crazy redirects in the hope it would work :-þ)

Redirects? How about transclusion, did you try it?

I felt crazy enough today to try this:

This workflow is driving me crazy: it seems like all the pieces are here, and yet we jump through hoops. This is deeply frustrating :-(

As I see this, part of the problem here is that the Translate extension, as it is usually set up, targets wikitext, which is actually total overkill in this situation. Instead it would probably be better if Translate had a basic understanding of the SRT format and just translated the 'values' as they are encoded in the SRT file. This will require several changes to the Translate extension.
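
To make "just translate the values" concrete, here is a rough sketch under simplifying assumptions: cues are separated by empty lines, and each cue is an index line, a timing line, then one or more text lines. All function names are made up for this example; none of this is Translate or TimedMediaHandler code.

```javascript
// Split SRT text into cues of the form { index, timing, text }.
function parseSrt(srt) {
    const normalized = srt.replace(/\r\n?/g, '\n').trim();
    if (normalized === '') {
        return [];
    }
    return normalized.split(/\n{2,}/).map((block) => {
        const [index = '', timing = '', ...textLines] = block.split('\n');
        return { index: index.trim(), timing: timing.trim(), text: textLines.join('\n') };
    });
}

// Turn cues back into SRT text.
function serializeSrt(cues) {
    return cues.map((cue) => `${cue.index}\n${cue.timing}\n${cue.text}`).join('\n\n') + '\n';
}

// Replace only the cue text; indexes and timings are carried over untouched.
// `translations` maps a cue index to its translated text; cues without a
// translation keep the source text.
function translateSrt(srt, translations) {
    const cues = parseSrt(srt).map((cue) => ({
        ...cue,
        text: Object.prototype.hasOwnProperty.call(translations, cue.index)
            ? translations[cue.index]
            : cue.text
    }));
    return serializeSrt(cues);
}
```

The point of the split is that translators would only ever see the text of each cue as a separate unit, while the timing lines stay out of their way.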

Also, for /language subpages etc, we would want to configure custom disposition headers to make sure it downloads as name.languageCode.srt (which can be easily understood by most video players).
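
As a sketch of what such a header could look like: the helper below builds a Content-Disposition value with both a plain ASCII filename and an RFC 5987 style filename* parameter. It is illustrative only; contentDisposition is a made-up name, and the real change would live in MediaWiki's PHP code rather than in JavaScript.

```javascript
// Hypothetical helper: builds a Content-Disposition header value for a
// subtitle download such as "FileName.webm.en.srt".
function contentDisposition(fileName) {
    // ASCII fallback for older clients: replace non-ASCII characters,
    // double quotes and backslashes with underscores.
    const fallback = fileName.replace(/[^\x20-\x7e]|["\\]/g, '_');
    // filename*: UTF-8 percent-encoding. encodeURIComponent leaves ' ( ) *
    // unescaped, so those are encoded by hand.
    const encoded = encodeURIComponent(fileName).replace(
        /['()*]/g,
        (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase()
    );
    return `attachment; filename="${fallback}"; filename*=UTF-8''${encoded}`;
}
```

With this, the URL can keep ending in index.php while the browser still saves the file under the name.languageCode.srt name.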

And we need to keep backwards compatibility with redirects or something, for existing translations.

Instead it probably would be better if Translate had basic understanding of the SRT format and just translate the 'values' as they are encoded in the SRT file. This will require several changes to the Translate extension.

Sounds like it could use some sort of plugin architecture? There are several other translatable-but-not-wikitext things that could make use of that (SVG files, gadget configuration files, image annotations...)

@Tgr definitely.

@Nikerabbit I have no idea at all about the design and structure of the Translate extension. Can you say something about the feasibility, challenges etc. of making this more generic (or rather, of making a stricter binding to more specific content models, I guess)?

I don't see what's overkill about wikitext. SRT is just a text format and TMH could just read the subpages produced by Translate, which wouldn't contain any markup.

@Nikerabbit I have no idea at all about the design and structure of the Translate extension. Can you say something about the feasibility, challenges etc. of making this more generic (or rather, of making a stricter binding to more specific content models, I guess)?

You're looking for an FFS (file format support class, see https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Extension:Translate/File_format_support), but one that reads from the wiki. That has not been done elsewhere yet.

If you don't want to use the page translation feature, you can implement support for srt at either the MessageGroup or the FFS level (FFS expects to work with real files in a file system), or both. Nemo_bis already linked some documentation, and there are plenty of fully functional examples of both in the Translate extension and some outside it, like TranslateSvg. My question would be: is there something missing in Translate that prevents you from doing this?

No clue, I have never looked at a single class of Translate. I was looking for entry points, and I think you both helped me find those. I'll see if I can find any other blockers to achieving this.

With the current bug summary, this is a rather trivial change, not a "possible tech project", AFAICS. If the scope is expanded to something bigger, please change the summary. :)