
Port the markup transfer feature of cxserver to MinT
Closed, Resolved (Public)

Description

Following the release of MinT, we are exploring how to scale the service beyond its initial use in Content Translation.

Content Translation preserves formatting such as bold text and links when text is translated, even with translation services that only support plain text. It does this with an algorithm that re-applies the source formatting to the output of a plain-text translation. This capability can be useful in a more general context with MinT (e.g., to translate documents, websites, or other rich-text content).
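To make the idea concrete, here is a minimal sketch of one way formatting could be re-applied to plain-text MT output. It is not the cxserver algorithm: it assumes a simple heuristic (translate each formatted span on its own, then look for that translation inside the translated sentence), handles only a few non-overlapping inline tags, and does not escape HTML in the output. The names `transfer_markup`, `toy_translate`, and `INLINE_TAGS` are hypothetical, and `toy_translate` is a lookup table standing in for a real plain-text translation service.

```python
from html.parser import HTMLParser
from typing import Callable, List, Tuple

# Assumption: only these simple inline tags are tracked in this sketch.
INLINE_TAGS = {"b", "i", "em", "strong", "u"}


class _SpanCollector(HTMLParser):
    """Collects the plain text and the text covered by each inline tag."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: List[str] = []
        self.spans: List[Tuple[str, str]] = []  # (tag, text inside the tag)
        self._open: List[Tuple[str, int]] = []  # (tag, index into text_parts)

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_TAGS:
            self._open.append((tag, len(self.text_parts)))

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            _, start = self._open.pop()
            self.spans.append((tag, "".join(self.text_parts[start:])))

    def handle_data(self, data):
        self.text_parts.append(data)


def transfer_markup(source_html: str, translate: Callable[[str], str]) -> str:
    """Translate the plain text, then re-wrap spans found in the output."""
    collector = _SpanCollector()
    collector.feed(source_html)
    collector.close()

    translated = translate("".join(collector.text_parts))

    # Locate each span's own translation inside the translated sentence.
    ranges = []
    for tag, span_text in collector.spans:
        candidate = translate(span_text)
        pos = translated.find(candidate)
        if pos != -1:  # if not found, the span is left unformatted
            ranges.append((pos, pos + len(candidate), tag))

    # Insert tags right to left so earlier offsets stay valid
    # (assumes the located ranges do not overlap).
    for start, end, tag in sorted(ranges, reverse=True):
        translated = (
            translated[:start] + f"<{tag}>" + translated[start:end]
            + f"</{tag}>" + translated[end:]
        )
    return translated


# Hypothetical stand-in for a plain-text MT service (English to French).
_TOY_TABLE = {"The red car": "La voiture rouge", "red": "rouge"}


def toy_translate(text: str) -> str:
    return _TOY_TABLE[text]


print(transfer_markup("The <b>red</b> car", toy_translate))
# prints: La voiture <b>rouge</b>
```

The example shows why the formatting cannot simply be copied by position: the bold word moves from the middle of the source sentence to the end of the translation, so the span has to be located in the output before the tag is re-applied.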

This ticket focuses on porting the markup transfer feature of cxserver to MinT, so that MinT can handle rich-text markup itself instead of relying on cxserver. This will let us translate arbitrary HTML content using MinT.

The particular case of links will be covered in follow-up tickets, since links may need to be processed differently depending on the context. For example, the link target could (a) remain as it is, (b) be adapted to point to the equivalent content in another language (e.g., a Wikipedia article in that language), or (c) point to an MT version of the destination. Support for these options can be added in future iterations, on separate tickets.

Event Timeline

Pginer-WMF triaged this task as Medium priority. Jul 10 2023, 1:42 PM
Pginer-WMF added a subscriber: santhosh.

Change 951205 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/cxserver@master] mint: Use the HTML translation capability

https://gerrit.wikimedia.org/r/951205

Change 956209 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/machinetranslation@master] Increase the character limit per translator class

https://gerrit.wikimedia.org/r/956209

Change 956209 merged by jenkins-bot:

[mediawiki/services/machinetranslation@master] Increase the character limit per translator class

https://gerrit.wikimedia.org/r/956209

Change 961977 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update MinT to 2023-09-28-043052-production

https://gerrit.wikimedia.org/r/961977

Change 961977 merged by jenkins-bot:

[operations/deployment-charts@master] Update MinT to 2023-09-28-043052-production

https://gerrit.wikimedia.org/r/961977

Change 951205 merged by jenkins-bot:

[mediawiki/services/cxserver@master] mint: Use the HTML translation capability

https://gerrit.wikimedia.org/r/951205

Change 964846 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2023-10-11-045323-production

https://gerrit.wikimedia.org/r/964846

Change 964846 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2023-10-11-045323-production

https://gerrit.wikimedia.org/r/964846

Change 965022 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2023-10-11-045323-production

https://gerrit.wikimedia.org/r/965022

When trying MinT in Content Translation and Section Translation, links disappear and references are misplaced. I created a separate ticket with an example: T348612: References moved to the end of the sentence and links disappear when translated with MinT

Change 965109 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2023-10-11-114410-production

https://gerrit.wikimedia.org/r/965109

Change 965109 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2023-10-11-114410-production

https://gerrit.wikimedia.org/r/965109

Mentioned in SAL (#wikimedia-operations) [2023-10-11T12:00:40Z] <kart_> Updated cxserver to 2023-10-11-114410-production (T341478, T347939)