Invent automatic segmentation in translation units without <translate> tags
Open, Needs Triage (Public)

Description

Per @Nemo_bis on the parent.

Event Timeline

Hi Friends,
First of all, I'd like to clarify this task: is the idea to get rid of the <!-- T1 --> kind of tags?

We (at Dokit.io) would like to help with this task. Have you already thought of a way to do it?

As I understand it, Extension:ContentTranslation splits a page into translation units without using tags.

Would it be possible to reuse the approach Extension:ContentTranslation takes? (documented here)

Thanks
ping: @Nikerabbit

ContentTranslation does not accurately track changes to the source article, which is the exact function of these tags.

The leading idea currently is T143327 (Make the Translate extension be based on DOM, not strings), which would reduce the need for tagging and avoid showing the remaining tags in the visual editor. Possibly Multi-Content-Revisions could be used for storing the metadata. Algorithms could be developed to detect moved sections (the diff component can already do this) and to fuzzy match the remaining ones.
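
To make the last point concrete, here is a rough Python sketch of the two passes: exact matching to catch unchanged (possibly moved) sections, then fuzzy matching for the rest. It is only an illustration, not how Translate or its diff component works. The function name `match_units`, the paragraph-level granularity, and the 0.6 similarity threshold are all assumptions.

```python
from difflib import SequenceMatcher


def match_units(old_units, new_paragraphs, threshold=0.6):
    """Map each new paragraph index to (old unit id or None, status).

    old_units: dict of unit id -> source text in the previous revision.
    new_paragraphs: list of paragraph texts in the new revision.
    Status is "unchanged", "fuzzy" or "new". (Hypothetical helper.)
    """
    result = {}
    unused = dict(old_units)

    # Pass 1: identical text, wherever it now sits in the page.
    # This covers sections that were only moved.
    by_text = {}
    for uid, text in old_units.items():
        by_text.setdefault(text, []).append(uid)
    for i, para in enumerate(new_paragraphs):
        ids = by_text.get(para)
        if ids:
            uid = ids.pop(0)
            result[i] = (uid, "unchanged")
            del unused[uid]

    # Pass 2: fuzzy match what is left against the still-unused old units.
    for i, para in enumerate(new_paragraphs):
        if i in result:
            continue
        best_uid, best_ratio = None, threshold
        for uid, text in unused.items():
            ratio = SequenceMatcher(None, text, para).ratio()
            if ratio >= best_ratio:
                best_uid, best_ratio = uid, ratio
        if best_uid is None:
            result[i] = (None, "new")
        else:
            result[i] = (best_uid, "fuzzy")
            del unused[best_uid]
    return result
```

Old units left over in neither pass would correspond to deleted sections. A real implementation would presumably work on DOM nodes rather than plain strings, per T143327.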

Thank you for the project status.
I see. It's a pretty important one.
Since we probably can't tackle this task on our own, we won't invest effort now.
However, if you decide to work on this in the near future and feel you need assistance, feel free to ask us.
We have a senior MW developer willing to help.

Moreover, showing tags in VE is still a problem for us.
Even though it's not an ideal solution, the alternative we thought of is this: when using VE, we could parse the content and not display the tags. Then, when a unit is removed or merged (the parser sees two consecutive tags), it can automatically remove the first tag, knowing that this will leave outdated translation units.
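
A minimal Python sketch of that cleanup step, to show what we have in mind. The marker pattern is an assumption: it accepts both the `<!-- T1 -->` form written above and a `<!--T:1-->` form. The function name `drop_orphaned_markers` is hypothetical, and nothing here is existing Translate or VE code.

```python
import re

# Assumed marker syntax, e.g. "<!-- T1 -->" or "<!--T:1-->".
MARKER = re.compile(r"<!--\s*T:?(\d+)\s*-->")


def drop_orphaned_markers(wikitext):
    """Remove a unit marker when the next marker follows with only
    whitespace in between, meaning the unit was removed or merged.

    Returns (cleaned wikitext, ids of removed markers). The returned
    ids are the units whose translations are now outdated.
    """
    removed = []
    pieces = []
    pos = 0
    matches = list(MARKER.finditer(wikitext))
    for cur, nxt in zip(matches, matches[1:]):
        between = wikitext[cur.end():nxt.start()]
        if between.strip() == "":
            removed.append(int(cur.group(1)))
            # Keep the text before the orphaned marker, then skip ahead
            # to the start of the next marker.
            pieces.append(wikitext[pos:cur.start()])
            pos = nxt.start()
    pieces.append(wikitext[pos:])
    return "".join(pieces), removed
```

For example, if a user deletes the second paragraph in VE, the text between the markers for units 2 and 3 becomes empty, so the marker for unit 2 is dropped and unit 2 is reported as outdated. A marker left at the very end of the page with no content after it is not handled by this sketch.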