
MinT: Detect language of source content automatically
Closed, Resolved (Public)


To improve the usability of the translation service front-end, detect the language of the source content automatically.
This is based on the Compact Language Detector 2 (CLD2) library, which can detect 83 languages.
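
As a rough illustration of the idea, the sketch below wraps a detector so that short or unreliable input yields no guess instead of a wrong one. It assumes the `pycld2` Python binding for CLD2; the names `cld2_detector`, `detect_source_language`, and the `min_chars` threshold are hypothetical and are not taken from the MinT code base.

```python
from typing import Callable, Optional, Tuple

# A detector takes text and returns (is_reliable, language_code).
Detector = Callable[[str], Tuple[bool, str]]


def cld2_detector(text: str) -> Tuple[bool, str]:
    """Adapter around pycld2 (assumed binding, imported lazily)."""
    import pycld2  # third-party package, not part of the standard library

    is_reliable, _bytes_found, details = pycld2.detect(text)
    # details holds (language_name, language_code, percent, score) tuples,
    # with the best guess first.
    return is_reliable, details[0][1]


def detect_source_language(
    text: str, detector: Detector, min_chars: int = 20
) -> Optional[str]:
    """Return a language code, or None when the guess should not be trusted."""
    text = text.strip()
    if len(text) < min_chars:  # hypothetical threshold: too little text to judge
        return None
    is_reliable, code = detector(text)
    # CLD2 reports undetermined text with the code "un".
    if not is_reliable or code == "un":
        return None
    return code
```

A caller would pass `cld2_detector` as the `detector` argument in production and a stub in tests. Returning `None` lets the front-end keep whatever source language is currently selected.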

Event Timeline

Change 905782 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/machinetranslation@master] Automatic language detection for source content

Change 905782 merged by jenkins-bot:

[mediawiki/services/machinetranslation@master] Automatic language detection for source content

Hi @santhosh, can you please associate one or more active project tags with this task (via the Add Action... Change Project Tags dropdown)? That will allow others to see the task when looking at project workboards or searching for tasks in certain projects, and to get notified about it when watching a related project tag. Thanks!

Pginer-WMF triaged this task as Medium priority. Apr 12 2023, 7:57 AM
Pginer-WMF added a project: ContentTranslation.
Pginer-WMF moved this task from Needs Triage to MT on the ContentTranslation board.

I have a WIP patch that uses fastText to increase language coverage, but it is low priority for now.

For long term solution discussion, follow

Change 929148 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/machinetranslation@master] Do not change current target selection when detecting language

Change 929148 merged by jenkins-bot:

[mediawiki/services/machinetranslation@master] Do not change current target selection when detecting language
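
The behaviour named in this patch title can be sketched as follows: a detection result updates only the source language and leaves the user's target choice alone. This is a minimal illustration in Python, not the actual front-end code; `LanguageSelection` and `apply_detected_source` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class LanguageSelection:
    source: str
    target: str


def apply_detected_source(
    selection: LanguageSelection, detected: Optional[str]
) -> LanguageSelection:
    """Update the source language from a detection result; never touch target."""
    if detected is None:
        # No trustworthy detection: keep the selection exactly as it is.
        return selection
    return replace(selection, source=detected)
```

For example, with a selection of `en` to `hi`, a detection of `es` gives `es` to `hi`. What should happen when the detected language equals the current target is not stated in the task, so this sketch leaves the target unchanged in that case as well.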

Change 929439 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update MinT to 2023-06-12-125157-production

Change 929439 merged by jenkins-bot:

[operations/deployment-charts@master] Update MinT to 2023-06-13-061519-production

Pginer-WMF subscribed.

Initial support for 83 languages seems good enough for the immediate use cases. Further developments in this space can be captured in follow-up tickets as sub-tasks of T99666: Provide a service to detect which language the user is writing on