
Improve MinT punctuation support for Japanese
Closed, Resolved (Public)

Description

As reported here, Japanese does not normally use , or . as punctuation marks, but 、 and 。 instead. However, MinT seems to be generating Japanese sentences with Latin-based punctuation (note the presence of , and . below):

このページは,翻訳と反省を提供できる [[$2のウィキメディア財団統治ウィキ]]の"$1"に移動されました.

This ticket proposes to analyze the current situation and explore options to improve punctuation support, including post-processing of the results, as was done for Punjabi (T215083).
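
The post-processing option mentioned above could look roughly like the following. This is only a minimal sketch and not the code from the merged patch: the function name `normalize_ja_punctuation` and both regular expressions are hypothetical, and the digit and ASCII-letter guards are an assumption meant to avoid touching numbers such as 1,000 or 3.14 and names such as example.com.

```python
import re

# Hypothetical patterns, not taken from the MinT code base.
# A comma is converted unless it sits between two ASCII digits ("1,000").
_LATIN_COMMA = re.compile(r"(?:(?<![0-9]),|,(?![0-9]))[ \t]*")
# A full stop is converted unless an ASCII letter or digit follows it
# ("3.14", "example.com").
_LATIN_STOP = re.compile(r"\.(?![0-9A-Za-z])[ \t]*")


def normalize_ja_punctuation(text: str) -> str:
    """Replace Latin , and . with the Japanese marks 、 and 。.

    Spaces or tabs directly after a converted mark are dropped, since
    Japanese text does not put a space after punctuation.
    """
    text = _LATIN_COMMA.sub("、", text)
    text = _LATIN_STOP.sub("。", text)
    return text


if __name__ == "__main__":
    print(normalize_ja_punctuation("これは, テストです."))
```

A step like this would only make sense when the target language is Japanese, and a real implementation may need to handle more characters (for example ? and !) or protect wikitext and placeholders such as $1.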

Event Timeline

Change #1014151 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/machinetranslation@master] Normalize japanese punctuations

https://gerrit.wikimedia.org/r/1014151

Change #1014151 merged by jenkins-bot:

[mediawiki/services/machinetranslation@master] Normalize japanese punctuations

https://gerrit.wikimedia.org/r/1014151

Change #1014729 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update MinT to 2024-03-26-120044-production

https://gerrit.wikimedia.org/r/1014729

Change #1014729 merged by jenkins-bot:

[operations/deployment-charts@master] Update MinT to 2024-03-26-120044-production

https://gerrit.wikimedia.org/r/1014729

Punctuation marks now seem to use the correct characters when translating into Japanese. I tried two text samples, from English and Catalan, in the test instance, and only the expected symbols 、 and 。 appear, with no trace of the , or . symbols from the original texts that were incorrectly appearing in the translation before the fix:

English to Japanese using NLLB-200 model: translate.wmcloud.org_(Wiki Tablet) (9).png
Catalan to Japanese using Softcatalà model: translate.wmcloud.org_(Wiki Tablet) (10).png