CX2: Support mapping templates based on their parameter names
Closed, Resolved (Public)

Description

When translating Miosis from English to Swedish, the infobox is added into the translation with empty parameters:

Screen Shot 2018-11-13 at 11.54.30.png (499×1 px, 135 KB)

Here, the infobox is based on the template "Infobox medical condition (new)", which exists in both English and Swedish and is connected through Wikidata. The Swedish version lacks templateData information, which prevents CX from completing the mapping, but the parameters used in the template are exactly the same because the template was ported into Swedish from English. In CX1, copying the original template over would work in this situation; that is not possible in CX2.

For cases where templateData is missing, Content Translation can consider matching the existing parameters by name as a fallback approach.

In this process, we need to make sure that we avoid using nonexistent parameters on the target template (as described in T199308: CX2: Avoid using inexistent parameters when mapping template parameters).
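
The name-based fallback and the constraint above can be sketched together as follows. This is only an illustration, not the Content Translation implementation: the function name `map_params_by_name`, the exact-match rule (after trimming whitespace), and the `legacy_field` parameter are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple


def map_params_by_name(
    source_params: Dict[str, str],
    target_param_names: Iterable[str],
) -> Tuple[Dict[str, str], List[str]]:
    """Copy each source value to the target param with the same name.

    Params whose name is not known on the target template are never
    emitted; they are returned separately so a caller can report them.
    """
    target_names = {name.strip() for name in target_param_names}
    mapped: Dict[str, str] = {}
    skipped: List[str] = []
    for name, value in source_params.items():
        if name.strip() in target_names:
            mapped[name.strip()] = value
        else:
            skipped.append(name)
    return mapped, skipped


# Hypothetical data: "legacy_field" stands in for a param that only the
# source template supports.
source = {"name": "Miosis", "synonyms": "Myosis", "legacy_field": "x"}
target_names = ["name", "synonyms", "image", "caption"]
mapped, skipped = map_params_by_name(source, target_names)
print(mapped)
print(skipped)
```

Returning the skipped names, rather than silently dropping them, is a design choice for the sketch; it makes it easy to see which source values did not reach the translation.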

Event Timeline

Pginer-WMF triaged this task as Medium priority.Nov 7 2018, 7:02 PM
Pginer-WMF moved this task from Needs Triage to CX2 on the ContentTranslation board.
Pginer-WMF raised the priority of this task from Medium to High.Nov 12 2018, 11:29 AM

The en->ja pair now has template data in both languages. From history, I see User:Doc James added that to ja.wikipedia.org

https://en.wikipedia.org/w/api.php?action=templatedata&titles=Template:Infobox_medical_condition_(new)
https://ja.wikipedia.org/w/api.php?action=templatedata&titles=Template:Infobox_medical_condition_(new)

The template data is identical in both wikis, but it is missing one of the parameters defined in the actual template: pronunciation

{{Infobox medical condition (new)
| name            = Miosis
| synonyms        = Myosis
| pronunciation   = /maɪˈoʊ sɪs/
| image           = Myosis due to opiate use.jpg
| caption         = Miosis due to [[opiate]] use
| pronounce       = 
| field           = 
...
}}

Because of that, pronunciation is not transferred to the target template.

image.png (555×1 px, 119 KB)
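
A check like the one described in this comment could be scripted roughly as below. It is a sketch under assumptions: the helper names are made up, the response is assumed to have the shape `{"pages": {<page id>: {"params": {...}}}}` when `format=json` is requested, and `SAMPLE_RESPONSE` is a trimmed, illustrative stand-in rather than the real API output.

```python
import json
from typing import Iterable, List, Set
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def templatedata_url(domain: str, title: str) -> str:
    """Build the action=templatedata URL for one template."""
    query = urlencode({"action": "templatedata", "titles": title, "format": "json"})
    return f"https://{domain}/w/api.php?{query}"


def fetch_templatedata(domain: str, title: str) -> dict:
    """Fetch templatedata over the network (not called in this sketch)."""
    # The User-Agent string is a placeholder for a descriptive client name.
    req = Request(templatedata_url(domain, title),
                  headers={"User-Agent": "templatedata-sketch/0.1 (example)"})
    with urlopen(req) as resp:
        return json.load(resp)


def documented_names(response: dict) -> Set[str]:
    """Collect every param name and alias described in a response."""
    names: Set[str] = set()
    for page in response.get("pages", {}).values():
        for name, spec in page.get("params", {}).items():
            names.add(name)
            names.update(spec.get("aliases", []))
    return names


def undocumented(used: Iterable[str], response: dict) -> List[str]:
    """Return the params used in a transclusion but absent from templatedata."""
    known = documented_names(response)
    return [name for name in used if name not in known]


# Illustrative response (assumed shape, made-up page id, trimmed param list).
SAMPLE_RESPONSE = {"pages": {"1": {"title": "Template:Infobox medical condition (new)",
    "params": {"name": {}, "synonyms": {}, "image": {}, "caption": {},
               "pronounce": {}, "field": {}}}}}

used = ["name", "synonyms", "pronunciation", "image", "caption", "pronounce", "field"]
print(undocumented(used, SAMPLE_RESPONSE))
```

With the sample data, the only name reported is `pronunciation`, which matches the parameter that was not transferred.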

> For cases where templateData is missing, Content Translation can consider matching the existing parameters by name as a fallback approach.

When templatedata is missing, we use a regular expression to extract template params from the template source code. We had to make it stricter to avoid T199308: CX2: Avoid using inexistent parameters when mapping template parameters. The benefit is that the params it extracts are never false positives. The downside is that it sometimes fails to extract some params. That is the trade-off for making sure we don't add nonexistent params to the translation.

This particular article (before Doc James fixed the templatedata on ja.wiki) illustrates the situation where the source language has templatedata, the target language is missing it, and the two templates are the same.

In this case, while extracting the template params from the target template source code, we can use the source templatedata as a hint. That is, use the improved regex first, then also run the old regex but discard any params it finds that do not exist in the source templatedata.
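
The two-pass idea above can be sketched as follows. The regular expressions here are hypothetical stand-ins chosen to show the trade-off; they are not the ones used in cxserver, and the template source and param lists are invented for the example.

```python
import re
from typing import Iterable, Set

# Strict pass (assumed pattern): only plain ASCII names directly after "{{{".
# It avoids false positives but misses padded names such as "{{{ x |}}}".
STRICT = re.compile(r"\{\{\{([A-Za-z0-9_]+)[|}]")

# Loose pass (assumed pattern): anything up to the next "|" or "}".
# It finds more names but also matches constructs like "{{{!}}".
LOOSE = re.compile(r"\{\{\{\s*([^{}|]+?)\s*[|}]")


def extract_target_params(target_source: str, source_params: Iterable[str]) -> Set[str]:
    """Union of strict matches and loose matches confirmed by the source."""
    strict_hits = set(STRICT.findall(target_source))
    loose_hits = set(LOOSE.findall(target_source))
    hinted = loose_hits & set(source_params)
    return strict_hits | hinted


# Invented target template source: "{{{!}}" is "{" followed by "{{!}}",
# which the loose pattern misreads as a param named "!".
TARGET_SOURCE = """
{{{!}} class="infobox"
! {{{name|}}}
{{!}}-
{{!}} {{{synonyms|}}}
{{!}}-
{{!}} {{{ pronunciation | }}}
{{!}}}
"""

# Param names as they would come from the source wiki's templatedata.
SOURCE_PARAMS = ["name", "synonyms", "pronunciation", "image", "caption"]

print(sorted(STRICT.findall(TARGET_SOURCE)))
print(sorted(set(LOOSE.findall(TARGET_SOURCE))))
print(sorted(extract_target_params(TARGET_SOURCE, SOURCE_PARAMS)))
```

In this example the strict pass alone misses `pronunciation`, the loose pass alone reports a bogus `!`, and the combination keeps `pronunciation` while dropping `!` because it is not a param in the source templatedata.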

Needless to say, having templatedata in both wikis, or even better in a central place, is where we ultimately need to get to.

> The en->ja pair now has template data in both languages. From history, I see User:Doc James added that to ja.wikipedia.org

I updated the example to use en -> sv, since the Swedish version of the template still lacks template data. This may make testing easier (at least until someone adds the missing templateData for Swedish).

Change 473189 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] WIP: Use source templatedata as hint for extracting params from target

https://gerrit.wikimedia.org/r/473189

Change 473189 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Use source templatedata as hint for extracting params from target

https://gerrit.wikimedia.org/r/473189

Mentioned in SAL (#wikimedia-operations) [2018-12-05T03:06:49Z] <kartik@deploy1001> Started deploy [cxserver/deploy@a3dd2ca]: Update cxserver to c4240e6 and enable Youdao MT (T208985, T210578)

Mentioned in SAL (#wikimedia-operations) [2018-12-05T03:11:16Z] <kartik@deploy1001> Finished deploy [cxserver/deploy@a3dd2ca]: Update cxserver to c4240e6 and enable Youdao MT (T208985, T210578) (duration: 04m 26s)

Etonkovidova subscribed.

Checked in cx2-testing. The same example article, "Miosis", was checked for the following language pairs: en->sv (svenska) and en->ja.

Screen Shot 2019-01-22 at 4.24.16 PM.png (501×951 px, 176 KB)

Screen Shot 2019-01-22 at 4.25.28 PM.png (503×963 px, 145 KB)

The caption field was not translated because the Yandex MT option is not available in cx2-testing.