CX2: Support mapping templates based on their parameter names
Open, High, Public

Description

When translating the article Miosis from English to Swedish, the infobox is added to the translation with empty parameters.

Here the infobox is based on the template "Infobox medical condition (new)", which exists in both English and Swedish and is connected through Wikidata. The Swedish version lacks templateData information, which prevents CX from completing the mapping, but the parameters used in the template are exactly the same, since the template was ported into Swedish from English. In CX1, copying the original template over would work; that is not possible in CX2.

For cases where templateData is missing, Content Translation can consider matching the existing parameters by name as a fallback approach.

In this process we need to make sure that we avoid using nonexistent parameters on the target template (as described in T199308: CX2: Avoid using inexistent parameters when mapping template parameters).
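
The fallback described above could be sketched roughly as follows. This is only an illustration in Python, not cxserver code: the function name `map_params_by_name` and the sample parameter lists are hypothetical, and it assumes the set of parameter names defined by the target template is already known by some other means.

```python
def map_params_by_name(source_params, target_param_names):
    """Copy source values only for names the target template also defines.

    source_params: dict of parameter name -> value from the source article.
    target_param_names: names known to exist on the target template.
    Returns (mapped, skipped) so the caller can see what was dropped.
    """
    target_names = set(target_param_names)
    mapped = {}
    skipped = []
    for name, value in source_params.items():
        if name in target_names:
            mapped[name] = value      # identical name on both sides
        else:
            skipped.append(name)      # never invent params on the target
    return mapped, skipped


# Hypothetical data: "madeup" does not exist on the target template.
source = {"name": "Miosis", "synonyms": "Myosis", "madeup": "x"}
target_names = ["name", "synonyms", "image", "caption"]
mapped, skipped = map_params_by_name(source, target_names)
```

Returning the skipped names separately keeps the constraint from T199308 visible: anything the target does not define is reported rather than silently added.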

Restricted Application added a subscriber: Aklapper. Nov 7 2018, 7:00 PM
Pginer-WMF triaged this task as Normal priority.
Pginer-WMF updated the task description. Nov 12 2018, 10:08 AM
Pginer-WMF raised the priority of this task from Normal to High. Nov 12 2018, 11:29 AM

The en->ja pair now has template data in both languages. From history, I see User:Doc James added that to ja.wikipedia.org

https://en.wikipedia.org/w/api.php?action=templatedata&titles=Template:Infobox_medical_condition_(new)
https://ja.wikipedia.org/w/api.php?action=templatedata&titles=Template:Infobox_medical_condition_(new)

The template data is identical in both wikis, but it is missing one of the parameters defined in the actual template: pronunciation.

{{Infobox medical condition (new)
| name            = Miosis
| synonyms        = Myosis
| pronunciation   = /maɪˈoʊ sɪs/
| image           = Myosis due to opiate use.jpg
| caption         = Miosis due to [[opiate]] use
| pronounce       = 
| field           = 
...
}}

Because of that, pronunciation is not transferred to the target template.
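
A quick way to spot this kind of gap is to compare the parameters used in the template call with the names listed in the templatedata. The sketch below is a hedged illustration: the line-based parser is deliberately simplistic (one `| name = value` per line), and the `DOCUMENTED` list is an assumed stand-in for the real templatedata, which is not reproduced here.

```python
import re

# Matches lines such as "| name            = Miosis".
PARAM_LINE = re.compile(r"^\s*\|\s*([^=|]+?)\s*=\s*(.*)$")


def used_params(wikitext):
    """Return {name: value} for each '| name = value' line in a template call."""
    params = {}
    for line in wikitext.splitlines():
        match = PARAM_LINE.match(line)
        if match:
            params[match.group(1)] = match.group(2).strip()
    return params


def undocumented_params(wikitext, templatedata_names):
    """List params used in the call that the templatedata does not mention."""
    documented = set(templatedata_names)
    return [name for name in used_params(wikitext) if name not in documented]


CALL = """{{Infobox medical condition (new)
| name            = Miosis
| synonyms        = Myosis
| pronunciation   = /maɪˈoʊ sɪs/
| image           = Myosis due to opiate use.jpg
| caption         = Miosis due to [[opiate]] use
| pronounce       = 
| field           = 
}}"""

# Assumed templatedata names for illustration only.
DOCUMENTED = ["name", "synonyms", "image", "caption", "pronounce", "field"]

missing = undocumented_params(CALL, DOCUMENTED)
```

With these assumed inputs, `missing` contains only `pronunciation`, which is the parameter that fails to transfer.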

santhosh added a comment. Edited Nov 13 2018, 6:16 AM

For cases where templateData is missing, Content Translation can consider matching the existing parameters by name as a fallback approach.

We were using a regular expression to extract template params from the template source code when templatedata is missing. We had to improve it to avoid T199308: CX2: Avoid using inexistent parameters when mapping template parameters. The upside is that the params it extracts are never false positives. The downside is that it sometimes fails to extract params that do exist. That is the trade-off for making sure we don't add nonexistent params to the translation.

This particular article (before Doc James fixed templatedata on ja.wiki) shows the situation where the source language has templatedata, the target language is missing templatedata, and the templates are the same.

In this case, while extracting the template params from the template source code, we can use the source templatedata as a hint. That is, use the improved regex first, then also run the old regex but keep only the params that exist in the source templatedata.
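
A minimal sketch of that two-pass idea, written in Python for illustration. The two patterns below are hypothetical stand-ins, not the regular expressions used in cxserver: `STRICT` only accepts `{{{name}}}` or `{{{name|default}}}` without nesting, while `LOOSE` grabs whatever follows `{{{`. The sample template source and hint list are also made up.

```python
import re

# Strict: precise, but misses params whose default contains another param.
STRICT = re.compile(r"\{\{\{\s*([\w \-]+?)\s*(?:\|[^{}]*)?\}\}\}")
# Loose: catches nested cases, but also junk such as the "{{{!}}" idiom.
LOOSE = re.compile(r"\{\{\{([^{}|]+)")


def extract_target_params(target_source, source_templatedata_names=None):
    """Extract param names from target template source code.

    Strict matches are always trusted. Loose matches are accepted only
    when the same name appears in the source templatedata (the hint).
    """
    params = {m.group(1) for m in STRICT.finditer(target_source)}
    if source_templatedata_names:
        hints = set(source_templatedata_names)
        for m in LOOSE.finditer(target_source):
            name = m.group(1).strip()
            if name in hints:
                params.add(name)
    return params


# Hypothetical target template source with a nested default and "{{{!}}".
TARGET_SOURCE = """
{{{!}} class="infobox"
! {{{name}}}
{{!}} {{{image|{{{Image|}}}}}}
{{!}} {{{caption|}}}
"""

without_hint = extract_target_params(TARGET_SOURCE)
with_hint = extract_target_params(
    TARGET_SOURCE, ["name", "synonyms", "image", "caption"]
)
```

In this example the strict pass alone misses `image` because its default value nests another parameter; the hint recovers it, while the stray `!` picked up by the loose pattern is discarded because it is not a name in the source templatedata.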

Needless to say, having templatedata in both wikis or, even better, in a central place is the path we ultimately need to take.

Pginer-WMF updated the task description. Nov 13 2018, 10:56 AM

The en->ja pair now has template data in both languages. From history, I see User:Doc James added that to ja.wikipedia.org

I updated the example to use en -> sv, since the Swedish version of the template still lacks template data. This may facilitate the testing process (at least until someone adds the missing templateData for Swedish)

Change 473189 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] WIP: Use source templatedata as hint for extracting params from target

https://gerrit.wikimedia.org/r/473189

santhosh claimed this task. Nov 14 2018, 6:17 AM

Change 473189 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Use source templatedata as hint for extracting params from target

https://gerrit.wikimedia.org/r/473189

Mentioned in SAL (#wikimedia-operations) [2018-12-05T03:06:49Z] <kartik@deploy1001> Started deploy [cxserver/deploy@a3dd2ca]: Update cxserver to c4240e6 and enable Youdao MT (T208985, T210578)

Mentioned in SAL (#wikimedia-operations) [2018-12-05T03:11:16Z] <kartik@deploy1001> Finished deploy [cxserver/deploy@a3dd2ca]: Update cxserver to c4240e6 and enable Youdao MT (T208985, T210578) (duration: 04m 26s)