CX2: Avoid using inexistent parameters when mapping template parameters
Closed, Resolved (Public)

Description

When adding a template to the translation, Content Translation looks for the equivalent template using Wikidata. In case the process fails, T192271 describes how to communicate what went wrong to the user. When the equivalent template is found, the information is transferred to the equivalent target parameters, using TemplateData for the mapping. In this process only parameters that exist in the target template should be added to the template.

Currently, when adapting an infobox template, a parameter from the original template ("image") that does not exist in the target template (the equivalent parameter there is "imatge") is still added. An error message is shown as part of the target template (tested by translating "Uma Thurman" from English to Catalan in production):

Screen Shot 2018-07-11 at 11.47.40 2.png (511×1 px, 329 KB)

The resulting wikitext ("infotaula-persona" is the template generated):

Screen Shot 2018-07-11 at 11.48.52.png (265×649 px, 50 KB)

The expected result would be to either (a) use the parameters as defined in the target template, or (b) have the parameters missing. In any case, we should not add parameters to the target template that do not exist in it.
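
The expected behaviour can be sketched as a filter over the mapped parameters. This is only an illustration, not cxserver code: the function name `adapt_parameters`, the `name_mapping` dictionary, the placeholder values, and the assumption that the target template defines only "imatge" are all hypothetical.

```python
def adapt_parameters(source_params, name_mapping, target_param_names):
    """Keep only parameters that the target template actually defines."""
    adapted = {}
    for source_name, value in source_params.items():
        # Use the mapped name when one exists, otherwise try the same name.
        target_name = name_mapping.get(source_name, source_name)
        if target_name in target_param_names:
            # Outcome (a): the parameter exists in the target template.
            adapted[target_name] = value
        # Otherwise the parameter is dropped: outcome (b).
    return adapted


# Hypothetical inputs modelled on the "image" / "imatge" example.
source_params = {"image": "Example.jpg", "parents": "Example parents"}
name_mapping = {"image": "imatge"}
target_param_names = {"imatge"}

result = adapt_parameters(source_params, name_mapping, target_param_names)
print(result)  # {'imatge': 'Example.jpg'}
```

Here "image" is carried over under its target name and "parents" is left out, so nothing unknown to the target template is written.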


For the particular case used in the example, the preferred adaptation would be to add the template without parameters, since the target template is based on Wikidata: information for missing parameters is queried from Wikidata, which is preferred because it reduces redundancy. Support for Wikidata-based templates is part of a separate ticket.

Event Timeline

Blocked on T200283.

In the example there are two parameters that got added to the target template that do not exist there ("image" and "parents"). The current ticket is about copying over those parameters despite the fact that they do not exist, regardless of the fact that one of them ("image") is an image.

T200283 seems to be about how to adapt inline images, which we want to solve, but I'm not sure it is a blocker for the current ticket. To solve the current ticket without being blocked by T200283, we may just need to pick another example that does not involve inline images, unless the presence of the inline image is what is causing the issue. Is my assessment correct?

I wanted to try the exact page which was reported to have an issue. Since that wasn't possible, I filed T200283.

Generally, your assessment is probably correct: there should be other pages without inline images where this problem occurs. If you have more examples, that would be helpful.

Also, I tried checking out the commit before the one that causes the issue in T200283, and everything got adapted perfectly with no errors (with the "Uma Thurman" page).

I debugged this case. The template https://ca.wikipedia.org/wiki/Plantilla:Infotaula_persona does not have TemplateData (see https://ca.wikipedia.org/w/api.php?action=templatedata&titles=Plantilla:Infotaula person). So cxserver tries to work out the template parameters from the template's source code, which contains image and parents as parameters. You can also see these parameters listed in https://ca.wikipedia.org/w/index.php?title=Plantilla:Infotaula_persona&action=edit. As a result, the image and parents keys are mapped to the target template.

However, if there were a way to identify that https://ca.wikipedia.org/wiki/Plantilla:Infotaula_persona is a Wikidata-powered template, we could skip the parameter mapping entirely and avoid these issues. That is tracked in T199310: CX2: Better support for Wikidata-based templates.
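
To illustrate why scanning the template source picks up "image" and "parents", here is a simplified sketch. It is not cxserver's actual extraction code, and `HYPOTHETICAL_SOURCE` is an invented fragment (including the "pares" name), not the real source of Plantilla:Infotaula_persona. It assumes the fallback looks for any name that follows three opening braces.

```python
import re

# Any name that directly follows "{{{", wherever it appears in the source.
TRIPLE_BRACE = re.compile(r"\{\{\{([^{}|]+)")


def naive_parameter_names(template_source):
    """Return every name found after three opening braces, in order."""
    return [name.strip() for name in TRIPLE_BRACE.findall(template_source)]


# Invented wikitext: the left-side keys differ from the nested names.
HYPOTHETICAL_SOURCE = """{{Infobox
| imatge = {{{imatge|{{{image|}}}}}}
| pares = {{{pares|{{{parents|}}}}}}
}}"""

names = naive_parameter_names(HYPOTHETICAL_SOURCE)
print(names)  # ['imatge', 'image', 'pares', 'parents']
```

Under this assumption the scan cannot tell a name used as a left-side key from one that only appears nested inside a value, so both kinds end up in the list of candidate parameters.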

Just to clarify this: I looked at the source and expected the parameters to be on the left side (where "imatge" is but "image" is not). However, "image" appears on the right side. Should we restrict the search to the parameters on the left side? Would that be ideal? Would that be feasible?

Screen Shot 2018-07-25 at 08.50.00.png (400×1 px, 136 KB)

Change 447767 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Templates: Improve the template parameter extracter

https://gerrit.wikimedia.org/r/447767

Change 447767 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Templates: Improve the template parameter extractor

https://gerrit.wikimedia.org/r/447767

I feel like I am missing some context here. Won't this break for example https://en.wikipedia.org/w/index.php?title=Template:IPA&action=edit which has {{{1}}} without = immediately before it?

@Nikerabbit it seems that IPA-based template links are always displayed as red links when translation is started. I checked the enwiki (wmf.15) and cx2 - the behavior is the same.

The pattern that extracted parameters by looking for three { characters was returning a lot of false positives. See the example mentioned in https://gerrit.wikimedia.org/r/c/mediawiki/services/cxserver/+/447767. For example, if {{{image}}} appears on the right side of a named parameter assignment, image is identified as a parameter of the template, even though it only appears in the value of a named parameter.

So, to extract named parameters, the regular expression was changed to look at the left side of those assignments.
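
One possible reading of "look at the left side" is sketched below. This is not the regular expression from the Gerrit change; the pattern, the function name, and the sample wikitext are assumptions made for illustration. It also assumes one assignment per line and that the left-side keys coincide with the template's own parameter names.

```python
import re

# A name between a leading "|" and the first "=" on the same line.
LEFT_SIDE = re.compile(r"^\s*\|\s*([^={}|\n]+?)\s*=", re.MULTILINE)


def left_side_parameter_names(template_source):
    """Return only the names that appear as keys of named assignments."""
    return LEFT_SIDE.findall(template_source)


# Invented wikitext, not the real template source.
HYPOTHETICAL_SOURCE = """{{Infobox
| imatge = {{{imatge|{{{image|}}}}}}
| pares = {{{pares|{{{parents|}}}}}}
}}"""

names = left_side_parameter_names(HYPOTHETICAL_SOURCE)
print(names)  # ['imatge', 'pares']
```

Names that only occur inside a value, such as "image" here, are no longer reported. The trade-off is that a source without any `| name =` line yields nothing at all.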

But unnamed parameters such as {{{1}}} will be missed by it (only if the template does not have TemplateData). You are correct about that case. I suspect, though, that a quick {{{\d+}}} match would cause false positives again.
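
A quick numeric match of that kind could look like the sketch below. The pattern, the function name, and the sample source are hypothetical. The sample shows one way such a match can over-report: a number that only appears inside a `<nowiki>` usage note is still counted.

```python
import re

# Digits that directly follow "{{{" and are closed by "|" or "}".
POSITIONAL = re.compile(r"\{\{\{\s*(\d+)\s*[|}]")


def positional_parameters(template_source):
    """Return the sorted positional parameter numbers found in the source."""
    return sorted({int(number) for number in POSITIONAL.findall(template_source)})


# Invented source: {{{1}}} is used, {{{2}}} is only mentioned in documentation.
HYPOTHETICAL_SOURCE = (
    "<span>{{{1}}}</span>"
    "<noinclude>Note: <nowiki>{{{2}}}</nowiki> is not supported.</noinclude>"
)

found = positional_parameters(HYPOTHETICAL_SOURCE)
print(found)  # [1, 2]
```

Only parameter 1 is actually used by this invented template, yet 2 is reported as well.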

When we discussed this issue with @Pginer-WMF, the general agreement was to avoid identifying parameters that are not present in the template. If we cannot find mappings, there is a way to tell users so, but if we map parameters falsely, we might be communicating that we mapped all parameters successfully.
That is why I chose to tighten the fallback logic when TemplateData is missing.

Do you have any suggestions to improve this?

Mentioned in SAL (#wikimedia-operations) [2018-08-08T06:13:44Z] <kartik@deploy1001> Started deploy [cxserver/deploy@6a0cab1]: Update cxserver to 951fdba (T199308, T199512, T199320, T200665, T200453, T106437)

Mentioned in SAL (#wikimedia-operations) [2018-08-08T06:17:16Z] <kartik@deploy1001> Finished deploy [cxserver/deploy@6a0cab1]: Update cxserver to 951fdba (T199308, T199512, T199320, T200665, T200453, T106437) (duration: 03m 32s)

I was mostly concerned whether this trade-off had been considered, since I saw no mention of it anywhere. The way to improve is to recommend the use of TemplateData. We could also fall back to the previous algorithm if the new one finds nothing.
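
The suggested fallback order could be sketched roughly as follows. Both patterns and all names are assumptions for illustration, not the cxserver implementation: TemplateData is preferred when present, then a strict left-side match, and only if that finds nothing, the older loose triple-brace scan.

```python
import re

# Strict: names used as keys of "| name =" assignments.
STRICT = re.compile(r"^\s*\|\s*([^={}|\n]+?)\s*=", re.MULTILINE)
# Loose: any name that follows "{{{" (stands in for the previous algorithm).
LOOSE = re.compile(r"\{\{\{([^{}|]+)")


def extract_parameter_names(template_source, templatedata_params=None):
    """Prefer TemplateData, then the strict match, then the loose scan."""
    if templatedata_params:
        return list(templatedata_params)
    strict_names = STRICT.findall(template_source)
    if strict_names:
        return strict_names
    return [name.strip() for name in LOOSE.findall(template_source)]


# Invented sources: one with named assignments, one with only {{{1}}}.
infobox_like = "{{Infobox\n| imatge = {{{imatge|{{{image|}}}}}}\n}}"
inline_like = "<span>{{{1}}}</span>"

print(extract_parameter_names(infobox_like))  # ['imatge']
print(extract_parameter_names(inline_like))   # ['1']
```

With this ordering the loose scan, and its false positives, only comes into play for sources where the strict match returns nothing.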

Related to this, there is a ticket about encouraging editors to fill in TemplateData (T200314), but it also has some challenges since the process is not always straightforward.

It would be good to illustrate the problematic case of the IPA example in a separate ticket. I tried translating "Arabic diacritics" from English to Spanish, and when a paragraph with the IPA template was added to the translation, the template was adapted/preserved (using Yandex; Apertium seems to have issues due to the lack of rich text support):

Screen Shot 2018-09-04 at 13.39.12.png (302×1 px, 89 KB)

Restricted Application changed the subtype of this task from "Deadline" to "Task". (Sep 4 2018, 11:42 AM)