
Use Extension:JsonConfig for storing the template param mapping between languages
Open, Medium, Public

Description

As per the template translation support plans for Content Translation (see https://www.mediawiki.org/wiki/Content_translation/Templates), we need a way to store the template parameter mappings that translators create. This will let future translations reuse a mapping and, optionally, accept improvements to it.

Investigate whether Extension:JsonConfig is the right choice for such centralized storage.

Here is an example mapping we have in the CX codebase: https://github.com/wikimedia/mediawiki-extensions-ContentTranslation/blob/master/modules/source/conf/en-cy.json

Event Timeline

Yurik added a comment.Aug 25 2016, 8:38 PM

Totally doable, but the question is how you want to structure your end result -- I would think you have a template param name, and a dictionary of language->translation pairs. I suspect that using tabular data (which is based on JsonConfig) might be better for that. Also, what kind of editing tools are you envisioning?

In a translation, when adapting a template from a source language to a target language, the API lookup will use these parameters: source language, target language, source template name, target template name.

So I am thinking of arranging the data under the following URL pattern:

https://meta.wikimedia.org/wiki/Config:CXTemplateMapping/SourceLanguage/SourceTemplateName/TargetLanguage/TargetTemplateName with the mapping stored as:

{
  "parameters": {
    "SourceTemplateParamName1": "TargetTemplateParamName1",
    "SourceTemplateParamName2": "TargetTemplateParamName2"
  }
}

A real example:
https://meta.wikimedia.org/wiki/Config:CXTemplateMapping/en/Cite_web/cy/Dyf_gwe
(actually, https://meta.wikimedia.org/wiki/Config:CXTemplateMapping/en/cy/Cite_web/Dyf_gwe will also work)

{
    "parameters": {
        "url": "url",
        "title": "teitl",
        "author": "awdur",
        "date": "dyddiad",
        "month": "mis",
        "year": "blwyddyn",
        "work": "gwaith",
        "location": "lleoliad",
        "page": "tudalen",
        "pages": "tudalennau",
        "format": "fformat",
        "archive-url": "urlarchif",
        "archiveurl": "urlarchif",
        "archive-date": "dyddiadarchif",
        "archivedate": "dyddiadarchif",
        "publisher": "cyhoeddwr",
        "quote": "dyfyniad",
        "access-date": "dyddiadcyrchiad",
        "accessdate": "dyddiadcyrchiad"
    }
}
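
A minimal sketch of how a stored mapping like this might be applied when adapting a template call. It is illustrative only and not taken from the CX codebase: the function name adapt_params, the choice to report unmapped parameters separately, and the "first alias wins" rule are all assumptions.

```python
from typing import Dict, Tuple

# Subset of the en -> cy "Cite web" mapping shown above.
CITE_WEB_EN_CY: Dict[str, str] = {
    "url": "url",
    "title": "teitl",
    "author": "awdur",
    "archive-url": "urlarchif",
    "archiveurl": "urlarchif",
}


def adapt_params(
    source_params: Dict[str, str], mapping: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Rename source template parameters using a stored mapping.

    Hypothetical helper: returns (adapted, unmapped). Parameters with no
    mapping are returned separately so an editor could ask the translator.
    """
    adapted: Dict[str, str] = {}
    unmapped: Dict[str, str] = {}
    for name, value in source_params.items():
        target = mapping.get(name)
        if target is None:
            unmapped[name] = value
        else:
            # Aliases (archive-url / archiveurl) share one target name;
            # assumption: keep the first value seen.
            adapted.setdefault(target, value)
    return adapted, unmapped


adapted, unmapped = adapt_params(
    {"url": "https://example.org", "title": "Example", "editor": "A. N. Other"},
    CITE_WEB_EN_CY,
)
print(adapted)
print(unmapped)
```

Here "editor" has no entry in the mapping, so it ends up in the unmapped result rather than being dropped silently.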

Also, what kind of editing tools are you envisioning?

https://phabricator.wikimedia.org/T143121 has the designs @Pginer-WMF prepared.

Yurik added a comment.Aug 27 2016, 1:36 AM

@santhosh so you want to have an entire page per template per language, rather than one per template with all the languages. I guess it is slightly faster (you don't need to load all languages when you just need one), but slightly harder to maintain (you need to update all target languages when a template parameter changes). Since most templates have a very short list of parameters, I think you might as well keep all translations on one page.

Also, I think your title schema will make it fairly hard to work with programmatically - given a template en:Foo, you have no easy way to get the corresponding fr:Bar, unless you enumerate all of the pages that begin with Config:CXTemplateMapping/en/Foo/fr/, which also leads to a possibility of multiple pages matching your request. One option is to not have the trailing Bar, but instead have a field inside the config to give the target template's name.

Another issue is that either the source or the target template name might contain slashes - / - that's a valid title symbol. So I would suggest using Config:CXTemplateMapping/en/fr/Source_template_name -- this way everything after the target language slash could be part of the template name.
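
A minimal sketch of why the Config:CXTemplateMapping/en/fr/Source_template_name layout sidesteps the slash problem: with the two language codes first, the title can be split a fixed number of times and the remainder treated as the template name. The helper names build_title and parse_title are hypothetical, and the sketch assumes language codes never contain a slash.

```python
from typing import Tuple

PREFIX = "Config:CXTemplateMapping"


def build_title(source_lang: str, target_lang: str, source_template: str) -> str:
    """Build a page title; the template name goes last and may contain '/'."""
    return "/".join(
        [PREFIX, source_lang, target_lang, source_template.replace(" ", "_")]
    )


def parse_title(title: str) -> Tuple[str, str, str]:
    """Split a page title back into (source_lang, target_lang, source_template).

    Splitting at most three times keeps any later slashes inside the
    template name instead of treating them as separators.
    """
    parts = title.split("/", 3)
    if len(parts) != 4 or parts[0] != PREFIX or not all(parts[1:]):
        raise ValueError(f"not a template mapping title: {title!r}")
    _, source_lang, target_lang, source_template = parts
    return source_lang, target_lang, source_template


print(build_title("en", "cy", "Cite web"))
print(parse_title("Config:CXTemplateMapping/en/fr/Infobox/doc"))
```

With the earlier source-template-then-target-language ordering, a name such as Infobox/doc could not be separated from the language segment without extra information.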

Thanks for the suggestions.

@santhosh so you want to have an entire page per template per language, rather than one per template with all the languages. ....

I considered it because we always work with a single language pair and template, so fetching and updating would be easier. I am not sure how frequently template parameters change, since such a change affects all wiki pages that use the template. Another issue is that updating a template parameter in one language does not mean we can automatically update the mapping for all other languages. That will need a manual edit from the template editing interface we are considering.

But I think it is fine to keep all target language mappings in a single page. @Nikerabbit do you have any preference on this?

given a template en:Foo, you have no easy way to get the corresponding fr:Bar

We know the corresponding template title in the target language before we ask for the template mapping. We get it through Wikidata connections, and our workflow requires it even before we try to map the template parameters.

One option is to not have the trailing Bar, but instead have a field inside the config to give the target template's name.

Now that you suggest this, I think it is the better approach: we don't need the target template name in the URL at all; the target language is enough. If we decide to keep all target languages in a single page, the URL structure simplifies to Config:CXTemplateMapping/en/fr/Source_template_name

Another issue is that either the source or the target template name might contain slashes - / - that's a valid title symbol. So I would suggest using Config:CXTemplateMapping/en/fr/Source_template_name -- this way everything after the target language slash could be part of the template name.

Yes, this is a good suggestion. I have seen template names with slashes.

Arrbee removed a project: Epic.Aug 31 2016, 6:38 AM

But I think it is fine to keep all target language mappings in a single page. @Nikerabbit do you have any preference on this?

It is easy to get a list of subpages if we need to iterate over all target languages. Hence I would keep one mapping (instead of one template) per page, for simplicity.

Per my understanding, the mappings are always one-directional, right? We don't have any plans for making those bi-directional, do we?

Another issue is that either the source or the target template name might contain slashes - / - that's a valid title symbol. So I would suggest using Config:CXTemplateMapping/en/fr/Source_template_name -- this way everything after the target language slash could be part of the template name.

Yes, this is a good suggestion. I have seen template names with slashes.

Yes, very good. By not having the target template name in the page name, we avoid slash ambiguities. For language codes, we must specify whether those are the MediaWiki internal codes or BCP47 codes. I would like BCP47, but for consistency with language subpages, internal codes are superior.

I would also consider dropping "CX" from the name.

Yesterday @Jdforrester-WMF suggested that this might be in conflict with what @Legoktm has plans to implement. More info is needed from them.

I said that going beyond a JSON message or two in the MediaWiki: namespace, and creating a Config: namespace with its own rules and expectations, with tiers controlling which users can set what and for which scope (page, wiki, cluster), would conflict with T388: Graphical configuration interface, which will do the same.

@Jdforrester-WMF The mapping we would like to store is somewhat different from the one proposed in the linked RFC. T143121: Support block template adaptation has the UX designs by @Pginer-WMF that can clarify what exactly we want to store. It is not configuration for wiki infrastructure, but a template parameter mapping between two languages, used to prefill the template translation editor and save translators time.

We plan to revisit this after an iteration of the template editor development and see whether we really need to remember the mapping, or whether the proposed UX will suffice for the time being.

Jdforrester-WMF lowered the priority of this task from Medium to Low.Sep 9 2016, 5:40 PM

OK, will re-prioritise this then.

Arrbee raised the priority of this task from Low to Medium.Sep 13 2016, 11:54 PM
Arrbee added a subscriber: Arrbee.

Changing back to the earlier priority, as this is part of the ongoing template adaptation improvement work.

Tabular data has launched and could easily be used for this, assuming the community is OK with this type of data. For example, each table column has an id (a C++-style identifier) and a corresponding localized string {"en":"...", "fr":"...", ...}. A Lua script can easily consume this information from any wiki, because mw.ext.data.get('datapage.tab') is available everywhere.
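
A rough sketch of how per-column localized strings could be turned into a source-to-target parameter mapping. Everything here is an assumption for illustration: the TABLE contents are invented, the "schema" / "fields" / "name" / "title" layout is only loosely modelled on the tabular data format, and param_mapping is a hypothetical helper. It is written in Python rather than Lua, so it does not use mw.ext.data.get.

```python
from typing import Any, Dict

# Hypothetical page content: one column per template parameter, with the
# column id as a language-neutral identifier and "title" holding the
# parameter name used on each wiki.
TABLE: Dict[str, Any] = {
    "schema": {
        "fields": [
            {
                "name": "access_date",
                "type": "string",
                "title": {"en": "access-date", "cy": "dyddiadcyrchiad"},
            },
            {
                "name": "title",
                "type": "string",
                "title": {"en": "title", "cy": "teitl"},
            },
            {
                "name": "publisher",
                "type": "string",
                "title": {"en": "publisher"},  # no Welsh name recorded yet
            },
        ]
    },
    "data": [],
}


def param_mapping(table: Dict[str, Any], source_lang: str, target_lang: str) -> Dict[str, str]:
    """Derive a source -> target parameter mapping for one language pair.

    Columns missing a name in either language are skipped.
    """
    mapping: Dict[str, str] = {}
    for field in table["schema"]["fields"]:
        titles = field["title"]
        if source_lang in titles and target_lang in titles:
            mapping[titles[source_lang]] = titles[target_lang]
    return mapping


print(param_mapping(TABLE, "en", "cy"))
```

One consequence of this layout is that a single page serves every language pair and direction, at the cost of one identifier per parameter having to be agreed on across wikis.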