The current JSON configuration is not compact and will grow very quickly as we add more languages. We need a better config file format that represents the permutations the languages can create.
A proposal for new configuration
registry: {
  source: [ 'af', 'an', 'ar', 'bg', 'bs', 'ca', 'cr', 'cy', 'en', 'eo', 'es', 'fr', 'gl', 'hi', 'hr', 'id', 'kk', 'mk', 'ms', 'mt', 'nl', 'oc', 'pt', 'ru', 'tt', 'ur' ],
  target: [ 'af', 'an', 'ar', 'bg', 'bs', 'ca', 'cr', 'cy', 'eo', 'es', 'fr', 'gl', 'hi', 'hr', 'id', 'kk', 'mk', 'ms', 'mt', 'nl', 'oc', 'pt', 'ru', 'tt', 'ur' ],
  mt: {
    Apertium: {
      af: 'nl',
      ar: 'mt',
      an: 'es',
      bg: 'mk',
      br: 'fr',
      ca: [ 'es', 'en', 'eo', 'fr', 'oc', 'pt' ],
      cy: 'en',
      en: [ 'bs', 'ca', 'cr', 'eo', 'es', 'gl', 'hr', 'sr' ],
      eo: 'en',
      es: [ 'ca', 'pt', 'it', 'oc', 'en', 'fr', 'an', 'eo', 'gl' ],
      eu: 'en',
      fr: [ 'ca', 'eo', 'es' ],
      gl: [ 'en', 'es', 'pt' ],
      hi: 'ur',
      id: 'ms',
      is: 'en',
      kk: 'tt',
      mk: [ 'bg', 'bs', 'hr', 'mk', 'sr' ],
      ms: 'id',
      mt: 'ar',
      nb: [ 'da', 'nn' ],
      nl: 'af',
      nn: [ 'da', 'nb' ],
      oc: [ 'es', 'ca' ],
      pt: [ 'ca', 'es', 'gl' ],
      ro: 'es',
      sh: 'sl',
      sl: [ 'bs', 'cr', 'hr', 'sr' ],
      sv: [ 'da', 'is' ],
      tt: 'kk',
      ur: 'hi'
    },
    Yandex: { en: 'ru' }
  },
  dictionary: {
    JsonDict: { en: 'es', es: 'ca' }
  }
}
- If we don't want to publish to a certain wiki but still want it as a source, we can do that. See en, which is missing from target.
- A big change is that there are no hand-picked pairs. Every combination of source and target is available.
- But mt and dictionary are configured per pair.
- It is not possible to blacklist a specific pair. I don't see any reason for such a requirement. But tools (mt, dictionary, ...) can be configured for specific pairs.
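To make the points above concrete, here is a minimal sketch of how a consumer might read a registry in this shape. The trimmed-down registry and the helper names `providersFor` and `getToolset` are assumptions made for illustration; they are not taken from the actual patch.

```javascript
// Hypothetical, trimmed-down registry in the proposed shape.
const registry = {
  source: ['ca', 'en', 'es', 'ru'],
  target: ['ca', 'es', 'ru'], // 'en' is a source only
  mt: {
    Apertium: { ca: ['es', 'en'], es: ['ca'] },
    Yandex: { en: 'ru' }
  },
  dictionary: {
    JsonDict: { en: 'es', es: 'ca' }
  }
};

// A tool entry maps a source language to one target (string)
// or to several targets (array); normalise both to an array.
function providersFor(tools, from, to) {
  return Object.keys(tools).filter((name) => {
    const targets = tools[name][from];
    if (targets === undefined) return false;
    return [].concat(targets).includes(to);
  });
}

// Hypothetical helper: returns null when the pair is not available,
// otherwise the names of the tools configured for that pair.
function getToolset(registry, from, to) {
  if (!registry.source.includes(from) || !registry.target.includes(to)) {
    return null;
  }
  return {
    mt: providersFor(registry.mt, from, to),
    dictionary: providersFor(registry.dictionary, from, to)
  };
}
```

With this sample, `getToolset(registry, 'ru', 'ca')` is available but has empty `mt` and `dictionary` lists, while `getToolset(registry, 'es', 'en')` returns `null` because en is not a target.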
There is also a design implication. Currently, in the source selector, we list the languages without MT as "disabled-ish". I am not sure it communicates the intention, namely that the languages that are not disabled have MT. Why is MT used as the single criterion for enabling a language?
Our observations so far suggest that people do use CX without MT too. So IMO we should treat all languages the same. If we grey out a language, people are less likely to select it. Instead, we should provide an explicit indication: if you select this language, you get MT and a dictionary; if you select that one, there is no MT, but a dictionary is available; and so on.
In addition to this, we need to take care of requirements like
- en-ru has two MT engines. How to configure which is default? How to configure no-mt as default?
My rough plan is:
mt: { defaults: { 'en-ru': 'Yandex' or 'no-MT' }, Apertium: {} ... }
The same approach would apply to dictionaries.
Persistent user preferences for these things are out of scope here.
Apertium provides support for very specific language pairs so we need to be able to capture those.
For MT engines that support all combinations of a set of languages, we could describe that support in a more compact way. For example:
Yandex: { '*': ['en', 'ru', 'es', 'ca'] }
instead of:
Yandex: { en: ['en', 'ru', 'es', 'ca'], ru: ['en', 'ru', 'es', 'ca'], es: ['en', 'ru', 'es', 'ca'], ca: ['en', 'ru', 'es', 'ca'] }
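As a rough sketch of how such a shorthand could be expanded when the file is loaded: the function name `expandEngine` is hypothetical, and so is the choice to leave out same-language pairs (en to en), which the long form above does list. Explicit per-language entries are kept and merged with the wildcard set.

```javascript
// Expand a hypothetical '*' shorthand into the explicit per-source form.
// Assumption: a language is never paired with itself.
function expandEngine(config) {
  const expanded = {};

  // Copy explicit entries first, normalising 'es' to ['es'].
  for (const [from, targets] of Object.entries(config)) {
    if (from === '*') continue;
    expanded[from] = [].concat(targets);
  }

  // Then add every ordered pair from the wildcard set.
  const all = config['*'] || [];
  for (const from of all) {
    const merged = new Set(expanded[from] || []);
    for (const to of all) {
      if (to !== from) merged.add(to);
    }
    expanded[from] = [...merged];
  }
  return expanded;
}
```

For example, `expandEngine({ '*': ['en', 'ru', 'es'], ca: 'es' })` keeps `ca: ['es']` and adds `en: ['ru', 'es']`, `ru: ['en', 'es']` and `es: ['en', 'ru']`.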
For MT we may have a default preference order (e.g., the order in which engines are defined in the file) so that we only need to define preferences when overriding that order.
We see that we cannot enable all language pairs even if Yandex supports all combinations: many translations are done with a trip via English. So in practice, just like with Apertium, we will have to pick pairs.
> For MT we may have a default preference order (e.g., the order in which engines are defined in the file) so that we only need to define preferences when overriding that order.
Yes, so my idea is to use the 'default' key only when we need to override.
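A minimal sketch of that lookup, assuming the key is spelled `defaults` as in the rough plan, that pairs are keyed as 'source-target', and that engines are tried in the order they appear in the file. The sample data and the function names are hypothetical.

```javascript
// Hypothetical MT section: engines listed in preference order,
// plus an optional 'defaults' map used only for overrides.
const mt = {
  defaults: { 'en-ru': 'no-MT' },
  Apertium: { en: ['es', 'ca'] },
  Yandex: { en: ['ru', 'es'] }
};

// Engines that support the pair, in file order.
function enginesFor(mt, from, to) {
  return Object.keys(mt).filter((name) =>
    name !== 'defaults' && [].concat(mt[name][from] || []).includes(to)
  );
}

// An explicit override wins; otherwise the first engine in file
// order is used; with no engine at all the pair falls back to 'no-MT'.
function defaultEngine(mt, from, to) {
  const override = (mt.defaults || {})[`${from}-${to}`];
  if (override !== undefined) return override;
  const engines = enginesFor(mt, from, to);
  return engines.length > 0 ? engines[0] : 'no-MT';
}
```

Here en-es resolves to Apertium because it is listed first, while en-ru resolves to 'no-MT' through the override even though Yandex supports the pair.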
> We see that we cannot enable all language pairs even if Yandex supports all combinations: many translations are done with a trip via English. So in practice, just like with Apertium, we will have to pick pairs.
Yes, we need a way to specify specific pairs, but a way to define "all combinations of the following languages: A, B, C, D" (plus some additional specific pairs) seems a convenient way to make the configuration simpler. Consider also that we can make some of those pairs default to "no-MT" (which almost disables them but leaves the final choice to the user).
> There is also a design implication. Currently, in the source selector, we list the languages without MT as "disabled-ish". I am not sure it communicates the intention, namely that the languages that are not disabled have MT. Why is MT used as the single criterion for enabling a language?
What you describe is neither the current behaviour nor the expected one (see screenshot below). The greyed-out languages are supposed to represent the target languages that are not available given the language currently selected as source. We probably should show them (otherwise users may miss that the tool as a whole supports them) but make them non-selectable.
In any case, I think we should move towards supporting all combinations of a given set of languages (even if some lack MT or have it disabled by default). That would avoid the need to disable specific pairs.
Change 189942 had a related patch set uploaded (by Santhosh):
Registry: Use a compact structure
Change 190158 had a related patch set uploaded (by Santhosh):
Support new language configuration format