Machine translation only exists for certain language pairs, and the Content Translation service supports only some of those. A [[ https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/cxserver/+/refs/heads/master/config/ | set of YAML files ]] under `config/` in the cxserver repository determines which languages the service supports.

Write a parser for these files and build a single flat, in-memory structure containing all of the supported pairs. Export this data as a CSV of all pairs, with at least the following columns:

| source language | target language | translation engine | is preferred engine? |
| --------------- | --------------- | ------------------ | -------------------- |
| de | en | DeepL | true |

The configuration files use several different structures. Most have the source language as the top-level key and the target languages as a list of values under that key. Watch out for the `handler` key, which indicates that the file needs a non-standard interpretation.

Some YAML files should be ignored; currently these are `MWPageLoader.yaml`, `languages.yaml`, `JsonDict.yaml`, `Dictd.yaml` and `mt-defaults.wikimedia.yaml`. [[ https://github.com/wikimedia/mediawiki-services-cxserver/blob/master/config.prod.yaml#L93 | Here is the configuration ]] showing how the various YAML files are wired into the application. As it shows, it is safe to assume that each config file's base name is the same as its translation engine name. You can filter the filenames with an allowlist, with a disallowlist, or by parsing the main configuration to find the exact list of valid files.

One possible approach is to adapt the existing cxserver source, reusing its built-in config import, and then transform the data into a more consistent structure once it is loaded into memory. Another approach is to pick your favorite programming language, find a YAML library, and write the parser from scratch. The latter is probably the simpler option.
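A minimal Python sketch of the from-scratch approach, assuming each file has already been loaded into a dict with a YAML library (for example PyYAML's `yaml.safe_load`). The names `IGNORED_FILES`, `Pair`, `engine_files`, `pairs_from_config` and `to_csv` are hypothetical. It filters filenames with a disallowlist, flattens the common `source: [targets]` layout, and writes the CSV columns listed above. How `handler` files should be read is an assumption here: the sketch treats them as having a `languages` list that supports every ordered pair of distinct codes, which needs to be checked against each handler's actual code. The "is preferred engine?" column is left `false` because filling it in requires `mt-defaults.wikimedia.yaml`.

```python
import csv
import io
from dataclasses import dataclass

# Disallowlist of files that do not describe MT engines (from the task text).
IGNORED_FILES = {
    "MWPageLoader.yaml",
    "languages.yaml",
    "JsonDict.yaml",
    "Dictd.yaml",
    "mt-defaults.wikimedia.yaml",
}


@dataclass(frozen=True)
class Pair:
    source: str
    target: str
    engine: str
    preferred: bool = False  # not decided here; needs mt-defaults


def engine_files(filenames):
    """Keep only *.yaml files that are not on the disallowlist."""
    return sorted(
        f for f in filenames if f.endswith(".yaml") and f not in IGNORED_FILES
    )


def pairs_from_config(engine, config):
    """Flatten one already-loaded YAML mapping into Pair records.

    Standard layout: {source: [target, ...]}.
    ASSUMPTION (hypothetical): a file with a "handler" key lists its codes
    under "languages" and supports every ordered pair of distinct codes.
    """
    if "handler" in config:
        langs = [str(code) for code in config.get("languages", [])]
        return [Pair(s, t, engine) for s in langs for t in langs if s != t]
    return [
        Pair(str(source), str(target), engine)
        for source, targets in config.items()
        for target in targets or []
    ]


def to_csv(pairs):
    """Serialize pairs to CSV text, sorted for stable output."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(
        ["source language", "target language", "translation engine", "is preferred engine?"]
    )
    for p in sorted(pairs, key=lambda p: (p.source, p.target, p.engine)):
        writer.writerow([p.source, p.target, p.engine, str(p.preferred).lower()])
    return buf.getvalue()
```

To run it against a checkout, list `config/`, pass the names through `engine_files`, take each engine name from the file's base name (`pathlib.Path(name).stem`), and feed the `yaml.safe_load` output into `pairs_from_config`. One caution: PyYAML follows YAML 1.1, where an unquoted `no` loads as the boolean `False`, so a bare Norwegian `no` key would come through as `"False"` here unless codes are normalized or quoted.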
Please consider the `mt-defaults.wikimedia.yaml` file and what effect it might have on the supported translation pairs and on the default translation engine for each pair.
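One way to fold the defaults in is sketched below in Python. It assumes (not confirmed here) that `mt-defaults.wikimedia.yaml` loads as a mapping from `"source-target"` keys to an engine name, such as `{"en-es": "Apertium"}`. Keys are built from each pair and looked up rather than split, since language codes can themselves contain hyphens (e.g. `be-tarask`). The rule that a pair with no default and only one engine treats that engine as preferred is also an assumption. The hypothetical `defaults_without_engine` helper flags default entries naming an engine that does not list that pair in its own file, which is one place the defaults could change the set of supported pairs.

```python
from collections import defaultdict


def mark_preferred(pairs, defaults):
    """Return sorted (source, target, engine, preferred) rows.

    pairs: iterable of (source, target, engine) tuples from the engine files.
    defaults: loaded mt-defaults mapping; ASSUMED shape {"src-tgt": engine}.
    """
    engines_by_pair = defaultdict(list)
    for source, target, engine in pairs:
        engines_by_pair[(source, target)].append(engine)

    rows = []
    for (source, target), engines in sorted(engines_by_pair.items()):
        # Look the key up instead of splitting it: codes may contain "-".
        default = defaults.get(f"{source}-{target}")
        for engine in sorted(engines):
            if default is not None:
                preferred = engine == default
            else:
                # ASSUMPTION: a sole engine is preferred; ties stay undecided.
                preferred = len(engines) == 1
            rows.append((source, target, engine, preferred))
    return rows


def defaults_without_engine(pairs, defaults):
    """Default entries whose engine does not list that pair in its own file."""
    known = {(f"{s}-{t}", e) for s, t, e in pairs}
    return {key: eng for key, eng in defaults.items() if (key, eng) not in known}
```

Pairs where several engines compete and no default exists come out with no preferred engine at all; whether cxserver resolves those some other way is worth checking in its source before trusting the column.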