In T290906, we pulled a sample of content translation IDs that met a specific set of article specifications for an upcoming design research study (see T288012). The sample consists of 50 published translations for each of the 3 target languages: Albanian (sq), Indonesian (id), and Standard Written Chinese (zh).
To support the linguistic analysis that will follow, we now want to provide these translations so that the CX publications and MT outputs are presented side-by-side, ideally in a spreadsheet.
See suggested format specifications below:
**Format Specifications:**
For each of the translations in the sample, the following data is needed in a side-by-side format:
* //CX-published article at the time of initial publication//. This is the `user` version from the API.
* //Initial unmodified machine-translation output for each CX publication.// This is the `mt` version from the API.
* //[Nice to Have] Historical snapshot of source article at time of MT output generation.// This is the `source` version from the API.
* //Associated metadata or a link to the metadata provided in the sample pull// [[ https://docs.google.com/spreadsheets/d/1mVqUWG_4CDcMgtQ643rcViMTZNky5kV2y-7y-C6hL98/edit#gid=1706774085 | doc ]].
* [Nice to Have] Break each article down so that the MT output and the CX-published article are presented paragraph-by-paragraph.
* [Nice to Have] To the extent possible, include all content (such as images, templates, etc.) that is translated into the target article, as it gives us a better overall picture.
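As a rough illustration of the side-by-side layout described above, here is a minimal Python sketch that flattens one translation into spreadsheet rows, one row per section. The input shape is an assumption: a mapping of section id to `source`, `mt`, and `user` entries, each carrying a `content` string. The function names and column names are hypothetical, and the actual API response should be checked before relying on this.

```python
import csv
import io

# Column order for the side-by-side spreadsheet (hypothetical naming).
COLUMNS = ["translation_id", "section_id", "source", "mt", "user"]


def side_by_side_rows(translation_id, sections):
    """Turn one translation's sections into one row per section.

    `sections` is ASSUMED to map a section id to a dict with optional
    "source", "mt" and "user" entries, each shaped like {"content": "..."}.
    A missing or null version becomes an empty cell.
    """
    rows = []
    for section_id, versions in sections.items():
        row = {"translation_id": translation_id, "section_id": section_id}
        for kind in ("source", "mt", "user"):
            entry = versions.get(kind) or {}
            row[kind] = entry.get("content", "")
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize rows to CSV text that can be imported into a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping one row per section means the paragraph-by-paragraph comparison falls out of the same table, and sections without an MT version simply show an empty `mt` cell.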
**Data Sources:**
* [List of translation IDs and associated metadata for each article in the sample](https://docs.google.com/spreadsheets/d/1mVqUWG_4CDcMgtQ643rcViMTZNky5kV2y-7y-C6hL98/edit#gid=1706774085)
* [Parallel Corpora API](https://www.mediawiki.org/wiki/Content_translation/Published_translations?useskin=vector-2022): Provides parallel text of a translation based on translation id.
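For reference, a request URL for a single translation id could be built along these lines. This is a sketch under assumptions: the `contenttranslationcorpora` list module and its `translationid` and `striphtml` parameters are taken from memory of the documentation page linked above and should be verified there. The `api_base` default and the function name are illustrative only.

```python
from urllib.parse import urlencode


def corpora_url(translation_id, api_base="https://en.wikipedia.org/w/api.php"):
    """Build a parallel-corpora query URL for one translation id.

    ASSUMPTION: parameter names follow the linked documentation page;
    confirm them there before use. No request is sent here.
    """
    params = {
        "action": "query",
        "list": "contenttranslationcorpora",
        "translationid": translation_id,
        "striphtml": "true",  # assumed flag asking for plain text instead of HTML
        "format": "json",
    }
    return f"{api_base}?{urlencode(params)}"
```

The function only assembles the URL, so it can be run over the sample's translation ids to produce a list of links for manual checking before any automated pull.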
**Timeline:**
Per @Easikingarmager: "I anticipate we could be ready to start within 3-4 weeks, but if you're still wrapping up then, we could always begin by pulling them manually at first. In the very early stage we'll be sorting out some details in how we're approaching things so it'll go slower at first."