The goal is to create a dataset of section names aligned across languages: given a section name in a source language, we want to map it to the equivalent section name in other languages.
Example: given the section name 'Awards' (English), we want to map this title to 'Premios' (Spanish) and 'Prêmios' (Portuguese).
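As a rough illustration of the target data, the alignment can be thought of as a lookup keyed by source language and section title. The sketch below is only one possible layout; the record format and the names `ALIGNMENTS` and `build_mapping` are assumptions made for illustration, not part of the project.

```python
from collections import defaultdict

# Hypothetical alignment records, taken from the example above:
# (source language, source title, target language, target title).
ALIGNMENTS = [
    ("en", "Awards", "es", "Premios"),
    ("en", "Awards", "pt", "Prêmios"),
]


def build_mapping(records):
    """Index aligned section names by (source language, source title)."""
    mapping = defaultdict(dict)
    for src_lang, src_title, tgt_lang, tgt_title in records:
        mapping[(src_lang, src_title)][tgt_lang] = tgt_title
    return dict(mapping)


section_map = build_mapping(ALIGNMENTS)
print(section_map[("en", "Awards")])
```

Keying on the (language, title) pair keeps identical titles in different source languages from colliding.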
Our preliminary studies show that automatic translation (a.k.a. machine translation) is not accurate enough. There are several reasons for this. First, conventions change across languages: for example, English Wikipedia usually has one section named "References" and another named "Notes", while French Wikipedia usually uses a single section named "Notes et références". Second, section names are often not translated literally.
Within this task we will study, compare, and combine several approaches, such as the aforementioned machine translation, cross-lingual word embeddings, and heuristics based on Wikidata information.
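One simple way to combine such approaches is a weighted sum of the scores each method assigns to a candidate title. The sketch below is only an assumption about how a combination could look, not the method chosen for this task; the function `combine_candidates`, the method names, the weights, and all scores are made-up illustration values.

```python
from collections import defaultdict


def combine_candidates(scored_by_method, weights):
    """Merge per-method candidate scores into one list ranked by weighted sum."""
    totals = defaultdict(float)
    for method, candidates in scored_by_method.items():
        for title, score in candidates.items():
            # Methods without a configured weight contribute nothing.
            totals[title] += weights.get(method, 0.0) * score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up candidate scores for mapping English "Awards" to Spanish.
candidates = {
    "machine_translation": {"Premios": 0.9, "Galardones": 0.4},
    "embeddings": {"Premios": 0.8, "Reconocimientos": 0.5},
    "wikidata_heuristics": {"Premios": 1.0},
}
# Made-up weights; in practice these would have to be tuned on labeled data.
weights = {
    "machine_translation": 0.3,
    "embeddings": 0.3,
    "wikidata_heuristics": 0.4,
}

ranked = combine_candidates(candidates, weights)
print(ranked[0][0])
```

A candidate proposed by several methods accumulates score from each of them, so agreement between methods pushes it up the ranking.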