For the Section Translation feature of the Content Translation project, we need to develop an API that identifies the sections missing between the versions of an article in two languages. The user interface will then use this API to show the user which sections are missing.
As part of the Section Translation process, users pick a section to translate (T241587). To facilitate this selection, the tool will show which article sections are missing and which are already present in the target language.
To support this, we need a service that, given an article (Q-ID or page name) and a language pair (source and target), returns a mapping of likely equivalent sections.
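A minimal sketch of the request and response shapes such a service might use follows. The class names, field names and the "section title to target title, or None when missing" convention are all assumptions for illustration, not an agreed API.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SectionMappingRequest:
    """Hypothetical request: an article plus a source/target language pair."""
    article: str      # a Q-ID such as "Q42", or a page title in the source language
    source_lang: str  # e.g. "en"
    target_lang: str  # e.g. "es"

    def is_qid(self) -> bool:
        # Wikidata item IDs are "Q" followed by a positive integer.
        return re.fullmatch(r"Q[1-9]\d*", self.article) is not None


@dataclass
class SectionMappingResponse:
    """Hypothetical response: source section title -> target title, or None if missing."""
    mapping: Dict[str, Optional[str]] = field(default_factory=dict)

    def present(self) -> List[str]:
        # Source sections that have a likely equivalent in the target article.
        return [s for s, t in self.mapping.items() if t is not None]

    def missing(self) -> List[str]:
        # Source sections with no equivalent: candidates for Section Translation.
        return [s for s, t in self.mapping.items() if t is None]
```

For example, a response of `{"History": "Historia", "Legacy": None}` would lead the UI to list "Legacy" as missing and "History" as present.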
The purpose of this ticket is to provide a basic approach that can complement the more advanced approach from the Research team (T224234). It would serve as a fallback, and as an initial version until the latter is ready.
Some strategies that can be considered to find the mappings:
- Translate section titles. Fuzzy-match the machine translations of the source section titles against the section titles in the target language. This is similar to how Content Translation re-applies formatting and links after plain-text machine translation services drop them.
- Use Content Translation info. For articles translated with Content Translation, use the mapping information from the tool to identify equivalent sections.
- Inspect section contents and map the linked topics. Collect the articles linked from each section, look up their Q-IDs, and count how many Q-IDs each source section shares with each target section.
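The first and third strategies can be sketched roughly as below. This is only an illustration under several assumptions: `translate` stands in for whatever machine translation service is used (hypothetical here), fuzzy matching uses Python's `difflib.SequenceMatcher` ratio as one possible similarity measure, and the `threshold` and `min_shared` cut-offs are arbitrary placeholders that would need tuning. The Content Translation strategy is omitted because it depends on the tool's stored data.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional, Set


def match_by_title(
    source_titles: List[str],
    target_titles: List[str],
    translate: Callable[[str], str],  # hypothetical MT hook: source title -> target-language text
    threshold: float = 0.8,           # assumed minimum similarity to accept a match
) -> Dict[str, Optional[str]]:
    """Strategy 1: fuzzy-match machine-translated source titles against target titles."""
    mapping: Dict[str, Optional[str]] = {}
    for source in source_titles:
        translated = translate(source).casefold()
        best, best_score = None, 0.0
        for target in target_titles:
            score = SequenceMatcher(None, translated, target.casefold()).ratio()
            if score > best_score:
                best, best_score = target, score
        # Below the threshold, treat the section as missing in the target article.
        mapping[source] = best if best_score >= threshold else None
    return mapping


def match_by_links(
    source_links: Dict[str, Set[str]],  # source section -> Q-IDs of articles it links to
    target_links: Dict[str, Set[str]],  # target section -> Q-IDs of articles it links to
    min_shared: int = 2,                # assumed minimum number of shared Q-IDs
) -> Dict[str, Optional[str]]:
    """Strategy 3: map each source section to the target section sharing the most Q-IDs."""
    mapping: Dict[str, Optional[str]] = {}
    for source, qids in source_links.items():
        best, best_shared = None, 0
        for target, target_qids in target_links.items():
            shared = len(qids & target_qids)
            if shared > best_shared:
                best, best_shared = target, shared
        mapping[source] = best if best_shared >= min_shared else None
    return mapping
```

With a stand-in translation table such as `{"History": "Historia", "Early life": "Primeros años"}` and target titles `["Historia", "Biografía"]`, `match_by_title` maps "History" to "Historia" and leaves "Early life" unmatched.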
Since the more advanced approach may not cover all languages or topics, the basic approach is expected to coexist with it. The design of the basic approach should therefore take into account how the advanced approach will later be integrated into the system.
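One possible way to let both approaches coexist is to put every strategy behind a common interface and chain them in order of preference, so the advanced approach answers where it can and the basic strategies fill the gaps. The sketch below assumes a hypothetical `Matcher` signature that returns `None` when a strategy does not support a language pair; it is a design illustration, not a settled architecture.

```python
from typing import Callable, Dict, List, Optional

Mapping = Dict[str, Optional[str]]
# Hypothetical interface: (source_lang, target_lang, source_sections) -> partial mapping,
# or None when the matcher does not cover that language pair.
Matcher = Callable[[str, str, List[str]], Optional[Mapping]]


def combine(
    matchers: List[Matcher],  # ordered by preference, e.g. the advanced model first
    source_lang: str,
    target_lang: str,
    source_sections: List[str],
) -> Mapping:
    """Fill each source section from the first matcher that finds an equivalent."""
    result: Mapping = {section: None for section in source_sections}
    for matcher in matchers:
        mapping = matcher(source_lang, target_lang, source_sections)
        if mapping is None:
            continue  # this matcher does not support the language pair
        for section in source_sections:
            if result[section] is None and mapping.get(section) is not None:
                result[section] = mapping[section]
    return result
```

Because each strategy only needs to implement the same callable shape, the advanced approach could later be added at the front of the list without changing the basic strategies.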