Brief summary
This mentorship is a component of an active research project about translation imbalances.
When we compare the number of translations made between pairs of languages, we find that far more articles are translated from languages with a larger wiki presence into languages with a smaller presence than in the reverse direction. English alone is the source language for 70% of all published translations, and the pattern seems to repeat for other colonial languages.
We would like to understand why this is. We've begun to find explanations in software design choices, but there are many potential influences behind each translator's choice of article and languages. Possible factors include the number of articles available in each language, cultural richness and blind spots, suggestions made by software, and the availability and quality of machine translation.
The Outreachy component of our project will follow one of these possible avenues for investigation.
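To make the imbalance described above concrete, here is a minimal sketch of the two measures mentioned: the ratio between the two directions of a language pair, and one language's share of all translations as a source. The function names and the counts are invented for illustration and are not project data.

```python
def imbalance_ratio(counts, a, b):
    """Ratio of translations a -> b to translations b -> a.

    `counts` maps (source, target) language codes to a number of
    published translations. Returns None when neither direction has
    any translations, and infinity when only a -> b has some.
    """
    forward = counts.get((a, b), 0)
    backward = counts.get((b, a), 0)
    if backward == 0:
        return float("inf") if forward else None
    return forward / backward


def source_share(counts, lang):
    """Fraction of all translations that use `lang` as the source."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    from_lang = sum(n for (src, _), n in counts.items() if src == lang)
    return from_lang / total


# Hypothetical counts, made up purely for this example.
example_counts = {
    ("en", "sw"): 60,
    ("sw", "en"): 5,
    ("fr", "sw"): 25,
    ("sw", "fr"): 10,
}

en_sw_ratio = imbalance_ratio(example_counts, "en", "sw")
en_share = source_share(example_counts, "en")
```

With these invented numbers, `en_sw_ratio` is 12.0 and `en_share` is 0.6. Real figures would have to come from the published translation statistics.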
Suggested skills
There are many entry points into this topic area, and candidates can choose where they want to engage. The areas we will work in include:
- User experience research
- Data engineering and analysis
- Node.js backend programming
- PHP backend programming
- Vue.js frontend programming
Mentors
Microtasks
Please feel free to work on tasks even if another candidate has started commenting, since there could be many ways of addressing each question and duplicated work is not wasted.
Please note that participants are neither required nor expected to complete every microtask. We've listed a variety of tasks so that people can go into depth on the subjects that most interest them, and the final project will be adapted according to these interests rather than spanning every discipline.
Initial tasks (mentors will continue to add tasks here throughout the contribution period):
- T331199: Read paper and make guesses about how it applies to translators
- T331200: Ultralight systematic literature review
- T331201: Extract cxserver configuration and export to CSV
- T331202: Configuration evolution over time
- T331204: Produce flow diagrams illustrating translation imbalances
- T331207: Compose a short survey for Content Translation users
- T332643: Rough integration of time machine and configuration scraper
- T332647: Compare config scraper output with config API
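As a rough starting point for the configuration export tasks above, the sketch below flattens a nested mapping of language pairs into CSV rows. The input shape (`{engine: {source: [targets]}}`), the engine name `ExampleMT`, and the column names are all assumptions for illustration. The real cxserver configuration may be organized differently and should be checked against the repository.

```python
import csv
import io


def config_to_rows(config):
    """Yield (engine, source, target) tuples in a stable sorted order.

    `config` is assumed to look like {engine: {source: [targets]}}.
    This shape is hypothetical and not taken from cxserver itself.
    """
    for engine in sorted(config):
        pairs = config[engine]
        for source in sorted(pairs):
            for target in sorted(pairs[source]):
                yield (engine, source, target)


def write_csv(config, out):
    """Write a header row and one row per language pair to `out`."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["engine", "source", "target"])
    writer.writerows(config_to_rows(config))


# Made-up sample input; "ExampleMT" is not a real engine name.
sample_config = {
    "ExampleMT": {
        "en": ["sw", "es"],
        "es": ["en"],
    }
}

buffer = io.StringIO()
write_csv(sample_config, buffer)
csv_text = buffer.getvalue()
```

Sorting the rows keeps the output stable between runs, which should make it easier to diff exports taken at different points in time.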