
Proposal: Research into translation imbalances - Outreachy - 26(Leila Kaltouma)
Closed, Declined · Public



Name: Leila Kaltouma
Time zone: UTC
Email:
Location: Bamako, Mali
Working hours: 7:00 am to 9:00 pm UTC


The main objective of this project is to reduce translation imbalances and encourage translators to contribute to underrepresented languages by improving the suggestion page of the Content Translation tool. Statistics already exist on an external page, but if we want to encourage people to contribute to underrepresented languages, I believe the relevant information and statistics should be readily available on the selection page itself. Wikipedia already shows statistics on its sign-up page (see here). You can find a screenshot of my suggestion page here.
We could use the empty space on the right of the page to dynamically display relevant statistics from here, such as weekly translations to and from a language, or whether the language is underrepresented and needs more contributions. The project also aims to investigate alternatives to the current suggestion algorithm: currently only one language pair is suggested at a time, which could limit the number of potential translations. As this project is managed by the Wikimedia language engineering team, we will create a mockup and present our findings, which, if approved, could lead to a small experimental intervention.
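To make the "is this language underrepresented?" idea concrete, here is a minimal sketch of how a side panel could decide which languages to flag from weekly to/from counts. Everything in it is hypothetical: the input shape, the function names `underrepresented` and `summary_line`, and especially the rule (a language is flagged when its incoming weekly translations fall below half the median across languages). The real definition would come out of the data analysis planned for weeks 3-4.

```python
from statistics import median

# Hypothetical input: language code -> (weekly translations into it, weekly translations from it).
WeeklyCounts = dict[str, tuple[int, int]]


def underrepresented(counts: WeeklyCounts, fraction: float = 0.5) -> list[str]:
    """Return, sorted, the languages whose incoming weekly translations fall below
    `fraction` times the median incoming count across all languages.

    The median-based cutoff is an illustrative assumption, not an agreed definition.
    """
    incoming = [to for to, _ in counts.values()]
    cutoff = fraction * median(incoming)
    return sorted(lang for lang, (to, _) in counts.items() if to < cutoff)


def summary_line(lang: str, counts: WeeklyCounts) -> str:
    """Format the kind of one-line statistic the side panel could show."""
    to, frm = counts[lang]
    return f"{lang}: {to} translated to, {frm} translated from this week"


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    sample = {"en": (100, 900), "es": (300, 200), "fr": (250, 150), "bm": (2, 0), "wo": (5, 1)}
    for lang in underrepresented(sample):
        print(summary_line(lang, sample))
```

A relative cutoff (against the median) is used here rather than a fixed number so that the flag does not depend on the overall volume of translations in a given week.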

The project aims to:

  • Develop and implement a user-friendly interface that displays relevant statistics and information on language imbalances directly next to the suggestion section of the Content Translation tool.
  • Analyze the current algorithm that suggests translation language pairs and propose alternatives that offer additional language pairs.
  • Conduct interviews with experienced translators to gain insights into their translation workflow and identify potential barriers to contributing to underrepresented languages.
  • Design and run a small experimental intervention to evaluate the impact of providing translators with additional information on the suggestion page for underrepresented languages.
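As an illustration of the second goal, suggesting more than one language pair at a time, the sketch below ranks every ordered pair of a translator's languages so that pairs whose target receives the fewest translations come first. This is a hypothetical alternative for discussion, not the current Content Translation algorithm; the function name `suggest_pairs`, its inputs, and the ranking rule are all assumptions.

```python
from itertools import permutations


def suggest_pairs(known_languages: list[str], incoming: dict[str, int], k: int = 3) -> list[tuple[str, str]]:
    """Suggest up to k (source, target) pairs from the translator's own languages.

    Hypothetical ranking rule: pairs whose target language received the fewest
    translations this week come first; ties are broken alphabetically by pair.
    Languages missing from `incoming` are treated as having received none.
    """
    pairs = permutations(known_languages, 2)  # every ordered pair of distinct languages
    ranked = sorted(pairs, key=lambda pair: (incoming.get(pair[1], 0), pair))
    return ranked[:k]


if __name__ == "__main__":
    # A translator who knows English, French and Bambara; counts are made up.
    print(suggest_pairs(["en", "fr", "bm"], {"en": 100, "fr": 250, "bm": 2}))
```

Returning several pairs instead of one lets the page show a short list, so a translator who knows a less-translated language sees it offered as a target even if their "default" pair is a large language.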

By achieving these objectives, we hope to make the Content Translation tool more effective and accessible for all users, encourage contributions to underrepresented languages, and ultimately reduce translation imbalances across Wikimedia platforms.

Mentors: @awight and @Simulo


Week 1-2: Familiarization with Content Translation Tool

Become familiar with the Wikimedia Content Translation tool, its features, and its algorithm. Revisit the task "T331199 - Read paper and make guesses about how it applies to translators."

Week 3-4: Data and Algorithm Analysis
Analyze the existing statistics on the languages being translated to and from. Analyze the current algorithm that suggests articles and language pairs.

Week 5-6: Translator Interviews and Report Findings
Conduct interviews with translators to gain insights into their language and article selection processes, as well as their workflow. Analyze the data and report findings.

Week 7-8: Mockup Creation
Design and create a mockup of the proposed changes to the suggestion section, including improved language pair options, relevant information, and statistics on language imbalances.

Week 9-10: Mockup Feedback and Feasibility Discussion
Gather feedback on the mockup from mentors and translators and make appropriate changes. Discuss with the language engineering team whether the proposed changes are feasible to implement.

Week 11-12: Experimental Intervention
Conduct a small experimental intervention on the Content Translation tool suggestion page to investigate the impact of showing translators more relevant information and statistics about imbalances in their chosen languages, and about the impact of their contributions to that language pair.
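One possible way to analyze such an intervention, assuming translators are split into a control group (current page) and a treatment group (page with statistics), is to compare the share in each group who start a translation into an underrepresented language. The outcome metric and the grouping are assumptions; the two-proportion z-test below is one standard choice for comparing two rates, and it relies on a normal approximation that is only reasonable when both groups are fairly large.

```python
from math import erf, sqrt


def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int) -> tuple[float, float]:
    """Compare success rates of group A (control) and group B (treatment).

    Returns (z, two-sided p-value) under H0: both groups share one success rate.
    """
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all or none of the translators succeeded; the test is undefined")
    z = (success_b / n_b - success_a / n_a) / se
    # Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


if __name__ == "__main__":
    # Made-up counts: 40 of 400 control translators vs 60 of 400 treatment translators.
    print(two_proportion_z_test(40, 400, 60, 400))
```

With a small experiment the sample may be too small for a conclusive result, so the interviews from weeks 5-6 would remain an important complement to any numbers.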

Week 13: Presentation and Feedback
Summarize the project and document the findings. Continue working on the project as needed.

About me

I am a self-taught developer. My passion for technology and eagerness for new challenges have led me on a continuous journey of learning and growth. Despite not having a formal college education, I have relied on my determination and the vast resources of the internet to acquire knowledge. It all started a year ago when, through a free bootcamp program, I wrote my first line of HTML code; since then, I have been driven to learn as much as I can about programming. French is my mother tongue, and it is the predominant language in Mali, where I live. Recognizing the importance of English as a global language and the opportunities it could unlock for me, I took it upon myself to learn it online and expand my horizons.
In addition, I am founding a community, Women in Tech Without Border, for tech enthusiasts, specifically women who, like me, were unable to attend college. Our community aims to foster collaboration and idea-sharing. We have held some meetings, and so far language has been a challenge for some members: Mali is a French-speaking country, while most available resources and documentation about programming are in English. To address this, I am currently teaching English and guiding members on how to use online resources for independent learning, since education here is neither free nor readily available.

Past experience with other communities

This is my first experience with open source and I feel honored to have the opportunity to make my first open source contribution with Wikimedia.
I have been a member of an amazing free online community called 100Devs, where I gained a solid foundation in programming. This community was created to support individuals like myself, and I have learned a great deal through my participation.

How did I learn about Outreachy?

In December of last year, I reached out to a friend and asked about open-source opportunities. It was then that my friend introduced me to Outreachy.

Past Experience

I have experience working with JavaScript, CSS, HTML, React, and Node.js. Among databases, I have mostly worked with MongoDB.

Microtasks carried out

Task: I read the article, wrote a summary, and made some hypotheses and "informed guesses" about what patterns to expect in a dataset of translations between different Wikipedias. My contribution

Task: I conducted a light systematic review by searching Google Scholar for articles related to "Wikipedia translation". I skimmed the search results by title, identified articles that looked relevant, and read the abstract of each one to get a better sense of its research topic and methodology. I then read five full articles. Through this process, I aimed to identify patterns or gaps in the existing research and to better understand how the literature addresses the imbalances we see in Wikipedia translations. My contribution

Task: This is a tool that parses Wikimedia YAML config files taken from Wikimedia-services-cxserver and exports the resulting data as a CSV file that includes the source language, target language, translation engine, and whether it is the preferred engine. My contribution
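The core of this config-to-CSV step can be sketched in a few lines. The original tool was written in JavaScript; this Python version is only an illustration, and it assumes the YAML files have already been loaded (for example with PyYAML's `yaml.safe_load`) into two hypothetical shapes: one mapping per engine from source language to a list of target languages, and one mapping from `"source-target"` keys to the preferred engine. Those shapes and the function names `config_rows` and `to_csv` are assumptions, not a description of cxserver's exact file layout.

```python
import csv
import io

# Assumed shapes after the YAML files have been loaded (e.g. with PyYAML's yaml.safe_load):
#   engine_configs: {"Apertium": {"es": ["ca", "gl"]}, ...}  engine -> source -> targets
#   preferred:      {"es-ca": "Apertium", ...}                "source-target" -> engine


def config_rows(engine_configs: dict[str, dict[str, list[str]]], preferred: dict[str, str]):
    """Yield one (source, target, engine, is_preferred) row per supported pair."""
    for engine, sources in sorted(engine_configs.items()):
        for source, targets in sorted(sources.items()):
            for target in targets:
                yield source, target, engine, preferred.get(f"{source}-{target}") == engine


def to_csv(engine_configs: dict[str, dict[str, list[str]]], preferred: dict[str, str]) -> str:
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target", "engine", "preferred"])
    writer.writerows(config_rows(engine_configs, preferred))
    return buffer.getvalue()
```

Generating rows lazily keeps the parsing logic separate from the output format, so the same rows could also be written as JSON.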

Task: For this task, I created a repository containing a JSON file with a simple data structure. I wrote JavaScript code that parses the data and reads it into a native structure in memory. Then I built a "time machine" that used the GitHub API to play back the git history of the data repository. At each commit, my code parsed the data and stored the entire sequence in memory along with the commit timestamps. Finally, I converted the sequence into a CSV file. My contribution
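The playback-and-flatten part of the "time machine" can be sketched as follows. This is a Python illustration of the idea (the original was JavaScript), and it leaves out fetching: it assumes each commit's version of the file has already been retrieved as a `(timestamp, json_text)` pair and that the JSON is a flat object. The function names `playback` and `history_to_csv` are hypothetical.

```python
import csv
import io
import json


def playback(snapshots: list[tuple[str, str]]) -> list[tuple[str, dict]]:
    """Parse (commit timestamp, JSON text) snapshots into (timestamp, data), oldest first.

    Timestamps are assumed to be ISO 8601 UTC strings in one consistent format,
    so sorting the strings also sorts them chronologically.
    """
    return [(ts, json.loads(text)) for ts, text in sorted(snapshots, key=lambda s: s[0])]


def history_to_csv(history: list[tuple[str, dict]]) -> str:
    """Flatten each snapshot into one (timestamp, key, value) row per top-level key."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timestamp", "key", "value"])
    for ts, data in history:
        for key, value in sorted(data.items()):
            writer.writerow([ts, key, value])
    return buffer.getvalue()
```

Keeping the whole sequence in memory, as the task describes, is fine for a small data file; a long history would be better streamed row by row.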

Task: To gather comprehensive insights into how the Content Translation tool is used and to identify potential challenges users may have encountered, I formulated both quantitative and qualitative questions. My contribution

Task: I created a repository containing a minimal integration of two tasks, evolution-over-time-parser and cxserver, that tracks how the configuration in the cxserver/config repository evolves over time. The data is fetched from all commits since 2017 in the cxserver/config repository. For each relevant commit, the changed config/*.yaml files are parsed and output as JSON. My contribution
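The filtering step of this integration, keeping only commits since 2017 that touch `config/*.yaml` and emitting them as JSON, might look like the Python sketch below. It assumes each commit has already been fetched and its YAML files parsed into a dict of the hypothetical form `{"sha", "date", "files": {path: parsed content}}`; the names `is_config_yaml` and `commits_to_json` are illustrative, not part of the original code.

```python
import json
from datetime import datetime, timezone

SINCE = datetime(2017, 1, 1, tzinfo=timezone.utc)


def is_config_yaml(path: str) -> bool:
    """True only for .yaml files directly inside config/ (not in subdirectories)."""
    directory, _, name = path.rpartition("/")
    return directory == "config" and name.endswith(".yaml")


def commits_to_json(commits: list[dict], since: datetime = SINCE) -> str:
    """Keep commits from `since` onward that touch config/*.yaml and dump them as JSON.

    Each commit is assumed to look like
    {"sha": str, "date": timezone-aware datetime, "files": {path: already-parsed content}}.
    """
    output = []
    for commit in sorted(commits, key=lambda c: c["date"]):
        if commit["date"] < since:
            continue
        configs = {p: data for p, data in commit["files"].items() if is_config_yaml(p)}
        if configs:  # skip commits that changed no config/*.yaml file
            output.append({"sha": commit["sha"], "date": commit["date"].isoformat(), "configs": configs})
    return json.dumps(output, indent=2, sort_keys=True)
```

Matching the directory explicitly, rather than with a shell-style glob, avoids accidentally including YAML files in nested folders.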

Event Timeline

LeilaKaltouma renamed this task from ==Proposal: Research into translation imbalances - Outreachy - 26(Leila Kaltouma) to Proposal: Research into translation imbalances - Outreachy - 26(Leila Kaltouma). Apr 3 2023, 3:53 PM

Outreachy results are out! Declining this task as the proposal was not selected. You could consider finishing up any pending pull requests or tasks remaining from the contribution period. Your ideas and contributions to Wikimedia projects are still welcome. If you would still be eligible for the next round of Outreachy, we look forward to your participation!