
Proposal for research into translation imbalance
Closed, Duplicate, Public

Description

OUTREACHY PROPOSAL

Name: Precious Emmanuel
Phabricator: Kachiiee
Zullip: Kachiiee
Email: elizabethprecious7@gmail.com
Location: Imo, Nigeria
Time zone: UTC+1
Working hours: 8 pm - 3 am (also flexible)

CONTRIBUTIONS MADE
•In micro-task T331199, I read the book and wrote hypotheses and brief summaries in my own words, anchoring each one with snippets from the book.

•In micro-task T331200, I carried out a light systematic literature review of three to four papers.

•In micro-task T331207, I carried out a survey to learn more about how the languages for translation are chosen.

•In micro-task T331202, I created a repository that records the changes made at every commit to a CSV file containing a dataset of cities and temperatures.

•In micro-task T331201, the config file's base name is used as the translation engine name. I followed two approaches: using the PyYAML library in Python, and writing the parser from scratch in Python, with the modules separated into different files to avoid leakage between them.
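The engine-naming idea from T331201 can be sketched roughly as follows. This is a minimal illustration, not the code submitted for the micro-task. It assumes, hypothetically, that each engine has its own `<EngineName>.yaml` file whose top-level keys are source languages, each followed by `- target` list items. It implements only the "parser from scratch" approach for that small YAML subset. The names `parse_language_pairs` and `load_engines` are hypothetical. With PyYAML installed, `yaml.safe_load` could take the place of the hand-written parser.

```python
from __future__ import annotations

from pathlib import Path


def parse_language_pairs(text: str) -> dict[str, list[str]]:
    """Hypothetical from-scratch parser for an assumed YAML subset:
    unindented 'source:' keys, each followed by indented '- target' items.
    Comments (#) and blank lines are ignored; anything else is rejected."""
    pairs: dict[str, list[str]] = {}
    current = None  # the source language whose targets are being read
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing spaces
        if not line.strip():
            continue
        stripped = line.strip()
        if stripped.startswith("- "):
            if current is None:
                raise ValueError(f"line {lineno}: list item before any source key")
            pairs[current].append(stripped[2:].strip())
        elif stripped.endswith(":") and not raw[0].isspace():
            current = stripped[:-1].strip()
            pairs.setdefault(current, [])
        else:
            raise ValueError(f"line {lineno}: unsupported syntax: {raw!r}")
    return pairs


def load_engines(config_dir: str) -> dict[str, dict[str, list[str]]]:
    """Map each engine name (the config file's base name, e.g. 'Google'
    for 'Google.yaml') to the language pairs listed in that file."""
    engines = {}
    for path in sorted(Path(config_dir).glob("*.yaml")):
        engines[path.stem] = parse_language_pairs(path.read_text(encoding="utf-8"))
    return engines
```

For example, a directory containing `Google.yaml` and `Apertium.yaml` would produce the engine names "Google" and "Apertium", because `Path.stem` strips the `.yaml` suffix. Keeping the parser separate from the directory loader mirrors the idea of splitting modules so that parsing logic does not leak into file handling.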

Extension
Given the idea and scope of the project, I will work on both the research and the analysis aspects, as these will help me make data-driven decisions from the gathered surveys and hypotheses.
I would also like to learn more about the model for article suggestions, how it relates to the imbalances we see, and how those imbalances can be counteracted. Our project will focus on carrying out a survey and analysing the feedback, but all of this depends on approval from the WMF Language Engineering team.

Below is a timeline of how I intend to achieve this.

TIMELINE

Selection Period

April 4th - May 28th
•Research the existing micro-tasks further to gain enough insight into and knowledge of the translation imbalance we see.

•Study the hypotheses and summaries drawn from different articles and explore how we can counteract the limitations causing the imbalances.

•Analyse the data gathered from the survey I carried out earlier to identify which areas need the most attention in addressing the imbalances.

•Study the machine learning model used for article suggestions to understand how the imbalance can be addressed, as I noticed that articles from core countries are more likely to appear in the suggestions than articles from peripheral countries.

INTERNSHIP PHASE
May 29th - June 12th
Research and Preparation
•Community bonding
I will bond with other members of the community to enable collaboration, open lines of communication, and fully understand the community's expectations and goals.

•Define the scope of the project, including the specific types of articles and the target audience for article suggestions.
•Conduct a comprehensive literature review of existing research on translation imbalance.

•Write a blog post

June 13th - June 27th

•Work with mentors to improve the questions in the existing survey from our micro-task, whose results were influenced by many factors, so that it gathers useful information for a problem-solving analysis.

•Carry out surveys covering both peripheral and core languages within specific domains of interest, to avoid bias in our data. To get good data from the survey, I will avoid vague questions.

•Discuss with the WMF Language Engineering team the possibility of sending the survey out to translators.

•Gather the qualitative and quantitative data and clean it to avoid incorrect results, errors, or biased algorithms.

•Write two blog posts, with possible topics:
Challenges and opportunities for emerging researchers in language translation.

Understanding the historical roots of research imbalance in language translation: My encounter

June 28th - July 5th
•Identify the most promising tools to improve the efficiency of the model for article suggestions.

•Study techniques for improving the article suggestion feature and its user experience.

July 5th - July 19th
•Identify languages that are underrepresented in the current model.

•Data collection and preparation
Collect a dataset of the most frequently translated articles to understand why particular languages are translated.

•Analyse the current model to identify which languages are underrepresented or not represented at all.

•Write a blog post on the topic:
Addressing the gender gap in language translation: My research.

July 20th - August 3rd
•Collaborate with other contributors to collect and clean data from various sources.

•Fine-tune the model architecture to improve its performance in suggesting relevant articles in underrepresented languages.

•Incorporate cross-lingual knowledge transfer to help the model learn from other languages that may have better-quality data.

•Evaluate the model's performance by having it suggest articles in various languages, and make the necessary changes to improve its accuracy.

•Work with other contributors to improve the models, since all our findings and survey results ultimately come back to effective and efficient models.

•Write a blog post on:
The role of ethics in language translation research

August 4th - August 25th
PROJECT CONCLUSION

•Prepare my final report and presentation
•Submit my findings to the community members and also my mentors
•Get feedback from them
•Document my findings and experience so far.

Deliverables
Write a blog post about my experience
Write a blog post every two weeks on my encounters
Keep in touch with mentors and community members.
Become an active member of the community and give back to it

How I heard about Outreachy
After graduating from college a few months ago, I got into the tech space, which has involved a lot of learning, re-learning and unlearning. A friend of mine suggested that I apply to an open source program, as I had never had any experience with open source. She recommended Outreachy because it helps close the gender gap for women in open source projects and in tech in general. I'm glad I gave it a try, as I've learnt so much while contributing to this project.

Past Experience with Other Software
I have never contributed to open source before, but throughout my learning I've always been engaged in communities where I gain insights into my skill sets and take part in resourceful engagements.

About Me
I'm Precious Emmanuel, a recent graduate of the Department of Metallurgical Engineering. My childhood exposed me to technology, and I've been eager to learn more about what it entails. My curiosity awakened my passion for the tech space.
I'm a self-taught data scientist, ready to push through as a woman in tech. In these few months of learning, I've grown my Python skills as a tool for analysis, since Python is very flexible and versatile, and I have also gained hands-on experience with a visualisation tool (Tableau).

In education generally, I'm a curious learner, ready to try anything as long as it involves problem solving and making changes. I recognize that tech is an ever-changing landscape, and the only way to stay relevant is to remain open to new ideas and approaches. Contributing to this project meant a lot to me, as it let me put my skills to the test and keep learning. I believe education not only broadens our perspective but also expands our minds, as growth is a lifelong journey.

Above all, I'm committed to learning, unlearning, and re-learning, and to making positive changes in my community and giving back to the world around me.
