=Proposal for https://phabricator.wikimedia.org/T328597=
=**Profile Information**=
Name: Abhishek Bhardwaj
Email: abhishek02bhardwaj.er@gmail.com
User Page: [[ https://commons.wikimedia.org/wiki/User:Abhishek02bhardwaj | Wikimedia Page ]]
Location: Delhi, India
LinkedIn: [[ https://www.linkedin.com/in/abhishek-bhardwaj-821054241/ | Abhishek Bhardwaj ]]
Zulip: abhishek02bhardwaj.er@gmail.com (Abhishek Bhardwaj)
Phabricator: Abhishek02bhardwaj
Time Zone: UTC+5:30
Working Hours: 9 AM to 2 AM (UTC+05:30)
On College Days, occupied between 9 AM to 2 PM (UTC+05:30)

**Name:** Margaret E. Okoronkwo
**Time Zone:** UTC +1
**Zulip:** Margaret Okoronkwo
**Phabricator:** Ebere.O
Location: Birmingham, UK
**Working Hours:** 9AM-5PM
=**Abstract**=
Wikipedia is a free, online encyclopedia that provides information on a vast array of topics, from history to technology to pop culture. It is maintained by a global community of volunteer editors who collaborate to create and update its content. With millions of articles available in multiple languages, Wikipedia has become a go-to resource for people seeking knowledge and information on a wide range of subjects.
Wikipedia provides translation services that allow users to access articles in their preferred language. This feature enables content to be available to a wider audience, and it is achieved through the efforts of volunteer translators who work to create and update articles in various languages. The translation services also include machine translation, which uses artificial intelligence to automatically translate articles into different languages. While the quality of machine translation can vary, it has made it easier for users to access information in their native language, regardless of the original language of the article.
But one important issue that concerns wiki enthusiasts is the imbalance in translations. Comparing the number of translations made between pairs of languages shows that articles from languages with a larger presence on Wikipedia are translated into languages with a smaller presence at very high rates. English, specifically, is the source language for 70% of all published translations, and this trend is also present for other colonial languages. [[ https://phabricator.wikimedia.org/T328597 | This project ]] will focus on researching these imbalances and understanding the reasons behind them.
=**Extension**=
Apart from the Analysis stream of research, I will also work in the UX research direction, interviewing translators to understand how their perception of language importance influences their language selection. I will also investigate how the software design affects the selection of languages and the translation workflow.
=**Mentors**=
@awight and @Simulo
=**Experience and Contributions made to the project**=
I have been a Wikipedia user since the age of 12, and I always wondered who was knowledgeable enough to write all of this information by themselves (as a kid I thought Wikipedia was written by one person, like a book). As I grew older, I realised it wasn't an individual but a community who did this. I never thought in my wildest dreams that one day I would be sitting in front of my laptop, capable of writing a proposal to those people to be a part of their team. This experience is more important to me than all of the knowledge that I have gained participating in this contribution period for the Wikimedia Foundation, so it is already a dream come true for me.
During the past 21 days (from 6th of March to 26th of March) I have learned a lot while working on this project:
1. In the task [[ https://phabricator.wikimedia.org/T331199 | #T331199 ]] I summarised a paper and on the basis of it gave hypotheses and informed guesses about how it applies to translators.
[[ https://github.com/Abhishek02bhardwaj/Submission-for-T331199-and-T331200 | My Contribution ]]
2. In the task [[ https://phabricator.wikimedia.org/T331200 | #T331200 ]] I did a light systematic review of literature that might be relevant to our research.
[[ https://github.com/Abhishek02bhardwaj/Submission-for-T331199-and-T331200 | My Contribution ]]
3. In the task [[ https://phabricator.wikimedia.org/T331201 | #T331201 ]] I created a parser from scratch to extract the cxserver configuration and export it to a CSV file.
[[ https://github.com/Abhishek02bhardwaj/Extract-cxserver-configuration-and-export-to-CSV | My Contribution ]]
4. In the task [[ https://phabricator.wikimedia.org/T331202 | #T331202 ]] I created a time machine to access the git history of a data repository and analyze the data at each commit. The information obtained is then stored in memory, along with the timestamp of the git commit, forming a complete sequence.
[[ https://github.com/Abhishek02bhardwaj/Evolution-Tracker | My Contribution ]]
Wikipedia is a free and open online encyclopedia that serves the world, which means that the need for consistent updates to its content cannot be overemphasised, as this is the secret of preserving the relevance of the “online-go-to-place”. There is the quest to find out the gap between Wikipedia and translations and how this affects the users and the service to the world as a whole.
5. In the task [[ https://phabricator.wikimedia.org/T331204 | #T331204 ]] I plotted flow diagrams illustrating translation imbalances.
[[ https://github.com/Abhishek02bhardwaj/Flow-Diagrams-Illustrating-Translation-Imbalances | My Contribution ]]
6. In the task [[ https://phabricator.wikimedia.org/T331207 | #T331207 ]] I learned how to compose a survey. In this task I drafted a survey for Content Translation software users, investigating how the software is used and how the languages are chosen.
[[ https://etherpad.wikimedia.org/p/xGzVywcafj65F66Gea2n | My Contribution ]]
Documentation on what this project is about and the efforts that are being put in place has to be made in the process, and this is where my contributions to the project come in.
7. In the task [[ https://phabricator.wikimedia.org/T332643 | #T332643 ]] we had to integrate the configuration scraper that we built in task [[ https://phabricator.wikimedia.org/T331201 | #T331201 ]] with the time machine built in task [[ https://phabricator.wikimedia.org/T331202 | #T331202 ]], running the configuration scraper on every git commit of the cxserver source repository.
[[ https://github.com/Abhishek02bhardwaj/Rough-Integration-of-Time-Machine-and-Configuration-Scrapper | My Contribution ]]
8. In the task [[ https://phabricator.wikimedia.org/T332647 | #T332647 ]] I compared the API results to the output of the scraper I built in the task #T331201. The accuracy of the scraper is 100%. As an extension to the task, I also compared other contributors' output and recorded their match percentage.
[[ https://github.com/Abhishek02bhardwaj/Compare-config-scraper-output-with-config-API | My Contribution ]]
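The comparison described in task #T332647 can be sketched roughly as follows. This is a minimal illustration, not the code in the linked repository: the row shape `(source, target, engine)`, the function name `match_percentage`, the sample rows, and the use of set overlap (Jaccard similarity) as the match measure are all assumptions made for the example.

```python
from typing import Iterable, Set, Tuple

# Hypothetical row shape: (source language, target language, engine).
Row = Tuple[str, str, str]


def match_percentage(candidate: Iterable[Row], reference: Iterable[Row]) -> float:
    """Return the overlap between two sets of rows as a percentage.

    Assumption: "match" is measured as Jaccard similarity, i.e. rows present
    in both sets divided by rows present in either set.
    """
    candidate_set: Set[Row] = set(candidate)
    reference_set: Set[Row] = set(reference)
    union = candidate_set | reference_set
    if not union:
        # Two empty inputs agree trivially.
        return 100.0
    return 100.0 * len(candidate_set & reference_set) / len(union)


# Made-up sample rows, for illustration only.
api_rows = [("en", "es", "Google"), ("en", "fr", "Apertium"), ("es", "ca", "Apertium")]
scraper_rows = list(reversed(api_rows))  # same rows, different order
other_rows = [("en", "es", "Google"), ("en", "fr", "Google")]

print(match_percentage(scraper_rows, api_rows))  # 100.0
print(match_percentage(other_rows, api_rows))    # 25.0
```

Comparing sets rather than lists makes the measure independent of row order, so a scraper that emits the same pairs in a different order still scores 100%.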
=**Past Experience with Open Source Software**=
As a contributor, this is my first time contributing to open source, but I have been an active open-source user for the past 10 years. From using VLC Media Player to watch videos to using the Android operating system on smartphones, open-source software has been an integral part of the technology in my life. I started to learn coding on Dev-C++, which is a free, open-source IDE for Windows. Then, as I expanded my horizons and learned more programming languages, I switched to an IDE that is compatible with multiple languages, Visual Studio Code, which is again built on open source. I use Firefox to browse the internet, which provides phishing and malware protection. I use WordPress to develop websites that are well designed and function very well. I use PHP to manage dynamic content and session tracking. MySQL is my favourite RDBMS for managing databases. In this way, whether it was entertainment, learning or any other utility, open source has helped me a lot by providing excellent utilities.
Given that the project is open source and broad, the team responsible for carrying out this project needs to increase, and my participation is geared towards achieving this.
=**Mentor(s)**=
@awight and @Simulo
=**About the Project**=
Among the proposed streams, I am most interested in UX research and Analysis, as my past projects involved product development, in which UX research and analysis are integral. Through the UX research, I aim to dig deeper into the thought process of translators when they select languages for translation and into the factors that motivate and prompt their choice. Can these factors be countered from the project end, or are there other factors that need to be addressed with a better approach? Through the Analysis stream, I will analyse the possible technical reasons for these imbalances and try to find a way to counter them.
I have divided my timeline into 5 phases; each phase addresses one important aspect of research into the translation imbalances.
=**Timeline**=
===Pre-Selection Period===
**April 4th - May 3rd**
- Study the code responsible for setting the default languages from [[ https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/ContentTranslation/%2B/refs/heads/master/modules/dashboard/ext.cx.dashboard.js | CXDashboard.findValidDefaultLanguagePair ]].
- Carry Task [[ https://phabricator.wikimedia.org/T331204 | #T331204 ]] further, producing flow diagrams that illustrate translation imbalances.
- Gather information about the two algorithms at play: one chooses the pair of source and target languages between which to translate, and the other chooses which articles to show for translation.
**May 4th**
- Celebrations
===Community Bonding Period===
**May 5th - May 28th**
- Refine the survey developed in task [[ https://phabricator.wikimedia.org/T331207 | #T331207 ]] with the help of the mentors and the contributions already provided on the task.
- Send out the survey to translators (contingent on discussion with the WMF Language Engineering team).
- Find information about potential candidates for interview.
- Prepare and discuss a questionnaire with the mentors that will be used for interviewing.
- Contact the potential candidates and ask for a suitable date and time for their interview.
- Review the papers that were listed in the task [[ https://phabricator.wikimedia.org/T331200 | #T331200 ]] ultralight systematic review and observe whether there is anything we can relate to or use in our research.
===Contribution Period Begins===
**May 29th - June 2nd**
**WEEK 1**
- Review the papers that were listed in the task [[ https://phabricator.wikimedia.org/T331200 | #T331200 ]] ultralight systematic review and observe whether there is anything we can relate to or use in our research.
**June 5**
**FEEDBACK #1**
===Phase I Testing of the Possible Hypothesis===
**June 5th - June 9th + June 12th - June 16th**
**WEEK 2 + WEEK 3**
- Hypothesis 1 - The distribution of translators around the world will be significantly different from that of the editors and contributors. Analyse the data of translators on the basis of location, in a way similar to what was done for the paper [[ https://www.tandfonline.com/doi/full/10.1080/00045608.2015.1072791 | Graham, Straumann, Hogan. 2015. “Digital Divisions of Labor and Informational Magnetism: Mapping Participation in Wikipedia.” ]]
**June 19th - June 23rd + June 26th - June 30th**
**WEEK 4 + WEEK 5**
- Hypothesis 2 - When two languages are more closely related geographically or historically, translations are more likely to occur. So languages that are more closely linked to other languages geographically or historically will see more translations.
**July 3rd**
**FEEDBACK #2**
===Phase II Interview and Survey Documentation===
**July 3rd - July 7th + July 10th - July 14th**
**WEEK 6 + WEEK 7**
- Conduct interviews with translators to gain an understanding of how their perception of language importance influences their language selection. Also, investigate how software design impacts the selection of languages and the translation workflow.
- Analyse the responses from the survey, deduce important observations and document them appropriately.
**July 17th - July 21st**
**WEEK 8**
- Document the interviews and the findings in a structured blog that can be used for further reference in the research work.
**July 24th**
**FEEDBACK #3**
===Phase III Analysis and Improvement of Algorithm===
**July 24th - July 28th + July 31st - August 4**
**WEEK 9 + WEEK 10**
- Analyse and research the algorithm used for suggesting the articles for translation. Look for any potential bias and devise a way to remove it, if any.
===Phase IV Build a Quantitative View on the Issue===
**August 7th - August 11th**
**WEEK 11**
- Complete the Configuration Time Machine work with the required documentation.
**August 14th - August 18th**
**WEEK 12**
- Figure out the discontinuities in the machine translation and check for correlations with the step changes in the published translations.
**August 21st**
**FEEDBACK #4**
===Phase V Conclusion===
**August 21st - August 25th**
**WEEK 13**
- Conclude the research work, prepare a report of the findings, and publish the raw data links, code, graphs and exceptions.
- Write a blog article containing a concise report of all the work done that can be used to carry the research forward.
**August 28th and Later**
- Celebrations.
- Continue code-based contributions to Wikimedia.
- Be an active member of the Wikimedia community and start exploring other communities to work with.
- Actively maintain the code and documentation and guide beginners who are interested in contributing.
**Stretch Goals**
- Test the hypothesis that some wikis are more challenging to contribute to than others (maybe because of the cultural or stylistic differences between wikis), either through survey questions or by looking at translator activity over time.
==Other Deliverables during the Internship==
- Weekly blog posts on my internship progress/experience.
- Blog posts about my experience with the open source community and the Wikimedia Foundation.
- Regular communication with my mentors and other members of the Wikimedia community.
==About Me==
I am a sophomore pursuing a Bachelor of Technology degree in Information Technology and Mathematical Innovation at the Cluster Innovation Centre, University of Delhi. I am currently in my 4th semester of the 8-semester program and will graduate in May 2025. I am an active member of the coding society of our college, where we have built an ecosystem of peer learning and have also helped organise numerous workshops and technical fests.
==Past Projects==
1. **[[ http://dhamni-cic.infinityfreeapp.com/?i=2 | DHAMNI ]]** :
- It is a web-development project that aims to provide a platform for blood donors and recipients to share information.
- Donors can upload their details, which can be searched by people who are in need of a blood donation.
2. **[[ https://curiocic.netlify.app/ | Curio ]]** :
- It is also a web-development project, currently in development, that aims to solve the language gap (in available audio) problem of YouTube.
- Educational content is difficult to understand through subtitles, due to which dubs and audio translations become a need.
- Through this platform, users can record their dub of a YouTube video and upload it so that it can be viewed by other users who wish to watch it in the language
3. **[[ https://github.com/Abhishek02bhardwaj/Facial-Recognition-Using-Principal-Component-Analysis | Facial Recognition Software using Principal Component Analysis ]]** :
- Developed MATLAB software capable of identifying and retrieving an image if it exists in its training database.
4. **[[ https://github.com/Abhishek02bhardwaj/mazes-with-dijkstra-and-A- | Maze Solving Algorithm Comparison ]]** :
- Compared the Dijkstra and A* algorithms along with their time and space complexity.
==How did I learn about Outreachy?==
In our college, we have a really good culture of contributing to open source, and open source is discussed around our campus almost all of the time. This culture sparked my initial interest in open source software, and I started learning about it. As I explored more about the communities, I realised what kind of impact they have on people's lives, which further motivated me to be a part of this community and start making my own contributions. I came to know about Outreachy from a college senior who interned at Outreachy and contributed to Inkscape in 2021. She told me about the program and about how it supports diversity in free and open source software, which got my interest, and I started to work in this direction.
=**Contributions made to the project and Experience gained**=
I was delighted to receive the email that confirmed my approval to contribute to the Wikimedia project. It was really encouraging. I got the chance to interact with the community and participate, and I also saw opportunities to learn and improve my skill set. My contributions are as follows:
In the first microtask https://phabricator.wikimedia.org/T331199, we were asked to summarize the paper "Digital Divisions of Labor and Informational Magnetism: Mapping Participation in Wikipedia", giving informed guesses and hypotheses about how it applies to translators.
Link to my contribution- https://github.com/OkoronkwoMargaret1/Outreachy-Internship-Contribution-1
In the second microtask https://phabricator.wikimedia.org/T331200, we were asked to write a literature review showing how the literature relates to translation imbalances, by reading through the results of a search like "Wikipedia translations".
Link to my contribution- https://github.com/OkoronkwoMargaret1/-Ultralight-systematic-literature-review
=**The Project**=
The project title is Wikimedia - Research imbalances in translation between languages on Wikipedia. This project is still ongoing, as there are different areas of interest that need to be worked on.
In order to give my best contribution to the project, I will concentrate on my key areas of strength and passion: writing and researching. Hence, my contribution during the internship is projected as follows:
**Project Timeline**
April 4th - May 4th
• Study the materials for the Translational Imbalances.
May 5th-May 28th
• Bonding with the community members.
• Study previous User Research based on Translational Imbalance.
May 29th-June 5th
**Week 1**
• Gather information about individuals and users to be interviewed.
• Draft the survey with the help of the Mentors.
June 6th-June 13th
**Week 2**
• Send out the drafted survey to translators.
• Prepare the questionnaire that will be used for interviewing.
June 14th-June 21st
**Week 3**
• Gather and find information about candidates to be interviewed.
• Ask the candidates to be interviewed for an appropriate date and time for the interview.
June 22nd-June 30th
**Week 4**
• Collate findings from the completed survey questions.
July 1st-July 7th
**Week 5**
• Collate findings from the interviews conducted.
July 5th-July 15th
**Week 6**
• Get feedback from the Mentors and use it for better and improved UX Research.
July 16th-July 23rd
**Week 7**
• Work with teams and community volunteers to obtain an in-depth understanding of documentation needs and requirements.
July 24th-July 31st
**Week 8**
• Produce high-quality and easy-to-understand documentation for a variety of engineering processes, API documentation, FAQs, user guides, standard operating procedures.
August 1st-August 7th
**Week 9**
• Create clear, accurate and updated documentation for the project.
August 8th-August 15th
**Week 10**
• Conduct interviews with translators to get a perception of how their understanding of language importance affects their language selection.
August 16th-August 23rd
**Week 11**
• Document the interviews and the findings to be used for further research work.
August 24th-
**Week 12**
• Conclude research work.
• Continue contributions to Wikipedia.
• Become an active member of the Wikimedia community.
=**Deliverables**=
• Regular communication with my Mentors and other Wikimedia community members.
• Blog posts about my experience with the Wikimedia open source community.
• Blogging about my progress during the internship and experience.
=**About Me**=
I am a Human Resources Professional embracing tech and gradually learning tech skills. I got married last year in Nigeria and had to move to a new country, where I now stay with my husband in Birmingham, UK. I’ve always wanted to be tech savvy but didn’t get the opportunity due to various factors that militated against it.
However, I enrolled to learn about technical writing because it gives me the chance to write about tech while learning it. Now, I am learning data analytics. I like data analytics because it involves research and problem solving, and it rewards my curiosity and quest for knowledge.
My participation in the Wikimedia project for the Outreachy Internship will give me the platform to participate while learning about open source.