Profile
Name : Nathaly Toledo
Time zone : GMT -4:00
Email : nathaly12toledo@gmail.com
Zulip : Nathaly Toledo
Github : https://github.com/ahn-nath
Location : Venezuela
Blog : https://medium.com/@nathaly12toledo
Synopsis
The project studies, proposes, and tests solutions for the translation imbalances perceived on Wikipedia and studied in the past. From the description provided by the mentors of the project, we have:
When we compare the number of translations made between pairs of languages, we find very high ratios of articles being translated from languages with a larger wiki presence into languages with a smaller presence. English alone is the source language for 70% of all published translations, and the pattern seems to repeat for other colonial tongues.
We would like to understand why this is. We've begun to find explanations in the software design choices, and there are many potential influences behind each translator's choice of article and languages. Some of these factors might be: the number of articles available in each language, cultural richness and blind spots, suggestions made by software, the availability and quality of machine translation, and more.
The Outreachy component of our project will follow one of these possible avenues for investigation.
Mentors: @awight
Co-mentors: @Simulo
My contributions
Contribution #1
- Link: https://docs.google.com/document/d/1lXfRC9kgPWGlpYPqpDgIH1kZUkAeVAzy7qOCe0PwUhc/edit?usp=sharing
- This is my solution for a task that required comparing an official API result with the scraper result from a GitHub contribution. I identified the differences after parsing and converting the data of both outputs and validated them programmatically. Finally, I compared all public contributions to see how they differ and how accurate each file is relative to the others.
Contribution #2
- Link: https://etherpad.wikimedia.org/p/r.df3d6f2e35e02a3cfa8912b58abb6e36
- This contribution is a survey. Its goal is to learn more about how the translation software is used and how translation languages are chosen. My approach was to test assumptions, drawn from research papers, about why translation imbalances may be present in the Wikipedia community of translators. It also includes general questions that may guide further research and the formulation of new assumptions.
Contribution #3
- Link: https://github.com/ahn-nath/configuration-evolution-over-time.time-machine
- This project is a time machine for CSV files. It allows you to track changes in CSV files over time, restore CSV files to a previous state, compare CSV files to a previous state, keep track of the last time a CSV file was changed, and update it accordingly, without having to rewrite the data each time. Essentially, it works as a parser, which reads the data into a native structure in memory and plays back the data repository's git history to parse the data at each commit, storing the entire sequence in memory along with the timestamp of the git commit. It uses the GitHub API to access the git history of the data repository. As of now, it uses the GitHub repository Configuration Evolution Over Time: Source File as the primary data source, but it can be easily extended to use other data sources.
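The playback idea described above can be sketched in a few lines. This is a minimal illustration and not the code in the repository: the class name `CsvTimeMachine` and its methods are hypothetical, and the commit history is passed in as plain (timestamp, CSV text) pairs instead of being fetched through the GitHub API.

```python
import csv
import io
from bisect import bisect_right
from datetime import datetime, timezone


def parse_csv(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


class CsvTimeMachine:
    """Hypothetical helper: keeps every parsed version of a CSV file by commit time."""

    def __init__(self):
        self._timestamps = []  # commit timestamps, oldest first
        self._snapshots = []   # parsed rows, parallel to _timestamps

    def record(self, timestamp, csv_text):
        """Store the file contents as of one commit; commits must arrive oldest first."""
        if self._timestamps and timestamp < self._timestamps[-1]:
            raise ValueError("commits must be replayed in chronological order")
        self._timestamps.append(timestamp)
        self._snapshots.append(parse_csv(csv_text))

    def at(self, moment):
        """Return the rows as of `moment`, or None if it predates the first commit."""
        index = bisect_right(self._timestamps, moment) - 1
        return self._snapshots[index] if index >= 0 else None

    def last_changed(self):
        """Return the timestamp of the most recent recorded commit, if any."""
        return self._timestamps[-1] if self._timestamps else None

    def diff(self, earlier, later):
        """Return (added, removed) rows between the states at two moments."""
        old = {tuple(sorted(row.items())) for row in (self.at(earlier) or [])}
        new = {tuple(sorted(row.items())) for row in (self.at(later) or [])}
        added = [dict(row) for row in sorted(new - old)]
        removed = [dict(row) for row in sorted(old - new)]
        return added, removed
```

Keeping the timestamps sorted lets a binary search find the state at any moment, which is what makes "restore to a previous state" and "compare with a previous state" cheap once the history has been replayed.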
Contribution #4
- Link: https://github.com/ahn-nath/wikimedia-cxserver-config-parser
- This is a simple parser for the Wikimedia cxserver configuration files. It is designed to be used with the “https://github.com/wikimedia/mediawiki-services-cxserver/tree/master/config” directory. It reads a list of accepted YAML files, builds a single flat, in-memory structure with all the supported language pairs, and exports it as a CSV of all pairs, with at least the following columns: “source language”, “target language”, “translation engine”, and “is preferred engine?”
The configuration files have several file structures. Most have the source as the top-level key, and target languages as a list of values under that key. Those with “handler” indicate a non-standard interpretation for the file. My solution parses the "mt-defaults.wikimedia.yaml" to get the designated files and then proceeds to generate a CSV file with all records. It includes tests, a pickle file to improve time performance, the main script, and various folders with the target data.
It uses:
- Python 3.6 or higher, pip and git.
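The flattening step described above can be sketched as follows. This is an illustration under stated assumptions, not the code in the repository: the function names are hypothetical, the YAML files are assumed to be already loaded into dictionaries, and the defaults are assumed to map a "source-target" key to the preferred engine name.

```python
import csv
import io


def flatten_pairs(engine_configs, defaults):
    """Flatten {engine: {source: [targets]}} into sorted (source, target, engine, preferred) rows.

    `defaults` is assumed to map "source-target" to the preferred engine name.
    """
    rows = []
    for engine, config in engine_configs.items():
        if "handler" in config:
            # Non-standard layout: needs engine-specific handling, skipped in this sketch.
            continue
        for source, targets in config.items():
            for target in targets:
                preferred = defaults.get(f"{source}-{target}") == engine
                rows.append((source, target, engine, preferred))
    return sorted(rows)


def to_csv(rows):
    """Serialize the flat rows with the column names used in the description above."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(
        ["source language", "target language", "translation engine", "is preferred engine?"]
    )
    writer.writerows(rows)
    return buffer.getvalue()
```

Files that declare a “handler” are skipped here on purpose; a fuller version would need one small adapter per handler to expand them into pairs.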
Contribution #5
- Link: https://docs.google.com/document/d/1k2Ek4V2kzmjBXc4cOb9EmVsgV0IR08TGkSo8edXc-Ng/edit?usp=sharing
- The task required analyzing and summarizing a paper that studies participation on Wikipedia. It also involved making well-founded assumptions and explaining them. I read the research paper provided and added my own conclusions about its topics and about how they apply to the challenge we would target with this Outreachy project.
Timeline (2023)
Week 1 (30th May - 5th June)
- Get familiar with the topic and discuss priorities with the mentors.
- Solve task #1: Could be related to improving the integration of the cxserver scraper and the time machine as well as extending their functionality or the functionality of past tasks.
Week 2 (6th June - 12th June)
- Document (task #1).
- Get mentors' feedback.
- Continue working on task #1.
- Submit the final version.
Week 3 (13th June - 19th June)
- Document new changes of task #1.
- Discuss with mentors potential software improvements to address the imbalances found in the research, as well as evidence-based proposals that we could test.
- Decide on three potential software solutions.
Week 4 (20th June - 26th June)
- Start working on proposed solution #1 - Translation imbalances visualizer enhancements: a tool that allows community contributors from around the world to visualize translation imbalances across regions on Wikipedia and compare them with past statistics as well as forecasted statistics, so that those interested in addressing the problem, or in making contributions based on that data, can do so with a clear statement backed by evidence. Some features I suggest adding are country of origin (1) and filters by timeframe (2). This is based on two existing solutions: https://en.wikipedia.org/w/index.php?title=Wikipedia%3AEdits_by_project_and_country_of_origin and https://en.wikipedia.org/wiki/Special:ContentTranslationStats.
- Submit the project plan (project requirements, motivations, and research to back the proposal, wireframes/user journeys, software architecture, and development plan) for feedback.
Week 5 (27th June - 3rd July)
- Implement feedback.
- Continue working on proposed solution #1 – Translation imbalances visualizer.
- Discuss with mentors the final version.
Week 6 (4th July - 10th July) + Week 7 (11th July - 17th July)
- Complete proposed solution #1 - Translation imbalances visualizer.
Week 8 (18th July - 24th July)
- Start working on proposed solution #2: a set of two features to integrate into Wikipedia to help influence user behavior in a way that promotes a reduction of imbalance in translations.
- Submit the project plan (project requirements, motivations, and research to back the proposal, wireframes/user journeys, software architecture, and development plan) for feedback.
- Start blog post draft #1.
Week 9 (25th July - 31st July)
- Implement feedback.
- Continue working on proposed solution #2: a set of two features to reduce imbalances.
- Discuss with mentors the final version.
- Complete and publish blog post #1.
Week 10 (1st August - 7th August)
- Complete feature #1 of the proposed solution #2 – a set of two features to reduce imbalances.
- Start blog post draft #2.
Week 11 (8th August - 14th August)
- Start working on feature #2 of the proposed solution #2 – a set of two features to reduce imbalances.
- Submit progress for feedback.
- Complete and publish blog post #2.
Week 12 (15th August - 21st August)
- Implement feedback.
- Continue working on proposed solution #2 and the two features.
- Discuss with mentors the final version.
- Start blog post draft #3.
Week 13 (22nd August - 30th August)
- Complete proposed solution #2.
- Complete and publish blog post #3.
30th August and later
- Final improvements.
- Additions to the documentation of projects.
- Continue working as a volunteer if improvements are needed for the project.
About me
I am Nathaly Toledo, a senior student from Caracas, Venezuela. I studied at a technical institute during my high school years and continued my education at the University of the People, in the computer science degree program. Next year I will be completing a Master's level diploma with Open Classrooms in Software Architecture. Additionally, I participate in different programs and take different specializations to upgrade my software engineering skills and my understanding of social problems in the world, in South America, and in my country. I have three years of working experience with international clients as a professional software developer and have been officially recognized as being in the top 3% of all professionals on a freelancing platform.
I have decided to apply to Outreachy because I see it as a highly useful way of connecting with mentors and projects that can help me gain a deeper understanding of quality software and of the best approaches to tackling problems, and help me upgrade my skills. Furthermore, I have decided to apply only to this Wikipedia project because, after studying the other projects, I felt that it was the only one that matched my interests, skills, and purpose. I also believe that this contribution has the potential to improve participation and make knowledge more accessible to people in my country and in countries like mine. I would like to study the gap between countries when it comes to translation, the impact it has on the access certain groups have to relevant information, why we should care, and how to do something about it.
Past experience with this community
User:
Like most people, I have used Wikipedia to inform myself about relevant topics and have a starting point for important information. Sometimes, I just use Wikipedia to find significant or useful groups of references targeting one concept. I recently visited: https://es.wikipedia.org/wiki/Instituto_del_Hemisferio_Occidental_para_la_Cooperaci%C3%B3n_en_Seguridad
Similarly, I have direct contact with Wikimedia Venezuela because I am a member of the Impact Hub Caracas, which is associated with the organization. I recently contacted Galahad (https://meta.wikimedia.org/wiki/User:Galahad) to get his views on the problem the project is trying to study.
Contributions:
Via Outreachy, I applied to a Wikimedia project before and worked on one contribution during the application period. This is the link to my work: https://public-paws.wmcloud.org/66093174/task-01.ipynb
Past experience with other communities
Contributions:
As detailed later, I was an open-source developer through a fellowship. I worked for two months on several contributions to ProgramEquity, an open-source project used to promote action against climate change and help advocacy groups meet their goals.
Repository link: https://github.com/ProgramEquity/amplify
Users:
Although I am mainly a Windows user, I have lately become more and more involved with Linux. I have used Docker Web many times to interact with Ubuntu via virtual machines, and I was also selected by the Linux Foundation to receive the “Shubhra Kar Linux Foundation Training (LiFT) Scholarship 2022”, as listed on their public website (https://www.linuxfoundation.org/about/lift-scholarships). Thanks to it, I am training in Linux and getting involved with it professionally.
Naturally, as a developer, I am a user of many wonderful open-source projects like Python and many libraries. Nevertheless, I would like to focus on more specific cases where I was a user who took advantage of the tool in a technical way:
I used a tool developed by another open-source contributor for my open-source contributions, and I discussed with the author how to use it. I also analyzed and compared other alternatives to implement a solution.
Relevant links:
- https://github.com/arthurfiorette/axios-cache-interceptor/issues/383#issuecomment-1306427627
- https://github.com/ProgramEquity/amplify/issues/294#issuecomment-1421176609
Relevant projects
As a freelancer with several years of experience working for international clients and as someone with three months of experience with open-source development, I have worked on a wide range of products that involved analyzing existing systems to improve them, as well as detecting the root cause of issues that could be solved or addressed with software, as is the case with translation imbalances. I would like to highlight one closed-source project I worked on and one open-source project:
Open source project: Program Equity – Amplify
After being selected as an MLH Fellow last fall, I was chosen by the GitHub organization to contribute as an open-source developer in one of the projects they sponsor: ProgramEquity. The Amplify project is “an open-source app created for users to take the initiative in being part of an actionable step in the efforts to protect against climate change.” Besides that, they also help indigenous communities in North America by enabling advocacy groups through the app. I was the most active contributor and successfully closed more than five issues in two months.
Some of my contributions:
- Add social media icons with links to representative card: https://github.com/ProgramEquity/amplify/pull/361
- Fix the display of the filter in the campaign page form: https://github.com/ProgramEquity/amplify/pull/365
- [Low Priority] Crop representative photos using Cicero API data: https://github.com/ProgramEquity/amplify/pull/367
- Cache option APIs: https://github.com/ProgramEquity/amplify/pull/373
- Support queries by address: https://github.com/ProgramEquity/amplify/pull/402
- [Documentation] Add ORM diagram to README: https://github.com/ProgramEquity/amplify/issues/354, https://github.com/ProgramEquity/amplify/wiki/Data-Structures/382ff30431f1a13375838e8dc89934de62252a17
Skills I gained:
- Collaborating in an open-source environment.
- Proposing ideas and discussing them with an open source community before implementing a solution.
- Assertive communication and collaboration while pair programming with senior developers.
- API caching and techniques for improving performance.
Relevant links:
- ProgramEquity - Amplify: https://github.com/ProgramEquity/amplify
- GitHub blog highlighting my participation: https://github.blog/2022-09-23-meet-the-github-campus-experts-selected-for-the-fall-2022-mlh-fellowship-cohort-powered-by-github/
Closed source project: Queensbury.io
For one year, I worked on the creation of a minimum viable product (MVP) and a proof of concept (PoC) for a startup with a focus on the US. As a junior software engineer, I was responsible for Queensbury.io, one of Cryptius's projects. Queensbury.io is a system that aims to revolutionize the boxing world by providing an accurate way to analyze and score boxers’ performance via artificial intelligence and body mechanics.
Skills and knowledge I gained:
Overall, I designed and built the proof of concept and MVP versions of the solution. I have learned how to:
- Use different communication protocols, like TCP, to understand and implement a WebSockets architecture that let the team create an event-driven, highly functional notifications system.
- Define and study the software architecture to use.
- Lead the initial design process and handle the low-fidelity and high-fidelity proposals.
- Fully implement each module of the web app with Django and Plotly.
- Integrate the Vimeo and Google Cloud Storage APIs.
- Help with the data pipeline process, the creation of Google Cloud Functions, and the Docker container logic.
- Use data analysis skills to process inputs and outputs and simplify existing calculations.
- Generate Plotly diagrams/graphs with the data received.
- Document the project on Notion and GitHub, by adding the architecture overview, technical specifications, product requirements, testing specifications, maintenance guide, and README file.
Relevant links:
Unfortunately, due to the nature of my freelancing contract, I cannot share many substantial details about the project. However, I can share a secret Gist that contains some files that the client allowed me to share and that date back to the initial stage of the project:
- Secret Gist with some files that describe my involvement with the project during the initial stages: https://gist.github.com/ahn-nath/34b559ca0648f577fe46c73e09e60105
- Excerpt of feedback from the client. This feedback is listed as a recommendation on LinkedIn: https://www.linkedin.com/in/nathaly-toledo/
“With so many positive things to say about Nathaly, it's impossible to know where to begin. We are a startup company, and as such we are very much “sink or swim”, and we constantly challenge our people to think creatively to solve complex problems with no easy or readily identifiable solution. In this challenging atmosphere, I have yet to see a challenge that she was unwilling to take on or unable to solve. Nathaly is an incredibly talented engineer. She designed and built our platform from the ground up, often with little or no guidance on which direction would be the best to take. She basically started with an idea that was not her own, some very loose guidelines on what the end product would look like, and hit the ground running...”
Time commitments on initial application
As a student of an online institution, I have complete control of my schedule and can easily adapt my commitments to the project and the availability required. My current university commitments require 10 hours per week on average and 15 hours per week at most.