User Details
- User Since
- Mar 7 2023, 7:35 AM (59 w, 2 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- Abhishek02bhardwaj [ Global Accounts ]
Sep 6 2023
Aug 30 2023
Jun 30 2023
May 28 2023
May 15 2023
@awight and @Simulo Hi, I hope you are doing great. Firstly, thank you for showing trust in me; I will give my best. Since @awight has already seen my proposal for review, I wanted to ask how I should refine it, and whether we could discuss a more specific timeline for the internship. Of the tasks I mentioned in my pre-contribution period, I am a bit slower than I expected in reviewing the papers because of my semester-end exams, but I'll cover as many papers as I can before the start. Also, it would be a great help if you could connect me with Kavitha A, the other mentor for the project.
Apr 15 2023
Apr 7 2023
@awight I plotted another scatter plot, this time from the public content translation data source. For this plot I again first prepared a CSV with the total counts of translated_from and translated_to for each of the 326 languages. From this CSV we can see that 183 languages (more than 56%) have 15 or fewer articles translated from them. Also, the most articles are translated from English (en), and the most articles are translated to Spanish (es).
Oh! I forgot to add the figure and repository link in the previous comment. Here they are:
Repository Link
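The counting step described above can be sketched in pure-stdlib Python. This is a hedged illustration only: the pair list is toy data, and the column meanings are assumptions based on the description, not the actual dataset.

```python
from collections import Counter

def count_translations(rows):
    """Count how many articles each language is translated from and to.

    `rows` is an iterable of (source_lang, target_lang) pairs, one per
    translated article. Returns two Counters: articles translated *from*
    and articles translated *to* each language.
    """
    translated_from = Counter()
    translated_to = Counter()
    for source, target in rows:
        translated_from[source] += 1
        translated_to[target] += 1
    return translated_from, translated_to

# Toy data standing in for the public content-translation dump
# (the real rows would be loaded with e.g. csv.DictReader):
pairs = [("en", "es"), ("en", "fr"), ("de", "es"), ("en", "es")]
t_from, t_to = count_translations(pairs)
print(t_from["en"])  # 3 articles translated from English
print(t_to["es"])    # 3 articles translated to Spanish
```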
Apr 6 2023
Hi, @awight! Another update on the flow diagrams I am working on. I actually made a really silly mistake in my previous work, so first I would like to explain what I am trying to do, because I feel I am exploring a slightly different aspect of this task. In this task we were supposed to use the existing content translation data source; in my previous Sankey diagrams I used that data and it showed promising results. But soon I was intrigued by a "what if": instead of using data about the number of articles already translated, I could use the configuration data to find the possible translation options that are available. To elaborate as best I can: we know that the current system does not provide translation from every language to every language. I am trying to find out more about the languages for which translation availability is very low. The scatter plot above shows exactly that.
What did I not think about in the previous part? Earlier, to find the availability of translation options to and from a language, I used the configuration scraper data. In that data I counted how many times each unique language appeared in the target-language or source-language column. This gave me values higher than 159 (the number of unique languages in the configuration scraper output), because repeated pairs were counted more than once. So even though the scatter plot divided the languages into groups, the numbers were not correct.
What have I done now? To make sure all the unique languages were covered, I first created a CSV of all the unique languages. Then, for each language, I counted how many distinct languages appear in the target-language column when that language is the source, and saved this as "translation out" (the outward flow of translation from that language). Likewise, I counted how many distinct languages appear in the source-language column when that language is the target, and saved this as "translation in" (the inward flow).
From this I got a CSV that gives the translation feasibility of each language. When researching imbalances in translation, specifically machine translation, I think the most basic thing we need to know is whether tools are even available to translate a given language. For instance, 15 languages have 5 or fewer target-language options; 22 languages have fewer than 100; and 73 languages have 134 or fewer, out of the 159 languages covered by machine translation. The language with the most target-language options is es, with 145, which means it cannot be translated into the remaining 14 languages. And so on.
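A minimal sketch of this distinct-counterpart counting (stdlib only; the pair list is a hypothetical stand-in for the configuration-scraper output):

```python
from collections import defaultdict

def translation_in_out(pairs):
    """Per language, count *distinct* counterpart languages.

    translation_out[lang]: distinct target languages reachable from `lang`
    translation_in[lang]:  distinct source languages translating into `lang`
    Using sets deduplicates repeated rows, avoiding the earlier mistake
    of getting totals larger than the 159 unique languages.
    """
    targets = defaultdict(set)  # source -> set of target languages
    sources = defaultdict(set)  # target -> set of source languages
    for src, tgt in pairs:
        targets[src].add(tgt)
        sources[tgt].add(src)
    translation_out = {lang: len(s) for lang, s in targets.items()}
    translation_in = {lang: len(s) for lang, s in sources.items()}
    return translation_out, translation_in

# Note the duplicated ("en", "es") row: it is counted only once.
pairs = [("en", "es"), ("en", "es"), ("en", "fr"), ("de", "en")]
out, into = translation_in_out(pairs)
print(out["en"])   # 2 (es, fr)
print(into["en"])  # 1 (de)
```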
Apr 4 2023
Also, I made a scatter plot of wikis against their article counts. It looks incomprehensible as an image, but if you run the script you can zoom in for a better look.
The script that generates this scatter plot is "Scatter_plot_wiki_wiki_vs_Article_count".
Apr 3 2023
Apr 2 2023
Apr 1 2023
Mar 31 2023
Mar 30 2023
Mar 27 2023
Mar 26 2023
Hello, @awight, @srishakatux and @Simulo. I have drafted a proposal for the project Research into translation imbalances. I wanted to ask for feedback or a review, if possible. Please have a look and let me know of any changes you would like to suggest.
Mar 24 2023
Hello @awight and @Simulo, here is my updated submission for this task. I have also uploaded the updated output file to the repository (thanks to GitHub LFS). I am looking forward to your reviews. Thanks in advance.
GitHub Repository.
Mar 23 2023
Hello @awight and @Simulo, here is my submission for this task. I would be very happy to have your views on it.
GitHub Repository.
Here is the link to my file. It was too big to upload to GitHub, so I am uploading it to Google Drive and sharing the link while I figure out how to use Git Large File Storage (LFS).
Google Drive Link
@awight So I integrated the two programs and gave it a test run. The output file now gives the timestamps of each commit. The only thing is that the output file is a bit too big (727 MB), which is not unexpected, since the original CSV file had 28k lines and there have been quite a few commits on the repository since 2017 (starting from 16-01-2017). I wanted to ask how I should upload it, because there is a size limit on file uploads on both GitHub and Phabricator. Could you please suggest something?
Hello, @awight. I am having a little trouble accessing the cxserver repository. On visiting the link it shows me "Not Found". I think for the same reason I am getting an error when executing my Python code: "fatal: path 'config/' exists on disk, but not in '43da799d1b35c4ace5704869e4031784c195c4ed'". Is there something wrong with the link, or is it only on my local system?
Hello, @awight Thank you very much for your valuable feedback.
This is relevant to our project, and has the same sign as the effect we're seeing. My understanding of the article is also that people in the so-called periphery are committing to English- and French-language Wikipedias proportionally more than to regional language wikis—but this effect if extrapolated to translation would have the opposite sign: this would suggest that translators might tend to translate into former colonial languages.
I got a very similar result while doing task #T331204, where we had to produce flow diagrams illustrating translation imbalances. One noteworthy thing I observed while producing those diagrams was that when I removed all rows with English as the source language, the largest number of lines still pointed to English, which I believe means English is also the most popular target language among the other languages. And when I removed English from both the target and source languages, ru (Russian) was the most popular language, along with fr and es, which again strongly supports our hypothesis.
Mar 22 2023
Hello @awight and @Simulo, here is my code repo for this task. As @Ahn-nath mentioned in her analysis/comparison, the output CSV file I was getting was not 100% accurate: in some target and source language pairs where either language was "no" (Norwegian), the value was parsed as the boolean False. I have corrected that error, and the CSV file I am getting now is 100% accurate.
Here is my github repository that includes the code as well as the final output.
Github Repository.
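The underlying gotcha here is that YAML 1.1 loaders (such as PyYAML's safe_load) read the bare token no as the boolean false, so the Norwegian language code comes back as False. A hedged sketch of one way to undo this after loading, assuming that in these config files the only plausible origin of a boolean in a language-code field is the token "no":

```python
def fix_lang_code(value):
    """Map a YAML 1.1 boolean back to the Norwegian language code.

    YAML 1.1 parsers resolve the unquoted scalar `no` to False, so a
    language-code field can come back as a boolean instead of a string.
    We assume a boolean here can only have come from "no"; quoting the
    scalar in the YAML source would avoid the problem entirely.
    """
    if value is False:
        return "no"
    return value

print(fix_lang_code(False))  # "no"
print(fix_lang_code("es"))   # "es" (ordinary codes pass through)
```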
Mar 21 2023
Mar 20 2023
Hello @awight and @Simulo, thank you for your guidance. As you mentioned in your earlier comment, I have included tests for my program in the repository. I would be really thankful if you could take some time to review them. I am attaching the GitHub repository link for your reference.
Github Repository Link
There is one more thing I wanted to ask for help with. I want to discuss the prospective Outreachy internship project timeline with the mentors, but I don't know how to go about it. I thought I could make a private GitHub repository and add the mentors as collaborators so they can review it and offer their suggestions and guidance, if that is okay. Also, is there a specific format or example the mentors would like us to follow? I am fairly new to writing proposals and prospective timelines, so I would really appreciate guidance on how to start.
Also, are there any community-specific questions that the mentors would like us to answer?
Thank you in advance.
Mar 18 2023
@awight It took me a while, but I was finally able to use the functionality of mt-defaults.wikimedia.yaml in the parser. Now the value of "is preferred engine?" is no longer false for every row by default, and I have one less file to ignore. Please have a look at the repository and share your views. Thanks in advance.
Github Repository Link
Updated CSV File -
Mar 17 2023
@awight @Simulo Here is my submission for this task. I am not marking it complete, because there is still a lot more that can be tried with the data, but I wanted to record the findings I have so far. To accomplish the task, I first converted the data from Here into a CSV file and loaded it into R, as mentioned in the example. Then I made diagrams from that data, to the best of my understanding, looking for new observations. For my first Sankey diagram I used all of the languages present in the data, which gave a result similar to the one in the example. Then I removed English from both source and target languages to see which other languages had the most translations; that diagram revealed three more such languages (fr, ru, es). Removing them from the target and source languages as well, I observed that there was less imbalance than in the previous two cases, but the translation data was still highly imbalanced.
I have recorded all of my code and the resulting diagrams in a GitHub repository. I would be very thankful for your views and suggestions on it.
Github Repository Link
The diagrams I got as a result.
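The peeling procedure described in the comment above (find the dominant language, drop every pair touching it, recount) can be sketched without R. This is a stdlib-Python illustration on toy data, not the actual analysis code:

```python
from collections import Counter

def peel_top_languages(pairs, rounds):
    """Repeatedly find the language involved in the most translation
    pairs, record it, then drop every pair that touches it.

    Mirrors the manual process of removing English, then the next
    dominant languages, from the Sankey-diagram input.
    """
    remaining = list(pairs)
    top_languages = []
    for _ in range(rounds):
        counts = Counter()
        for src, tgt in remaining:
            counts[src] += 1
            counts[tgt] += 1
        if not counts:
            break
        top, _n = counts.most_common(1)[0]
        top_languages.append(top)
        remaining = [(s, t) for s, t in remaining if top not in (s, t)]
    return top_languages

pairs = [("en", "es"), ("en", "fr"), ("en", "de"),
         ("fr", "en"), ("ru", "es"), ("fr", "es")]
print(peel_top_languages(pairs, 2))  # ['en', 'es']
```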
Mar 16 2023
Mar 14 2023
@awight and @Simulo I re-read my code and realised that there were certain structural glitches. The timestamps appended to the CSV file were not the times when the commits took place; they were something else. So I wrote the code again from scratch, using a slightly different approach.
Here is a small summary of my work. This code is designed to track changes to a CSV file over time using Git. It includes four functions: parse_csv(), run_git_command(), get_commits(), and export_csv(). The parse_csv() function reads in a CSV file, converts it to a dictionary, and returns it. The run_git_command() function runs a given Git command and returns the output. The get_commits() function retrieves the list of all Git commits and, for each commit, retrieves the timestamp and CSV data and appends them to a list of dictionaries. Finally, the export_csv() function writes the accumulated data for each commit into a new CSV file with an additional column for the commit timestamp.
The main program calls these functions and exports the data history to a CSV file named 'data_history.csv'. The data has a flat structure, with each row representing a single commit and including the city, temperature, and timestamp for that commit.
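A rough stdlib sketch of the run_git_command/get_commits pattern summarized above. The function names follow the summary, but the exact git invocation and output handling are assumptions, not the repository's actual code; the parsing step is split out so it can be checked without a real repository:

```python
import subprocess

def run_git_command(args, repo_path="."):
    """Run a git command in `repo_path` and return its stdout as text."""
    result = subprocess.run(
        ["git", "-C", repo_path] + args,
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def parse_commit_list(log_output):
    """Parse `git log --format='%H %cI'` output into (sha, timestamp) pairs."""
    commits = []
    for line in log_output.strip().splitlines():
        sha, _, timestamp = line.partition(" ")
        commits.append((sha, timestamp))
    return commits

def get_commits(repo_path="."):
    """List every commit with its committer timestamp, oldest first."""
    out = run_git_command(["log", "--reverse", "--format=%H %cI"], repo_path)
    return parse_commit_list(out)

# parse_commit_list is pure, so it can be exercised on a sample string:
sample = ("abc123 2017-01-16T10:00:00+00:00\n"
          "def456 2017-01-17T11:30:00+00:00\n")
print(parse_commit_list(sample)[0])  # ('abc123', '2017-01-16T10:00:00+00:00')
```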
Link to the Github Repository - https://github.com/Abhishek02bhardwaj/Evolution-Tracker
The Updated CSV file -
I will be very grateful if you could take some time to review my submission. I have also updated the README. Since I am new to writing open-source code I do not have much expertise in code documentation, so any suggestions to improve my README would be very welcome.
Mar 13 2023
@awight Since "notAsTarget" hasn't been used in the files, is it okay to ignore it for now?
I wanted to ask one more thing. Is it okay if I now record my submission of this task on the Add a Contribution page of Outreachy, and should I mark it as accepted/merged or not?
Mar 12 2023
@Anshika_bhatt_20 Hey Anshika, do you mind taking a look at the repository and the CSV file and sharing your views? I'll be really thankful.
@awight I have updated the GitHub repository with the updated code and CSV file. To address the last remaining issue I have used a slightly different approach: instead of parsing the "transform.js" file from the config folder and using it to generate the target and source language pairs, I have implemented the logic of "transform.js" directly in my code. This gives us two benefits:
- It reduces the execution time (though only slightly), since I didn't have to import another library to run the JavaScript file.
- It keeps the code simpler to understand (or at least I hope so).
I would be really grateful if you could take a look at the repository and share your review. Thank you.
Github Repository Link - https://github.com/Abhishek02bhardwaj/Extract-cxserver-configuration-and-export-to-CSV
Updated CSV File -
@Anshika_bhatt_20
Okay, thank you for the advice. I'll definitely try it. Right now I have converted the transform.js handler into a Python file (which I wrote myself); I think it should work.
Mar 11 2023
@awight @srishakatux @Simulo The first of the two issues I listed in my previous comment has been addressed in my most recent commit to the repository. Now all the language pairs for the respective engines are in the CSV file. To make it more accessible I am adding the CSV file here, along with the GitHub repository link. The only issue left in this task is how to use the handler.js file to access the source and target languages. I am still trying to figure that out and would really appreciate any help.
Github Repository Link - https://github.com/Abhishek02bhardwaj/Extract-cxserver-configuration-and-export-to-CSV
Updated CSV File -
Mar 10 2023
@awight I have updated the GitHub repo and changed the parser to accommodate your suggestions. The following changes can be seen in the CSV file:
- The engine name is now the name of the file, just as it was supposed to be.
- I have removed the unwanted handlers that I added while testing.
- I have excluded the files that were supposed to be ignored.
The following are the issues that I am yet to address:
- The parser takes into account only the first source language of each file (since I hardcoded that while testing). We need all the source languages and their respective target languages. To accomplish this I will put the code snippet that accesses the target languages for a source language inside a while loop, and use a try/except to handle the error that may arise at the end of the list. I am aware this explanation might not be sufficient to explain what I am trying to do, but I wanted to record it here since it might help someone else too.
- I am yet to understand how to use the transform.js handler. @awight, I need a bit of help with that: I am not sure how to use the handler.js file to get the source and target language pairs from Google.yaml and Yandex.yaml. I would really appreciate your guidance.
link of the repo (just to make it easier to access) - https://github.com/Abhishek02bhardwaj/Extract-cxserver-configuration-and-export-to-CSV
The Updated CSV file -
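The fix described in the first issue above (iterate over every source language instead of hardcoding the first one) can be sketched as a plain nested loop. This is a hedged illustration: the nested-dict shape of the parsed config is an assumption about what a YAML loader would return for an engine file, not the actual cxserver structure:

```python
def all_language_pairs(config):
    """Expand a parsed engine config into (source, target) pairs.

    `config` is assumed to look like {"languages": {src: [targets, ...]}}.
    Looping over every source language replaces the earlier hardcoded
    "first source language only" behaviour, and a plain for-loop needs
    no try/except to detect the end of the list.
    """
    pairs = []
    for source, targets in config.get("languages", {}).items():
        for target in targets:
            pairs.append((source, target))
    return pairs

config = {"languages": {"en": ["es", "fr"], "de": ["en"]}}
print(all_language_pairs(config))  # [('en', 'es'), ('en', 'fr'), ('de', 'en')]
```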
@awight I am not sure whether JsonDict.yaml should also be ignored, because it looks fine to me.
@awight Regarding the mt-defaults.wikimedia.yaml file: it sets the default translation engine to be used for each language pair when no other engine is specified in the configuration files. This file does not define the language pairs themselves, so it does not affect the supported translation pairs. I wanted to use it in the parser, but I felt I should first address the engine-name issue.
@awight Thanks for taking the time to review my contribution. I have made the repository public, so now anyone can access it, which should also help others improve the quality of their work.
Mar 9 2023
@awight @Simulo @srishakatux I have written some Python code to parse these files, build a single flat in-memory structure with all of the supported pairs, and export this data as a CSV of all pairs with at least the required columns. I am attaching the resulting CSV file and a link to the GitHub repository containing the Python code (kept private to preserve code privacy, if that is fine).
Link to the github repository - https://github.com/Abhishek02bhardwaj/Extract-cxserver-configuration-and-export-to-CSV
The CSV file that I got as result -
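The flat-structure-to-CSV export step can be sketched with the stdlib csv module. The column names here are guesses based on the task description ("at least the required columns"), not the actual repository's schema:

```python
import csv
import io

def export_pairs_csv(pairs, fileobj):
    """Write (engine, source, target) triples as a flat CSV.

    Each row of the in-memory structure becomes one CSV row, matching
    the "single flat structure of all supported pairs" described above.
    """
    writer = csv.writer(fileobj)
    writer.writerow(["engine", "source_language", "target_language"])
    writer.writerows(pairs)

# Toy pairs; in practice fileobj would be open("pairs.csv", "w", newline="").
pairs = [("Apertium", "es", "ca"), ("Google", "en", "hi")]
buf = io.StringIO()
export_pairs_csv(pairs, buf)
print(buf.getvalue().splitlines()[0])  # engine,source_language,target_language
```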
Mar 8 2023
@Simulo @awight
Here is the link to my survey for this task. I have tried to keep it short, as I think a survey that is too long becomes inconvenient to fill out. Please have a look and let me know of any improvements or changes you would suggest. Thank you.
Link - https://etherpad.wikimedia.org/p/xGzVywcafj65F66Gea2n
Mar 7 2023
@awight @Simulo @Aklapper @srishakatux Is there anything I missed, or any suggestions, especially for the hypotheses?
Hypotheses about the patterns expected in a dataset of translation between different Wikipedias
Based on the above paper, there are certain patterns we can expect in a dataset of translation between different Wikipedias. I will state these patterns as hypotheses:
• Broadband connectivity will predict the curve of translations between different Wikipedias up to an extent: the paper itself mentions that countries with very small and very high numbers of broadband Internet connections commit more edits to Wikipedias than one would expect assuming a linear trend, while, on average, countries with medium numbers of broadband Internet connections commit fewer edits than expected.
The same trend is expected for translations: languages spoken in countries and regions with better broadband connectivity have more people interested in consuming information, so information in those languages will be in greater demand than information in languages spoken in areas with fewer broadband Internet connections.
Summary of the paper titled "Digital Division of Labor and Informational Magnetism: Mapping Participation in Wikipedia"
The paper “Digital Division of Labor and Informational Magnetism” discusses the struggles over the ways people and organizations try to control how information is produced, reproduced, and used. It focuses on Wikipedia, the world’s largest and most used repository of user-generated content. The paper argues that digital mediations of spatial knowledge have compounded the subordination of local voices, erasing the positionality of the user and the social contexts under which knowledge is produced. It takes this situation as a starting point to investigate whether the Internet, with its mass participation, affords a potentially disruptive role in breaking the digital divisions of labor. The authors describe the data used in a study of Wikipedia’s geographies, which sought to explore the geographic distribution of Wikipedia content and contributors. Three types of data were used: data about the location of Wikipedia articles, data about the origins of edits, and data about the geographic focus of those edits. The paper further explains how the data were processed and the limitations of the data. It also discusses the challenges of analyzing anonymous edits and edits by registered editors, and the methods used to geolocate those edits. The paper then discusses the geographic distribution of edits committed to all Wikipedias from different countries, as measured by Wikimedia. The study found that countries in North America, Europe, and Asia had the highest numbers of edits, while Sub-Saharan Africa had the lowest. The article also analyzed the propensity of people in any country to commit edits and found that North America, Europe, and much of Oceania stood out strongly compared with regions having medium participation levels, and with the very low participation levels found mostly in Sub-Saharan Africa.
It further explores the factors that covary with the geography of participation in Wikipedia and offers insight into the geographies of participation and voice on the English-language version of Wikipedia. The article demonstrates that the availability of broadband connectivity is a central predictor of the spatial unevenness of participation. The authors use geocoded articles and the origin locations of anonymous and registered edits to analyze source locations, target locations, and the respective editing volumes. They investigate the geographies of local voice by looking at the volume of autochthonous content by region, analyzing within-region edits, and examining trajectories or networks of editing over space. The authors conclude that large amounts of geospatial content show no sign of deterring people from further contributions and editing, and that North America and Europe commit more than they receive into their territories.
Hello everyone,
My name is Abhishek Bhardwaj and I am a sophomore at the University of Delhi. I am really excited to contribute to Wikimedia, especially this project. I hope this is the right place to communicate.