Develop an approach to infer which countries are associated with a given Wikipedia article
Closed, Resolved; Public


Brief summary

Understanding which countries are relevant to a given Wikipedia article is an important facet of many tools and data analyses. For instance, editors might want to find articles about people from their country to improve, or researchers might be interested in how diverse Wikipedia's content is in terms of geographic coverage. Some articles can clearly be assigned to countries based on associated latitude-longitude data or clear information about where someone was born. For many other articles that are clearly associated with a country, such as Lovecraft Country (TV series) or FC Barcelona, the country (United States and Spain, respectively) can likely be inferred with high accuracy. This project will focus on developing an approach to infer which country or countries are associated with any given Wikipedia article. There will be three phases to the project:

  • Develop model for assigning countries to Wikipedia articles (Python)
  • Analyze geographic distribution of content on Wikipedia and compare to the geographic distribution of pageviews to Wikipedia articles (Python; data science)
  • Build simple interface for people to test the tool -- e.g., similar to this early prototype: (UI design; HTML/CSS/JS)
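
As a rough illustration of the first phase, here is a minimal sketch of the simplest kind of signal a model could start from: structured properties that point directly at a country. It assumes the article is linked to a Wikidata item, which the task description does not specify. The property IDs are real Wikidata properties (P17 country, P27 country of citizenship, P495 country of origin), but the function name `infer_countries` and the flattened `claims` format are hypothetical and chosen only for this example.

```python
# Wikidata property IDs that point directly at a country.
COUNTRY_PROPERTIES = {
    "P17": "country",
    "P27": "country of citizenship",
    "P495": "country of origin",
}


def infer_countries(claims):
    """Return {country QID: [property IDs that support it]} for one item.

    `claims` is a hypothetical, simplified mapping of property ID to a list
    of QIDs. Real Wikidata JSON is more deeply nested than this.
    """
    evidence = {}
    for prop in COUNTRY_PROPERTIES:
        for qid in claims.get(prop, []):
            evidence.setdefault(qid, []).append(prop)
    return evidence


# Hand-written illustrative inputs, not pulled from a live dump.
# Q30 is the United States and Q29 is Spain.
tv_series_claims = {"P495": ["Q30"]}
football_club_claims = {"P17": ["Q29"]}

print(infer_countries(tv_series_claims))
print(infer_countries(football_club_claims))
```

Keeping track of which property supplied each country makes it easier to weigh or explain the evidence later. Items with none of these properties would return an empty result, and those are the cases where inference from other signals would be needed.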

Skills required

  • Python for modeling and data science
Nice to have but willingness to learn is sufficient
  • Jupyter notebooks for documentation and visualization of data
  • Any skills in HTML/CSS/JS and general design will also be useful for building the interface to showcase the model

Possible mentor(s)

@Isaac @MGerlach


The microtask that will be evaluated for your application is: T263874
Additional tasks might be added but they will be purely optional if you finish and desire to explore further.

Event Timeline

Isaac changed the visibility from "Public (No Login Required)" to "Outreachy Mentors (Project)". Sep 23 2020, 2:52 PM
Aklapper renamed this task from Insert project title here to Develop an approach to infer which countries are associated with a given Wikipedia article. Sep 23 2020, 3:08 PM

@Isaac Looks perfect! Feel free to upload it on the Outreachy site. And, you would be a second mentor on this project, I'm assuming?

Thanks @Aklapper for filling in the title -- I overlooked that.

> And, you would be a second mentor on this project, I'm assuming?

@srishakatux haha, yes, whoops, thanks for pointing that out -- I'll add myself (myself as primary mentor; Martin as secondary).

> Feel free to upload it on the Outreachy site.

Sounds good -- I should have the microtasks up in the next few days as well.

Isaac changed the visibility from "Outreachy Mentors (Project)" to "Public (No Login Required)". Oct 7 2020, 4:28 PM

Hi everyone... I'm Shamima. I'm really interested in contributing to this project. I'm guessing we should start with T263874?

Hi Shamima, I'm Jesse. Welcome. To your question, yes we are to start with T263874.

Hi all -- yes, the application task is T263874. Don't hesitate to ask questions on that task / help each other out!

Hi everyone, I'm Abhipsha, an Outreachy applicant this time. I'm looking forward to contributing to this project very much. Cheers! 😄

Hi everyone! I'm Safia, another Outreachy applicant. I'm kinda new to open source but really interested in this project. Let's all do our best!!

Hi everyone! I am Tanya, an Outreachy applicant. I look forward to contributing and interacting with everyone, and also to learning about open source.

Hello everyone :)

I'm Funmi, an Outreachy applicant. I look forward to contributing to the design of the model that would make the wiki tool excellent.

Hi everyone! I'm Beatriz, an Outreachy applicant. I'm really interested in contributing to this project! I'll do my best!

Hello everyone :) I'm Emica. I'm kind of new to open source, but really interested in this project. I'm anxious...

Hello everyone! :) I am Yemi, an Outreachy applicant. Super excited to be here.

Hello!!!! I'm Vanessa, an Outreachy applicant. I really want to be part of this project, and I'll do my best to join this team :)

Welcome all -- I'm excited to see all the interest! Just a heads-up that as we enter the weekend, you should expect responses from us (the mentors) to slow down, and you may have to wait until the week starts again. Monday is also a holiday in the United States (Indigenous Peoples' Day), where I am based, so it is quite likely that responses will be slow then too. In the meantime, you should all still feel free to jump in and help each other. Thanks! I'll post this on the task as well.

Hello Everyone,
I'm Stephanie, an NLP enthusiast and Outreachy applicant, looking forward to contributing to this project and working alongside people from all over the world.

Hello everyone,
I am an Outreachy applicant and I am really excited to contribute to this project. I hope I can make some noteworthy contributions to this project.

Hi everyone,
I'm Melanie, an Outreachy applicant and currently also a Recurse Center fellow based in Connecticut, US. I have a background in spatial data science and cartography, but I am super curious to learn more about networks and NLP. Being a regular on Wikipedia myself, I look forward to contributing to such an amazing and relevant project :-)

Hi everyone.
I'm an Outreachy applicant. I'm new to open source but really excited to contribute to this project and get to know each of you.

Hi everyone, I am an Outreachy applicant and super excited to get on board and start contributing!

My name is Liz, an Outreachy applicant. Has anyone here been able to work with the page table dump without running out of memory? Would appreciate some tips :)

For documentation purposes, the above question was answered at T263874#6547837
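
For readers who land here first, one general way to avoid memory problems with a large gzipped SQL dump is to stream it line by line rather than load it whole. The sketch below is not the answer given in the linked comment; it is a minimal illustration that assumes the usual MediaWiki page-table layout, where each row starts with `(page_id, page_namespace, 'page_title', ...`. The function name `iter_pages` and the regular expression are hypothetical, and the pattern is a simplification that could in principle mis-match unusual string fields.

```python
import gzip
import re

# Matches the first three fields of each row tuple: (page_id,namespace,'title',
# Assumption: rows follow the usual MediaWiki page-table column order.
ROW_START = re.compile(r"\((\d+),(-?\d+),'((?:[^'\\]|\\.)*)',")


def iter_pages(dump_path, namespace=0):
    """Yield (page_id, title) pairs one at a time from a gzipped SQL dump.

    Only one INSERT line is held in memory at any moment. Titles are
    returned as they appear in the dump, so SQL escapes such as \\' are
    left in place.
    """
    with gzip.open(dump_path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            # Skip schema definitions, comments, and lock statements.
            if not line.startswith("INSERT INTO"):
                continue
            for match in ROW_START.finditer(line):
                if int(match.group(2)) == namespace:
                    yield int(match.group(1)), match.group(3)
```

Because `iter_pages` is a generator, callers can filter or count rows in a plain `for` loop and keep only what they need, for example building a dictionary of article titles for namespace 0.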

Hi everyone. I'm Lola, an Outreachy applicant; a little late to the party I'm afraid. But I'd really love to contribute to this interesting project.

Hi! My name is Viktor and I'm from Brazil. I'd love to work on this project and expand my experience in Data Science!

Hi everyone, my name is Chelsi and I'm an Outreachy applicant excited to contribute. I'm feeling excited and nervous at the same time.

How do we submit this task?

Welcome @Chelsi -- when you have completed the task (T263874), you can submit the notebook link as a contribution via the Outreachy site. There are more details in the task description though.

A few general announcements:

  • As I said in the task, if everyone who intends to apply to this project would submit an initial contribution on Outreachy by Friday (23 October) so I can get a sense of how many applicants we will have, that would be much appreciated. Don't worry if your work isn't complete, you still have plenty of time to make changes and submit a final contribution.
  • The application requires submitting a timeline for work. In general, don't spend too much time on this, but feel free to use it to identify which of the three phases I mention in this task (Develop; Analyze; Build) are most interesting to you (and that you would like to spend more time on), as well as anything else you think would be pertinent to your work on this project.

Hello everyone! My name is Naiara and I am a candidate for Outreachy. I'm excited and I hope I can contribute a lot.

Is everything in this project task planned for Outreachy (Round 21) completed? If yes, please consider closing this and other related tasks as resolved. If bits and pieces are remaining, you could consider creating a new task and moving them there.

Isaac claimed this task.

@srishakatux thanks for the ping. The Outreachy-specific work is complete -- all the other tasks are follow-on tasks for the overall project (not pieces that were expected to be completed during the internship). I'll close this task and then at some point this week move those to another parent task.