===Brief summary
Knowing which countries are relevant to a given Wikipedia article is important for many tools and data analyses. For instance, editors might want to find articles about people from their country to improve, and researchers might want to measure how geographically diverse Wikipedia's content is. Some articles can be assigned to countries directly, based on associated latitude-longitude data or explicit information about where someone was born. Many other articles that lack such data are still clearly associated with a country, such as [[https://en.wikipedia.org/wiki/Lovecraft_Country_(TV_series)|Lovecraft Country (TV series)]] or [[https://en.wikipedia.org/wiki/FC_Barcelona|FC Barcelona]] (United States and Spain, respectively), and for these the country can likely be inferred with high accuracy. This project will focus on developing an approach to infer which country or countries are associated with any given Wikipedia article. The project will have three phases:
* Develop model for assigning countries to Wikipedia articles (Python)
* Analyze geographic distribution of content on Wikipedia and compare to the geographic distribution of pageviews to Wikipedia articles (Python; data science)
* Build simple interface for people to test the tool -- e.g., similar to this early prototype: https://wiki-topic.toolforge.org/countries (UI design; HTML/CSS/JS)
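
To make the inference idea concrete, here is a minimal sketch of one possible baseline for the first phase. It is not the project's chosen method. It assumes a hypothetical lookup, `KNOWN_COUNTRIES`, holding countries already assigned to some articles (for example from coordinates or place of birth), and lets the articles that a page links to vote on its country. The function name `infer_countries`, the `min_share` threshold, and the sample data are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical seed data: countries already known for some articles,
# e.g. derived from coordinates or place-of-birth information.
KNOWN_COUNTRIES = {
    "Barcelona": {"Spain"},
    "La Liga": {"Spain"},
    "Camp Nou": {"Spain"},
    "UEFA Champions League": set(),  # no single country known
    "Lionel Messi": {"Argentina"},
}


def infer_countries(outlinks, known=KNOWN_COUNTRIES, min_share=0.5):
    """Return countries carried by at least `min_share` of the linked
    articles that have a known country (illustrative baseline only)."""
    votes = Counter()
    voters = 0
    for title in outlinks:
        countries = known.get(title)
        if not countries:
            continue  # linked article has no known country, so it does not vote
        voters += 1
        votes.update(countries)
    if voters == 0:
        return []
    return sorted(c for c, n in votes.items() if n / voters >= min_share)


# Hypothetical outlinks for the FC Barcelona article.
fc_barcelona_links = [
    "Barcelona", "La Liga", "Camp Nou", "UEFA Champions League", "Lionel Messi",
]
print(infer_countries(fc_barcelona_links))  # ['Spain']
```

Returning a list rather than a single value keeps room for articles associated with more than one country; lowering `min_share` trades precision for coverage.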
===Skills required
* Python for modeling and data science
====Nice to have but willingness to learn is sufficient
* Jupyter notebooks for documentation and visualization of data
* Any skills in HTML/CSS/JS and general design will also be useful for building the interface to showcase the model
===Possible mentor(s)
@Isaac @MGerlach
===Microtasks
Choose one of the tasks below. Only one is required for the application, but once you have completed it you may work on any of the other microtasks.
T263874
more to come...