The goal of this analysis is to identify topics that readers are interested in but whose articles are unavailable, or of poor quality, in their local language.
We can then recommend these popular topics to editors in local communities.
We use pageviews of pages translated by Google in March 2019. Since the vast majority of these pages are translated from English into other languages, this first exploration only checks the topics of articles on English Wikipedia.
We break down the translated pageviews into two types:
- **Pageviews translated by Toledo ([[ https://www.thejakartapost.com/life/2018/12/08/google-says-it-uses-ai-to-translate-english-content-to-indonesian.html | Google integrates automatically translated pages into search results ]])**. These pageviews indicate that 1) Google considers the quality of the content in the local language to be lower than that of the translated page, **AND** 2) users are interested in these articles and therefore click through from the search results.
- **User-initiated translation**. Users paste the article link into Google Translate, or click the "Translate this page" link in their search results. In this case, users are well aware that they are reading a translated article and are willing to put in more effort to do so, which indicates a stronger interest in the article. We break down the analysis by translation target language.
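The breakdown by translation type and target language can be sketched as a simple aggregation. This is a minimal illustration only: the record layout, the field names (`translation_type`, `target_language`, `pageviews`), the type labels, and the toy values are all assumptions made for the example, not the actual schema or data used in this analysis.

```python
from collections import defaultdict

# Toy records with a hypothetical layout; values are made up for illustration.
SAMPLE_ROWS = [
    {"title": "A", "translation_type": "toledo", "target_language": "id", "pageviews": 10},
    {"title": "B", "translation_type": "toledo", "target_language": "id", "pageviews": 5},
    {"title": "A", "translation_type": "user_initiated", "target_language": "id", "pageviews": 3},
    {"title": "C", "translation_type": "user_initiated", "target_language": "hi", "pageviews": 7},
]


def pageviews_by_type_and_language(rows):
    """Sum pageviews for each (translation type, target language) pair."""
    totals = defaultdict(int)
    for row in rows:
        key = (row["translation_type"], row["target_language"])
        totals[key] += row["pageviews"]
    return dict(totals)


totals = pageviews_by_type_and_language(SAMPLE_ROWS)
```

Keying on the pair keeps the two translation types separate while still allowing a per-language comparison within each type.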
We use the [[ https://www.mediawiki.org/wiki/Talk:ORES/Draft_topic | ORES draft topic model ]] to get the topics of articles. The model's output is the [[ https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Council/Directory | WikiProject ]] each article belongs to.
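As a rough sketch of how topic predictions might be read for one article revision, the helpers below build a request URL and pull the predicted topics out of a response. The endpoint pattern, the model name `drafttopic`, and the response shape are assumptions based on the public ORES v3 scoring API and may not match the setup used here; the revision ID and topic labels in the sample are placeholders. No network request is made.

```python
# Assumed ORES v3 scoring endpoint; verify against the ORES documentation.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def drafttopic_url(rev_id, wiki="enwiki"):
    """Build the (assumed) scoring URL for one revision."""
    return f"{ORES_BASE}/{wiki}/{rev_id}/drafttopic"


def extract_topics(response, rev_id, wiki="enwiki"):
    """Return the predicted topic labels, or [] if the revision has no score."""
    entry = response[wiki]["scores"][str(rev_id)]["drafttopic"]
    score = entry.get("score")  # assumed to be absent when scoring failed
    if score is None:
        return []
    return list(score["prediction"])


# Hand-written response in the assumed shape; labels are placeholders.
SAMPLE_RESPONSE = {
    "enwiki": {
        "scores": {
            "123456": {
                "drafttopic": {
                    "score": {
                        "prediction": ["Topic.One", "Topic.Two"],
                        "probability": {"Topic.One": 0.9, "Topic.Two": 0.6},
                    }
                }
            }
        }
    }
}

url = drafttopic_url(123456)
topics = extract_topics(SAMPLE_RESPONSE, 123456)
```

An article can receive more than one predicted topic, so the helper returns a list rather than a single label.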