Understand the cross-pollination of content across Wikimedia projects.
In this research we want to understand how content propagates across the different language editions of Wikipedia. As our unit of study we use Wikidata Items with sitelinks, that is, the subset of Wikidata Items that have at least one article associated with any Wikimedia project. We start by asking: once an item has an article in one language, which language is it most likely to propagate to next? Our hypothesis is that there is a relation between the creation of an item's articles in different languages. For example, an item that exists in just one language might not propagate to more projects, whereas an item that already exists in 5 languages is more likely to appear in a new project. Moreover, this probability may also depend on how related the languages (as a proxy for cultures) are.
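To make the two questions above concrete, here is a minimal sketch of simple empirical baselines. It assumes each item is already represented as a list of language codes ordered by article creation time (a hypothetical input format). `transition_probs` is a first-order Markov estimate, which conditions only on the most recent language rather than on the full set of existing languages. `propagation_rate_by_size` estimates the fraction of items that, having reached k languages, go on to reach k+1. Both function names are illustrative, and neither accounts for right-censoring of recently created items.

```python
from collections import Counter, defaultdict


def transition_probs(sequences):
    """First-order Markov estimate of P(next language | previous language).

    `sequences` is a list of per-item language lists, ordered by the
    creation time of each article (an assumed input format).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each consecutive (previous, next) pair of languages.
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    # Normalize each row of counts into a probability distribution.
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }


def propagation_rate_by_size(sequences, max_k):
    """Estimate P(item reaches k+1 languages | it reached k languages)."""
    rates = {}
    for k in range(1, max_k + 1):
        reached_k = sum(1 for s in sequences if len(s) >= k)
        reached_next = sum(1 for s in sequences if len(s) >= k + 1)
        rates[k] = reached_next / reached_k if reached_k else None
    return rates


if __name__ == "__main__":
    # Toy, made-up sequences for illustration only.
    toy = [["en", "es", "pt"], ["en", "es"], ["en", "fr"], ["pt"]]
    print(transition_probs(toy))
    print(propagation_rate_by_size(toy, 2))
```

On the toy data, "en" is followed by "es" in two of three observed transitions, and 3 of 4 items that reach one language reach a second. A model for the actual hypothesis would condition on the whole set of existing languages and on language relatedness, not only on the last language.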
This work is done in collaboration with Giovanni Comarela (UFES), @Rvvalentim (Research Intern, UFES), and Souneil Park (Telefónica Barcelona).
- Build a dataset of the creation time of all articles in Wikimedia projects and their corresponding Wikidata Items.
- Build a test subset.
- Define an ML setup for studying the content propagation phenomenon.
- Create a model for predicting content propagation across wikis.
- Submit a paper.
- Results dissemination (paper).
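The first and third steps could be connected roughly as follows. This is a minimal sketch that assumes the dataset arrives as hypothetical `(item_id, wiki, created_at)` records, where `created_at` is an ISO-8601 timestamp (for example, the timestamp of each article's first revision). `build_sequences` orders each item's wikis by creation time, and `make_examples` turns every sequence into supervised examples of the form "set of wikis already present → next wiki". The function names and record layout are illustrative assumptions, not the project's actual pipeline.

```python
from collections import defaultdict
from datetime import datetime


def build_sequences(records):
    """Group (item_id, wiki, created_at) records into per-item wiki lists
    ordered by article creation time (hypothetical record layout)."""
    by_item = defaultdict(list)
    for item_id, wiki, created_at in records:
        by_item[item_id].append((datetime.fromisoformat(created_at), wiki))
    # Sort by timestamp; ties are broken alphabetically by wiki name.
    return {
        item_id: [wiki for _, wiki in sorted(rows)]
        for item_id, rows in by_item.items()
    }


def make_examples(sequences):
    """Produce (item_id, wikis already present, next wiki) training examples."""
    examples = []
    for item_id, seq in sequences.items():
        for i in range(1, len(seq)):
            examples.append((item_id, frozenset(seq[:i]), seq[i]))
    return examples


if __name__ == "__main__":
    # Made-up records for illustration only.
    records = [
        ("Q1", "eswiki", "2005-03-01T00:00:00"),
        ("Q1", "enwiki", "2003-01-01T00:00:00"),
        ("Q1", "ptwiki", "2007-06-01T00:00:00"),
        ("Q2", "frwiki", "2010-01-01T00:00:00"),
    ]
    for example in make_examples(build_sequences(records)):
        print(example)
```

Items present in a single wiki yield no examples under this framing. A test subset could, for instance, be held out by item (rather than by example) so that no item contributes to both training and evaluation.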