Problem Statement (problem to be solved):
When something big happens in the world (a famous person dies, a new leader of a country is elected, a natural disaster strikes, …), it very often leads to a surge of attention on the same topic in the Wikimedia projects. Many more people consume the content, and many more contribute to it. This increase in attention also makes the Items related to the event a target for malicious edits. Making these Items easier to find for people who care about accurate and up-to-date data on current events will enable them to better tackle malicious edits there.
Plan for a prototype:
Phase 1. Fetch revisions (SQL)
- develop a simple cron-scheduled SQL job that extracts all Wikidata revisions every hour;
- compute the revision frequency per item and property and check whether anything is spiking;
- separate human revisions from bot/spider revisions;
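
The counting and spike-check steps of Phase 1 could look roughly like the sketch below. It is only an illustration under stated assumptions: the revision records are plain dictionaries with hypothetical `item` and `user` keys (as they might come out of the hourly SQL extract), the bot list is supplied by the caller, and "spiking" is taken to mean a z-score against the item's own recent hourly counts. The thresholds are placeholders, not tuned values.

```python
from collections import Counter
from statistics import mean, pstdev


def count_human_revisions(revisions, bot_users):
    """Count revisions per item, skipping revisions made by known bot accounts.

    `revisions` is an iterable of dicts with (hypothetical) keys "item" and "user".
    """
    return Counter(r["item"] for r in revisions if r["user"] not in bot_users)


def find_spikes(current, history, min_edits=5, z_threshold=3.0):
    """Return (item, count, z) for items edited far more often than usual.

    `current` maps item -> revisions in the latest hour.
    `history` maps item -> list of counts from previous hours.
    """
    spikes = []
    for item, n in current.items():
        past = history.get(item, [])
        mu = mean(past) if past else 0.0
        sigma = pstdev(past) if len(past) > 1 else 0.0
        # Floor sigma at 1 so that normally quiet items do not get
        # enormous z-scores from one or two extra edits.
        z = (n - mu) / max(sigma, 1.0)
        if n >= min_edits and z >= z_threshold:
            spikes.append((item, n, round(z, 2)))
    # Strongest spikes first.
    return sorted(spikes, key=lambda s: s[2], reverse=True)
```

The same two functions can be applied to properties instead of items by changing the key that is counted.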
Phase 2. Describe what is happening on Wikidata (SPARQL/MediaWiki API/R)
- send a set of SPARQL queries to WDQS (or, better, MediaWiki API calls) to describe the items and properties that are currently in the community's focus;
- report essential revision statistics on the top-edited items, classes, and properties, per hour, per day, and possibly per week;
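
As a rough sketch of the MediaWiki API route in Phase 2, the snippet below asks the Wikidata `wbgetentities` module for labels and descriptions of the top-edited items. The function names and the User-Agent string are hypothetical; request building and response parsing are kept separate from the network call so they can be checked offline.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"


def build_params(item_ids, language="en"):
    """Build query parameters for a wbgetentities request."""
    return {
        "action": "wbgetentities",
        "ids": "|".join(item_ids),  # several ids are separated by "|"
        "props": "labels|descriptions",
        "languages": language,
        "format": "json",
    }


def parse_entities(payload, language="en"):
    """Reduce the API response to {id: {"label": ..., "description": ...}}."""
    out = {}
    for qid, entity in payload.get("entities", {}).items():
        label = entity.get("labels", {}).get(language, {}).get("value")
        desc = entity.get("descriptions", {}).get(language, {}).get("value")
        out[qid] = {"label": label, "description": desc}
    return out


def describe_items(item_ids, language="en"):
    """Fetch and parse labels/descriptions for the given item ids."""
    url = API_URL + "?" + urllib.parse.urlencode(build_params(item_ids, language))
    # The User-Agent value is a placeholder for this sketch.
    req = urllib.request.Request(
        url, headers={"User-Agent": "wd-current-events-prototype/0.1"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_entities(json.load(resp), language)
```

Calling `describe_items(["Q42"])` would then return a small dictionary keyed by item id, which can be joined with the hourly revision counts for reporting.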
Phase 3. See if something in the world is correlated with the changes in Wikidata (NEWSRIVER API/R)
- use the free NEWSRIVER API to fetch the latest headlines and news, using the top-edited Wikidata entities as search terms;
- perform a quick, simple search through the collected news and headlines to see whether they frequently mention any of the Wikidata entities currently under revision (or anything related to them: their properties, some characteristic values, etc.); the prototype does not attempt "real", ML-driven entity linking;
- only if there is a clear, distinguishable match, generate a set of candidate news stories and report their URLs;
- serve the results as a simple dashboard that runs client-side and checks for fresh, hourly generated data on the WMF's servers.
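
The "quick and simple search" of Phase 3 might be approximated as below. This is a minimal sketch, not entity linking: it assumes the headlines have already been fetched and are available as dictionaries with hypothetical `title` and `url` keys, it matches entity labels as whole words regardless of case, and it reports an entity only when at least `min_hits` headlines mention it, as a stand-in for the "clear, distinguishable match" criterion.

```python
import re
from collections import defaultdict


def match_headlines(labels, articles, min_hits=2):
    """Return {entity_id: [urls]} for entities mentioned in enough headlines.

    `labels` maps entity id -> label, e.g. {"Q1": "Mount Etna"}.
    `articles` is a list of dicts with (hypothetical) keys "title" and "url".
    """
    # Whole-word, case-insensitive patterns; labels that start or end with
    # punctuation may need a different boundary rule.
    patterns = {
        qid: re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
        for qid, label in labels.items()
        if label
    }
    hits = defaultdict(list)
    for article in articles:
        for qid, pattern in patterns.items():
            if pattern.search(article["title"]):
                hits[qid].append(article["url"])
    # Keep only entities with enough independent mentions.
    return {qid: urls for qid, urls in hits.items() if len(urls) >= min_hits}
```

The resulting mapping of entity ids to candidate URLs is the kind of small, precomputed dataset a client-side dashboard could load each hour.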