This is an additional (optional) task proposal for the Wikidata stream of wiki-techstorm-2019.
There is an actively maintained list of data breach events on Wikipedia. The objective of this task would be to import this data into Wikidata. Once in Wikidata, the catalog of events could be queried in simple or more complex ways to produce insights about data security risks.
The steps required would be to:
- parse the wikitext of that Wikipedia list
- process the row entries and create a unique Wikidata item per event, using a data model adapted to the available data and to the nature of these entries (events involving corporate or public entities, etc.)
- do some data wrangling with OpenRefine or other tools, as the records are not consistent
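A minimal sketch of the first step (parsing the wikitext), using only the Python standard library. It assumes the list is a standard `{| ... |}` wikitable on English Wikipedia and that the page title is "List of data breaches" (an assumption; check the actual title). `action=raw` is the MediaWiki option that returns a page's raw wikitext. The parser covers the common cases (header rows, `||` inline cells, `attributes |` prefixes, `[[links]]`, `<ref>` footnotes) but not templates such as `{{sort|...}}`; a dedicated parser such as mwparserfromhell would be more robust. The function names are hypothetical.

```python
import re
import urllib.parse
import urllib.request

# [[Target|Label]] or [[Target]] -> Label / Target
LINK_RE = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")
# <ref name="x"/> or <ref ...>...</ref> footnotes
REF_RE = re.compile(r"<ref[^>]*/>|<ref[^>]*>.*?</ref>", re.DOTALL)


def fetch_wikitext(title="List of data breaches"):
    """Download raw wikitext via MediaWiki's action=raw (the title is an assumption)."""
    url = ("https://en.wikipedia.org/w/index.php?action=raw&title="
           + urllib.parse.quote(title.replace(" ", "_")))
    # Wikimedia asks clients to send a descriptive User-Agent (hypothetical name here).
    request = urllib.request.Request(url, headers={"User-Agent": "techstorm-breach-import/0.1"})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def clean_markup(text):
    """Remove footnotes first (they may contain '|'), then unwrap wiki links."""
    text = REF_RE.sub("", text)
    return LINK_RE.sub(r"\1", text)


def parse_wikitable(wikitext):
    """Return the first {| ... |} table as a list of rows (lists of cell strings)."""
    start = wikitext.find("{|")
    end = wikitext.find("|}", start)
    if start == -1 or end == -1:
        return []
    body = clean_markup(wikitext[start + 2:end])
    rows, current = [], []
    for line in body.splitlines():
        line = line.strip()
        if line.startswith("|-"):
            # Row separator: close the row collected so far.
            if current:
                rows.append(current)
            current = []
        elif line.startswith("|+"):
            continue  # table caption
        elif line.startswith(("!", "|")):
            sep = "!!" if line.startswith("!") else "||"
            for cell in line[1:].split(sep):
                # Drop an optional 'attributes |' prefix such as 'style="..." |'.
                if "|" in cell:
                    cell = cell.split("|", 1)[1]
                current.append(cell.strip())
        elif current and line:
            # Continuation line of a multi-line cell.
            current[-1] = (current[-1] + " " + line).strip()
    if current:
        rows.append(current)
    return rows
```

Usage would be `rows = parse_wikitable(fetch_wikitext())`, with `rows[0]` holding the header cells.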
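For the step of creating one Wikidata item per event, one low-friction route is to emit QuickStatements (V1) commands: `CREATE` makes a new item, and each following tab-separated `LAST` line adds a label (`Len`), a description (`Den`) or a statement to it. The sketch below assumes a deliberately small data model. P31 (instance of) and P585 (point in time) are real Wikidata properties, but the item ID for "data breach", the "number of records" property and the label pattern are hypothetical placeholders that would have to be looked up and agreed on first. Linking each event to the affected organization would also require reconciling names to existing items, for example in OpenRefine, which this sketch leaves out.

```python
# Hypothetical placeholders: replace with real IDs after checking Wikidata.
DATA_BREACH_ITEM = "Q_DATA_BREACH"   # placeholder item for "data breach"
RECORDS_PROPERTY = "P_RECORDS"       # placeholder property for "number of records"
# Real Wikidata properties.
INSTANCE_OF = "P31"                  # instance of
POINT_IN_TIME = "P585"               # point in time


def breach_to_quickstatements(entity, year, records=None):
    """Return QuickStatements V1 lines creating one item for one breach event."""
    name = entity.replace('"', "")   # double quotes would break the label string
    lines = [
        "CREATE",
        f'LAST\tLen\t"{name} data breach ({year})"',
        f'LAST\tDen\t"data breach affecting {name}"',
        f"LAST\t{INSTANCE_OF}\t{DATA_BREACH_ITEM}",
        # Precision 9 marks a year-only time value.
        f"LAST\t{POINT_IN_TIME}\t+{year:04d}-00-00T00:00:00Z/9",
    ]
    if records is not None:
        lines.append(f"LAST\t{RECORDS_PROPERTY}\t{records}")
    return "\n".join(lines)
```

For example, `print(breach_to_quickstatements("Equifax", 2017, 147900000))` prints a six-line batch that could be pasted into QuickStatements once the placeholders are replaced.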
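The data-wrangling step could be done in OpenRefine as proposed, but the same normalization can also be scripted. A minimal sketch, assuming the year and record-count columns mix formats such as `3,000,000,000`, `500 million`, `2013-14` or `unknown` (illustrative strings, not checked against the current page). Values that cannot be parsed become `None`, so they can be reviewed by hand instead of being imported as wrong numbers. The function names are hypothetical.

```python
import re

MULTIPLIERS = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def parse_records(value):
    """Turn '3,000,000,000', 'over 2.2 million' or 'unknown' into an int or None."""
    text = value.strip().lower().replace(",", "")
    # First number in the string, with an optional word multiplier after it.
    match = re.search(r"(\d+(?:\.\d+)?)\s*(thousand|million|billion)?", text)
    if not match:
        return None
    scale = MULTIPLIERS.get(match.group(2), 1)   # group(2) is None when no word follows
    return int(round(float(match.group(1)) * scale))


def parse_year(value):
    """Return the first four-digit year in '2014', '2013-14' or 'June 2018', else None."""
    match = re.search(r"\b(19|20)\d{2}\b", value)
    return int(match.group(0)) if match else None
```

Taking the first year of a range such as `2013-14` is a simplifying choice; a fuller model might record start and end times separately.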