
automatic sourcing of existing Wikidata statements (expanding existing scripts)
Open, High, Public

Description

The number of unreferenced statements in Wikidata should decrease. One of the ways to do that is by automatically adding references to existing statements based on schema.org markup. @Addshore has written scripts to start this work. It should be expanded.
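As a rough illustration of the kind of extraction involved, the sketch below pulls flat schema.org microdata properties (`itemprop` attributes) out of an HTML page using only the Python standard library. It is not taken from the existing scripts. The class name `ItemPropParser` and the sample HTML are hypothetical, and the parser deliberately ignores nested items and elements nested inside a property.

```python
from html.parser import HTMLParser


class ItemPropParser(HTMLParser):
    """Collects flat itemprop name/value pairs from an HTML page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.props = {}       # property name -> list of values
        self._current = None  # itemprop whose text content is being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return
        # Prefer machine-readable attributes over visible text when present.
        for key in ("content", "datetime", "href", "src"):
            if attrs.get(key):
                self.props.setdefault(name, []).append(attrs[key])
                return
        self._current, self._buffer = name, []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the property being collected.
        if self._current is not None:
            text = "".join(self._buffer).strip()
            if text:
                self.props.setdefault(self._current, []).append(text)
            self._current = None


# Sample markup invented for illustration only.
html = (
    '<div itemscope itemtype="https://schema.org/Person">'
    '<span itemprop="name">Douglas Adams</span>'
    '<time itemprop="birthDate" datetime="1952-03-11">11 March 1952</time>'
    "</div>"
)
parser = ItemPropParser()
parser.feed(html)
print(parser.props)
```

A value found this way would still need to be compared against the existing Wikidata statement before the page could be added as a reference.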

Existing work:

What needs to be done next:

  • Expand it to more types (Right now it only supports people and a few others.)
  • Expand the kind of information it can extract for each type
  • Take into account the links in the item itself as well as external IDs (So far it only takes into account the links in the Wikipedia articles connected to the item.)
  • Potentially create an auto-populated blacklist of sites that don't contain microdata, in order to decrease the number of sites that need to be checked (Right now every linked page is checked, even if its domain has already been checked unsuccessfully several times.)
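The blacklist idea in the last point could be sketched roughly as follows. This is only one possible design, not part of the existing scripts: the class name `DomainBlacklist`, the `max_failures` threshold, and the rule that a single success clears a domain are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


class DomainBlacklist:
    """Tracks domains whose pages repeatedly yield no microdata (hypothetical helper)."""

    def __init__(self, max_failures=3):
        # max_failures is an assumed threshold, not a value from the task.
        self.max_failures = max_failures
        self.failures = Counter()

    @staticmethod
    def domain_of(url):
        # Host name of the URL, lower-cased; empty string if there is none.
        return (urlparse(url).hostname or "").lower()

    def is_blocked(self, url):
        # True once the domain has failed max_failures times without a success.
        return self.failures[self.domain_of(url)] >= self.max_failures

    def record(self, url, found_markup):
        domain = self.domain_of(url)
        if found_markup:
            # Assumed policy: one successful page clears the domain's failures.
            self.failures.pop(domain, None)
        else:
            self.failures[domain] += 1


blacklist = DomainBlacklist(max_failures=2)
for url in ("http://example.org/a", "http://example.org/b"):
    if not blacklist.is_blocked(url):
        blacklist.record(url, found_markup=False)
print(blacklist.is_blocked("http://example.org/c"))
```

Counting per domain rather than per URL is what saves requests, at the cost of skipping the occasional page on a site that only marks up some of its content.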

Event Timeline

Addshore moved this task from Incoming to Ready on the Addwiki board. Feb 8 2017, 11:39 PM
Lydia_Pintscher renamed this task from [Story] automatic sourcing (expanding existing scripts) to [Story] automatic sourcing of existing Wikidata statements (expanding existing scripts). May 12 2017, 4:03 PM
This comment was removed by MrSteff.
Lydia_Pintscher renamed this task from [Story] automatic sourcing of existing Wikidata statements (expanding existing scripts) to automatic sourcing of existing Wikidata statements (expanding an existing scripts). Apr 7 2018, 11:39 AM
Bmueller updated the task description. May 18 2018, 5:09 AM

Maybe also relevant: The data used for the tool that @Lydia_Pintscher linked to are available on Toolforge in the s53867__referee_p database (readable with any Toolforge account), especially the statements table.

Lydia_Pintscher moved this task from incoming to ready to go on the Wikidata board. Nov 2 2018, 1:07 PM
Hjfocs added a subscriber: Hjfocs. Dec 13 2018, 3:31 PM