
automatic sourcing of existing Wikidata statements (expanding an existing scripts)
Closed, Invalid (Public)

Description

The number of unreferenced statements in Wikidata should decrease. One of the ways to do that is by automatically adding references to existing statements based on schema.org markup. @Addshore has written scripts to start this work. It should be expanded.
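To illustrate the kind of extraction involved, here is a minimal sketch of pulling schema.org microdata properties out of a page using only the Python standard library. It is not the existing script: the class name `MicrodataParser`, the helper `extract_properties`, and the sample HTML are all hypothetical, and the parser is deliberately simplified (it returns a flat list of properties, skips nested items, and does not handle same-tag nesting inside a property).

```python
from html.parser import HTMLParser


class MicrodataParser(HTMLParser):
    """Hypothetical helper: collect flat (itemprop, value) pairs from microdata."""

    # Elements whose property value lives in an attribute rather than in text.
    ATTR_VALUE = {"meta": "content", "link": "href", "time": "datetime"}

    def __init__(self):
        super().__init__()
        self.props = []     # collected (itemprop, value) pairs
        self._open = None   # (itemprop, tag) currently collecting text, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop is None or "itemscope" in attrs:
            return  # not a property, or a nested item (ignored in this sketch)
        attr = self.ATTR_VALUE.get(tag)
        if attr and attrs.get(attr):
            self.props.append((prop, attrs[attr]))
        else:
            self._open, self._text = (prop, tag), []

    def handle_data(self, data):
        if self._open:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Close the property when the tag that opened it ends.
        if self._open and tag == self._open[1]:
            value = "".join(self._text).strip()
            if value:
                self.props.append((self._open[0], value))
            self._open = None


def extract_properties(html):
    """Return the (itemprop, value) pairs found in an HTML string."""
    parser = MicrodataParser()
    parser.feed(html)
    return parser.props


# Made-up sample page, for illustration only.
SAMPLE = """
<div itemscope itemtype="https://schema.org/Person">
  <span itemprop="name">Douglas Adams</span>
  <time itemprop="birthDate" datetime="1952-03-11">11 March 1952</time>
  <link itemprop="sameAs" href="https://example.org/adams">
</div>
"""

if __name__ == "__main__":
    for prop, value in extract_properties(SAMPLE):
        print(prop, "=", value)
```

A value extracted this way could then be compared with the existing statement on the item, and the page URL proposed as a reference when the two match.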

Existing work:

What needs to be done next:

  • Expand it to more types (Right now it only supports people and a few others.)
  • Expand the kind of information it can extract for each type
  • Take into account the links in the item itself as well as external IDs (So far it only takes into account the links in the Wikipedia articles connected to the item.)
  • Potentially create an auto-populated blacklist of sites that don't contain microdata, to reduce the number of sites that need to be checked (Right now every linked page is checked, even if its domain has already been checked unsuccessfully several times.)
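
The auto-populated blacklist from the last point could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class name `DomainBlacklist`, the threshold of 3 failed checks (the task only says "several"), and the choice to never blacklist a domain on which markup has been found at least once.

```python
from collections import Counter
from urllib.parse import urlsplit


class DomainBlacklist:
    """Hypothetical sketch: skip domains that repeatedly yield no microdata."""

    def __init__(self, max_failures=3):
        # Assumed threshold; tune it against real crawl results.
        self.max_failures = max_failures
        self._failures = Counter()   # domain -> number of unsuccessful checks
        self._has_markup = set()     # domains where markup was found at least once

    @staticmethod
    def _domain(url):
        return (urlsplit(url).hostname or "").lower()

    def record(self, url, found_markup):
        """Record the outcome of checking one page."""
        domain = self._domain(url)
        if found_markup:
            self._has_markup.add(domain)
            self._failures.pop(domain, None)
        elif domain not in self._has_markup:
            self._failures[domain] += 1

    def should_check(self, url):
        """Return False once a domain has failed max_failures times."""
        return self._failures[self._domain(url)] < self.max_failures
```

Usage would be to call `should_check(url)` before fetching a linked page and `record(url, found_markup)` afterwards. Counting per domain rather than per URL is what reduces the number of fetches, at the cost of missing sites that only mark up some of their pages.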

Event Timeline

Lydia_Pintscher renamed this task from [Story] automatic sourcing (expanding existing scripts) to [Story] automatic sourcing of existing Wikidata statements (expanding existing scripts). May 12 2017, 4:03 PM
This comment was removed by MrSteff.
Lydia_Pintscher renamed this task from [Story] automatic sourcing of existing Wikidata statements (expanding existing scripts) to automatic sourcing of existing Wikidata statements (expanding an existing scripts). Apr 7 2018, 11:39 AM

Maybe also relevant: The data used for the tool that @Lydia_Pintscher linked to are available on Toolforge in the s53867__referee_p database (readable with any Toolforge account), especially the statements table.

We're working on this now and are tracking work in https://phabricator.wikimedia.org/project/view/4635/
I'll close this ticket as we're not expanding the existing script but creating a more flexible system.