
Pipe 2: Scrape given URLs and return potential matches between unreferenced statements and scraped structured data (by property)
Closed, Resolved · Public

Description

We need to extract schema.org markup from each potential reference and find potential matches between that data and Wikidata's statements. This step does not yet check whether the values actually match.
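The task does not say how the schema.org markup is embedded in the scraped pages. The following is a minimal sketch that assumes the markup is exposed as JSON-LD in `<script type="application/ld+json">` elements. Microdata and RDFa are not handled. It uses only the Python standard library, and the names `JsonLdExtractor` and `extract_schema_org` are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            raw = "".join(self._buffer).strip()
            if raw:
                try:
                    self.blocks.append(json.loads(raw))
                except json.JSONDecodeError:
                    pass  # skip malformed markup instead of failing the whole page


def extract_schema_org(html):
    """Return a flat list of schema.org objects found in the page's JSON-LD blocks."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    objects = []
    for block in parser.blocks:
        # A block may be a list of objects, an object with an @graph, or a single object.
        if isinstance(block, list):
            objects.extend(block)
        elif isinstance(block, dict) and "@graph" in block:
            graph = block["@graph"]
            objects.extend(graph if isinstance(graph, list) else [graph])
        else:
            objects.append(block)
    return objects
```

The sketch skips malformed blocks silently so that one broken script element does not discard the rest of a page. A real scraper might log these blocks instead.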

Input: output of step 1 + “schema.org <-> wikidata property” mapping
Output: ItemIds + {unreferenced statement + { extracted structured data + reference (incl. Ext ID) }}
Substeps:

  • Scraping external sites
  • Some normalization of structured data via a side service
  • Matching by Property
  • Throwing away unneeded structured data
  • Throwing away unreferenced statements that don’t have any corresponding structured data
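The matching and filtering substeps above can be sketched as follows. This is a hypothetical illustration, not the pipeline's actual code. Several details are assumptions: the `Statement`, `Reference` and `Candidate` types, the shape of the step 1 output (scraped objects grouped by item ID), and the example mapping entries. The normalization side service is left out. As the task specifies, values are paired by property only and are not compared.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    item_id: str        # Wikidata item, e.g. "Q7259"
    property_id: str    # Wikidata property, e.g. "P569"
    value: object


@dataclass
class Reference:
    url: str            # scraped URL
    ext_id: str         # external ID the URL was derived from (hypothetical field)


@dataclass
class Candidate:
    statement: Statement
    matches: list = field(default_factory=list)  # (schema.org property, value, Reference)


def match_by_property(statements, scraped, mapping):
    """Pair unreferenced statements with scraped values that map to the same property.

    statements: list of unreferenced Statement objects
    scraped:    dict item_id -> list of (schema.org object, Reference)
    mapping:    dict schema.org property name -> Wikidata property ID
    Returns dict item_id -> list of Candidate. Values are not compared here.
    """
    result = {}
    for stmt in statements:
        candidate = Candidate(stmt)
        for obj, ref in scraped.get(stmt.item_id, []):
            for schema_prop, value in obj.items():
                # Keep only structured data whose property maps to the statement's
                # property; everything else is thrown away.
                if mapping.get(schema_prop) == stmt.property_id:
                    candidate.matches.append((schema_prop, value, ref))
        # Throw away statements that have no corresponding structured data.
        if candidate.matches:
            result.setdefault(stmt.item_id, []).append(candidate)
    return result
```

For example, with an illustrative mapping such as `{"birthDate": "P569", "deathDate": "P570"}`, a P569 statement on an item is paired with that item's scraped `birthDate` value. The item's statements with no mapped counterpart are dropped. Keying the result by item ID mirrors the "ItemIds + {…}" output shape described above.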

Event Timeline

Restricted Application added subscribers: Liuxinyu970226, Aklapper.
ItamarWMDE renamed this task from Scrape given URLs and return potential matches between unreferenced statements and scraped structured data (by property) to Pipe 2: Scrape given URLs and return potential matches between unreferenced statements and scraped structured data (by property). · Mar 31 2020, 3:08 PM