We need to extract schema.org markup from each potential reference and find potential matches between that structured data and the item's Wikidata statements. This step does not yet check whether the values actually match.
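The extraction could look roughly like the minimal sketch below, which uses only Python's standard library to pull JSON-LD blocks out of a fetched page. It assumes the schema.org markup is embedded as JSON-LD in `<script type="application/ld+json">` elements. Microdata and RDFa would need a different parser. `extract_jsonld` is a hypothetical helper, not part of any existing service.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects parsed contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lowercased; a bare attribute has value None.
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            text = "".join(self._buffer).strip()
            try:
                parsed = json.loads(text)
            except json.JSONDecodeError:
                return  # skip malformed or empty blocks
            # A single block may hold one object or a list of objects.
            self.blocks.extend(parsed if isinstance(parsed, list) else [parsed])


def extract_jsonld(html: str) -> list:
    """Return every JSON-LD object found in an HTML page (hypothetical helper)."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Malformed blocks are skipped rather than failing the whole page, on the assumption that one broken script tag should not hide usable markup elsewhere on the site.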
Input: the output of step 1, plus a mapping between schema.org properties and Wikidata properties
Output: item IDs + {unreferenced statement + {extracted structured data + reference (incl. external ID)}}
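One way to read the nested output notation is as the typed sketch below. All class and field names are hypothetical, and it assumes each unreferenced statement can collect several candidate references (the notation does not say whether there is one or many).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Reference:
    url: str
    external_id: Optional[str] = None  # the "Ext ID", if the reference has one


@dataclass
class Candidate:
    structured_data: dict  # extracted schema.org values relevant to the statement
    reference: Reference


@dataclass
class StatementCandidates:
    statement_id: str  # Wikidata statement GUID of the unreferenced statement
    property_id: str   # e.g. "P569" (date of birth)
    candidates: List[Candidate] = field(default_factory=list)


# Item ID (e.g. "Q42") -> its unreferenced statements, each with candidate references
StepOutput = Dict[str, List[StatementCandidates]]
```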
Substeps:
- Scraping the external sites
- Normalizing some of the structured data via a side service
- Matching by property
- Discarding unneeded structured data
- Discarding unreferenced statements that don’t have any corresponding structured data
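The last three substeps can be sketched together as a single function, assuming the structured data from one reference has already been normalized into a flat `{schema.org property: value}` dict and the mapping is a plain `{schema.org property: Wikidata property ID}` dict. The function name, argument shapes, and statement IDs are hypothetical. Values are only grouped by property, not compared, in line with this step's scope.

```python
from typing import Dict, List, Tuple


def match_by_property(
    statements: List[Tuple[str, str]],
    structured_data: Dict[str, object],
    schema_to_wikidata: Dict[str, str],
) -> Dict[str, Dict[str, object]]:
    """Pair unreferenced statements with structured data for the same property.

    statements: (statement_id, wikidata_property_id) for one item's unreferenced statements
    structured_data: schema.org property -> normalized value from one potential reference
    schema_to_wikidata: schema.org property -> Wikidata property ID
    Returns statement_id -> {schema.org property: value}.
    """
    # Re-key the structured data by Wikidata property; unmapped properties are dropped.
    by_wikidata_prop: Dict[str, Dict[str, object]] = {}
    for schema_prop, value in structured_data.items():
        wd_prop = schema_to_wikidata.get(schema_prop)
        if wd_prop is not None:
            by_wikidata_prop.setdefault(wd_prop, {})[schema_prop] = value

    matches: Dict[str, Dict[str, object]] = {}
    for statement_id, wd_prop in statements:
        if wd_prop in by_wikidata_prop:
            matches[statement_id] = dict(by_wikidata_prop[wd_prop])
        # Statements with no corresponding structured data are simply not returned.
    # Mapped data that no unreferenced statement uses is also left out.
    return matches


statements = [("stmt-1", "P569"), ("stmt-2", "P570")]
data = {"birthDate": "1952-03-11", "name": "Douglas Adams"}
mapping = {"birthDate": "P569", "deathDate": "P570"}
print(match_by_property(statements, data, mapping))
```

Here `name` is discarded because it has no mapping, and `stmt-2` is discarded because the page offers no `deathDate`, leaving only `stmt-1` paired with its `birthDate` value.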