As a PM, I want to create a sample set of Structured Contents for EN Wikipedia (namespace 0) covering people with infoboxes, so that I can share it with researchers.
Acceptance criteria
A snapshot of EN Wikipedia that contains only articles about people, each with a parsed infobox.
ToDo
- download the English Wikipedia snapshot
- create a subset of all EN Wikipedia pages about people (approx. 2M)
- run the infobox parser on that subset of pages
- create a new snapshot of people with parsed infoboxes
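
The filter-and-parse steps above could be prototyped roughly as follows. This is a minimal sketch, not the team's actual infobox parser. It assumes the snapshot has been unpacked to newline-delimited JSON and that each record has a `name` and a `wikitext` field (both field names are hypothetical). It also assumes a page is "about a person" when its infobox template name starts with one of a few illustrative prefixes. The real selection criteria for the ~2M people pages may differ.

```python
import json
import re

# Hypothetical heuristic: infobox template names that indicate a person.
PERSON_INFOBOX_PREFIXES = ("infobox person", "infobox officeholder", "infobox scientist")

INFOBOX_START = re.compile(r"\{\{\s*infobox", re.IGNORECASE)


def extract_infobox(wikitext):
    """Return the raw text of the first {{Infobox ...}} template, or None."""
    match = INFOBOX_START.search(wikitext)
    if match is None:
        return None
    start = match.start()
    depth, i = 0, start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return wikitext[start:i]
        else:
            i += 1
    return None  # unbalanced braces: treat as "no infobox"


def parse_infobox(raw):
    """Split a raw infobox into (template_name, {param: value})."""
    inner = raw[2:-2]
    parts, current, depth, i = [], [], 0, 0
    while i < len(inner):
        pair = inner[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif inner[i] == "|" and depth == 0:
            # A top-level pipe separates parameters.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(inner[i])
            i += 1
    parts.append("".join(current))
    name = parts[0].strip()
    params = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:  # skip positional (unnamed) parameters
            params[key.strip()] = value.strip()
    return name, params


def people_with_infoboxes(ndjson_lines):
    """Yield one record per page whose infobox looks like a person infobox."""
    for line in ndjson_lines:
        page = json.loads(line)
        raw = extract_infobox(page.get("wikitext", ""))
        if raw is None:
            continue
        template, fields = parse_infobox(raw)
        if template.lower().startswith(PERSON_INFOBOX_PREFIXES):
            yield {"name": page["name"], "infobox": {"template": template, "fields": fields}}
```

Writing each yielded record back out with `json.dumps`, one per line, would give a new people-only snapshot in the same shape as the input. A brace-matching parser like this one ignores nested infoboxes and triple-brace parameters, so it is only suitable for a first sample.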
Description
This will enable collaboration opportunities that help quantify the quality of WME data in the LLM space.