Companion ticket to https://phabricator.wikimedia.org/T277552
The Spark job we use to generate production datasets needs to parse the new "instance of" fields.
Acceptance criteria
- Logic to parse the "instance of" JSON blob is implemented
- Tests for this capability have been added
- The number of articles with and without valid "instance of" metadata is known (add metric)
Note
- "Instance of" is a JSON blob and might contain surprises, so the parser should not assume well-formed input.
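
A minimal sketch of defensive parsing plus the valid/invalid count, in plain Python so it can be unit-tested without a Spark session. The blob format is an assumption: the ticket does not specify it, so the sketch treats a valid value as a non-empty JSON array of Wikidata item IDs (for example `["Q5"]`). The names `parse_instance_of`, `count_validity` and `QID_PATTERN` are hypothetical and not taken from the existing job.

```python
import json
import re
from collections import Counter

# Assumption: a valid entry is a Wikidata item ID such as "Q5".
QID_PATTERN = re.compile(r"Q[1-9]\d*")


def parse_instance_of(blob):
    """Return the list of QIDs in the blob, or None if it is missing or malformed."""
    # Missing, non-string, or blank input counts as invalid.
    if not isinstance(blob, str) or not blob.strip():
        return None
    try:
        value = json.loads(blob)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    # Assumption: the payload is a non-empty JSON array of QID strings.
    if not isinstance(value, list) or not value:
        return None
    if not all(isinstance(v, str) and QID_PATTERN.fullmatch(v) for v in value):
        return None
    return value


def count_validity(blobs):
    """Count how many blobs parse to valid "instance of" metadata."""
    counts = Counter(valid=0, invalid=0)
    for blob in blobs:
        key = "valid" if parse_instance_of(blob) is not None else "invalid"
        counts[key] += 1
    return counts
```

In the Spark job, the same parsing function could be applied per article and the two counts emitted as the metric; that wiring depends on the existing job and is not shown here.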