
Implement parsing of “instance of” fields in ImageMatching production datasets
Closed, Resolved | Public | 3 Estimated Story Points

Description

Companion ticket to https://phabricator.wikimedia.org/T277552

The Spark job we use to generate production datasets needs to parse the new "instance of" fields.

Acceptance criteria

  • Logic to parse the "instance of" JSON blob is implemented
  • Tests for this capability have been added
  • The number of articles with and without valid "instance of" metadata is known (add metric)
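
A minimal sketch of how the parsing and the valid/invalid metric could look, written in plain Python so it can be unit tested without a Spark session. The ticket does not specify the blob's schema, so the accepted shapes (a single ID string, an object with an "id" key, or a list of either) and the Wikidata-style "Q<number>" ID format are assumptions. The names parse_instance_of and count_instance_of are hypothetical and not taken from the ImageMatching code base.

```python
import json
import re
from collections import Counter
from typing import List, Optional

# Assumption: valid values look like Wikidata item IDs, e.g. "Q5".
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def parse_instance_of(raw: Optional[str]) -> Optional[List[str]]:
    """Return the IDs found in an "instance of" blob, or None if it is unusable."""
    if not isinstance(raw, str) or not raw.strip():
        return None
    try:
        value = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    # Assumed shapes: one ID string, one {"id": ...} object, or a list of either.
    if isinstance(value, (str, dict)):
        value = [value]
    if not isinstance(value, list):
        return None
    ids = []
    for item in value:
        if isinstance(item, dict):
            item = item.get("id")
        if isinstance(item, str) and QID_PATTERN.fullmatch(item):
            ids.append(item)
    # An empty result counts as "no valid metadata".
    return ids or None


def count_instance_of(blobs) -> Counter:
    """Count rows with and without valid "instance of" metadata."""
    counts = Counter(valid=0, invalid=0)
    for raw in blobs:
        key = "valid" if parse_instance_of(raw) is not None else "invalid"
        counts[key] += 1
    return counts


sample = ['["Q5"]', '{"id": "Q5"}', "not json", None, "[]",
          '[{"id": "Q515"}, 42]', '"Q42"']
metric = count_instance_of(sample)
```

In the Spark job itself, a function like parse_instance_of could presumably be wrapped in a UDF and the counts derived with an aggregation; returning None for anything unexpected keeps malformed blobs from failing the job while still making them visible in the metric.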

Note

  • "Instance of" is a JSON blob and might contain surprises.