
Provide appropriate dumps of Commons including the structured data
Open, Low, Public

Description

Coming out of a quick conversation between @Jdforrester-WMF and @ArielGlenn, we need to determine what, if any, specific work is needed to adjust the dumps for Commons. This is not for immediate decision and execution – we don't have to climb the whole mountain at once – and indeed it is blocked on work to determine what data will exist (e.g. virtual properties).

  • Binaries – Already done; any changes expected?
  • Wikitext – Already done; will continue, changed via T198706
  • SDC data – Raw JSON will come along as part of T174031

This will get us the raw content of the slot, but what more do we need to do, if anything?

  • We'll need to dump local properties, presumably. (Are there going to be any local non-media items?)
  • Can virtual properties be ignored (and calculated on import like EXIF), or will they need exporting?
  • Is it OK to ask people to grab the Wikidata XML or entity dumps to use with the Commons dump? Will users demand a special de-referenced walk of Wikidata from Commons properties? What about recursion? (Oy.)
  • Will users want a separate sort of 'commons media info' weekly or bimonthly run with information specially formatted for folks working with media file meta-data?
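
To make the recursion question above concrete, here is a rough sketch of what a depth-limited "de-referenced walk" might look like on the consumer side. It assumes the SDC slot JSON follows the usual Wikibase layout (a `statements` or `claims` map whose main snaks carry a `wikibase-entityid` datavalue); the function names, the `lookup` callback and the `max_depth` parameter are all made up for illustration and are not part of any existing dump tooling. Only main-snak values are followed; qualifiers, references and the property IDs themselves are ignored here.

```python
from collections import deque


def referenced_ids(entity):
    """Return the entity IDs used as main-snak values in one entity."""
    # Assumption: MediaInfo JSON uses "statements", Wikidata items use "claims".
    # An empty list (seen in some serializations) is treated as "no statements".
    statements = entity.get("statements") or entity.get("claims") or {}
    found = set()
    for group in statements.values():
        for statement in group:
            datavalue = statement.get("mainsnak", {}).get("datavalue", {})
            if datavalue.get("type") == "wikibase-entityid":
                found.add(datavalue["value"]["id"])
    return found


def walk(start, lookup, max_depth=1):
    """Collect IDs reachable from `start`, following at most `max_depth` hops.

    `lookup` is a hypothetical callback mapping an entity ID to its JSON
    (for example, backed by a Wikidata entity dump), or None if unknown.
    """
    seen = set()
    queue = deque((entity_id, 1) for entity_id in referenced_ids(start))
    while queue:
        entity_id, depth = queue.popleft()
        if entity_id in seen:
            continue  # already visited; this also stops cycles
        seen.add(entity_id)
        if depth >= max_depth:
            continue  # do not expand beyond the requested depth
        target = lookup(entity_id)
        if target is None:
            continue
        for next_id in referenced_ids(target):
            if next_id not in seen:
                queue.append((next_id, depth + 1))
    return seen
```

With `max_depth=1` this only returns the entities a media file points at directly, which is probably what most users joining a Commons dump against a Wikidata dump would need; larger depths grow quickly, which is the "Oy" part.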

Event Timeline

hoo added a subscriber: hoo. Oct 18 2018, 8:27 PM
Addshore moved this task from incoming to monitoring on the Wikidata board. Jan 25 2019, 2:23 PM
Ramsey-WMF triaged this task as Low priority. Mar 8 2019, 5:12 PM
Ramsey-WMF moved this task from Untriaged to Triaged on the Multimedia board.

See T221917, where this is actually being done. That covers one half; the other half, inclusion in the XML dumps, depends on some pending changes to core that are still in the works. Should I merge this task into the other one?

Restricted Application added a project: Multimedia. Aug 10 2019, 11:55 PM