Coming out of a quick conversation between @Jdforrester-WMF and @ArielGlenn, we need to determine what, if any, specific work is needed to adjust the dumps for Commons. This is not for immediate decision and execution – we don't have to climb the whole mountain at once – and indeed it is blocked on work to determine what data will exist (e.g. virtual properties).
- Binaries – Already done; any changes expected?
- Wikitext – Already done; will continue, changed via T198706
- SDC data – Raw JSON will come along as part of T174031
This will get us the raw content of the slot, but what more do we need to do, if anything?
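To make the question concrete, here's a hedged sketch of what consuming that raw slot content might look like. The entity shape below follows the general Wikibase entity JSON layout (`id`, `labels`, `statements`); the specific ids and the final MediaInfo schema are assumptions for illustration, not a statement of what the dump will contain:

```python
import json

# Illustrative MediaInfo-style entity; the real slot schema may differ.
raw_slot_content = json.dumps({
    "type": "mediainfo",
    "id": "M42",  # hypothetical MediaInfo id
    "labels": {"en": {"language": "en", "value": "A sample caption"}},
    "statements": {
        "P180": [  # "depicts" on Wikidata; property ids here are assumptions
            {"mainsnak": {"snaktype": "value", "property": "P180",
                          "datavalue": {"type": "wikibase-entityid",
                                        "value": {"id": "Q146"}}},
             "type": "statement", "rank": "normal"}
        ]
    },
})

entity = json.loads(raw_slot_content)
# Listing which properties a file's statements use is the kind of thing
# consumers will do, and it only works if those properties are resolvable.
used_properties = sorted(entity.get("statements", {}))
print(entity["id"], used_properties)
```

Anything beyond this raw content – resolving what `P180` or `Q146` actually mean – is where the questions below come in.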
- We'll need to dump local properties, presumably. (Are there going to be any local non-media items?)
- Can virtual properties be ignored (and calculated on import, like EXIF data), or will they need exporting?
- Is it OK to ask people to grab the Wikidata XML or entity dumps to use with the Commons dump? Will users demand a special de-referenced walk of Wikidata from Commons properties? What about recursion? (Oy.)
- Will users want a separate sort of 'commons media info' weekly or bimonthly run, with information specially formatted for folks working with media file metadata?
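On the de-referencing and recursion question: the worry is that a Commons statement points at a Wikidata item, whose statements point at further items, and so on. A minimal sketch of such a walk, using a hypothetical in-memory stand-in for the Wikidata entity dump (real consumers would stream the JSON dump; ids and labels below are illustrative):

```python
# Hypothetical stand-in for Wikidata; keys, labels, and links are made up.
WIKIDATA = {
    "Q146": {"label": "house cat", "refs": ["Q25265"]},
    "Q25265": {"label": "Felis", "refs": ["Q25306"]},
    "Q25306": {"label": "Felidae", "refs": []},
}

def dereference(item_id, depth, seen=None):
    """Resolve an item and the items it references, up to `depth` hops.

    `seen` guards against cycles, which Wikidata's item graph can contain.
    """
    seen = set() if seen is None else seen
    if item_id in seen or item_id not in WIKIDATA:
        return {}
    seen.add(item_id)
    entity = WIKIDATA[item_id]
    resolved = {item_id: entity["label"]}
    if depth > 0:
        for ref in entity["refs"]:
            resolved.update(dereference(ref, depth - 1, seen))
    return resolved

# One hop from a "depicts"-style target pulls in its immediate references.
print(dereference("Q146", depth=1))
# → {'Q146': 'house cat', 'Q25265': 'Felis'}
```

Even this toy version shows the design choice we'd face: any bundled walk needs a depth cut-off and cycle handling, and every extra hop multiplies the dump size – which is an argument for just pointing people at the Wikidata entity dumps instead.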