
Wikimedia Commons entity dumps are lacking datatype field
Open, Needs Triage, Public

Description

I am working with Wikidata entity dumps and am now also trying to use Wikimedia Commons entity dumps. I noticed that the latter lack the "datatype" field in statements. I see that there is a JsonDataTypeInjector that injects this field, but I assume it is not running for the Wikimedia Commons entity dumps. Is there a reason why not? It would be great if both dumps were as close to each other as possible. (There would still be a difference between the "statements" and "claims" fields, but that is easier to mitigate because it is a top-level field.)

So, could we enable this injector? What does it take to do so?
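The top-level "statements" vs. "claims" difference mentioned in the description can be bridged on the consumer side. Below is a minimal Python sketch, assuming each dump entity has already been parsed into a dict. `normalize_entity` is a hypothetical helper, not part of any Wikibase tooling, and it only renames the key: it does not restore the missing "datatype" field.

```python
import copy
from typing import Any


def normalize_entity(entity: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `entity` whose statements live under "claims".

    Hypothetical helper: MediaInfo entities keep their statements under
    the top-level "statements" key, while Wikidata items use "claims".
    Renaming the key lets the same consumer code read both dumps.
    """
    normalized = copy.deepcopy(entity)  # leave the caller's dict untouched
    if "claims" not in normalized and "statements" in normalized:
        normalized["claims"] = normalized.pop("statements")
    return normalized


mediainfo = {"type": "mediainfo", "id": "M1", "statements": {"P180": []}}
print(sorted(normalize_entity(mediainfo)))  # ['claims', 'id', 'type']
```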

Event Timeline

Restricted Application added a subscriber: Aklapper.

I see that there is JsonDataTypeInjector to inject that, but I assume it is not running for Wikimedia Commons entities dumps.

I’m pretty sure it is running, but it only injects the datatype into claims/*, and, as you mention, MediaInfo entities use statements/* instead (T149410). See also T246809: Inconsistencies between Wikidata and Structured Data about Snak's "datatype" from wbgetentities API results, which is the same issue reported for wbgetentities (arguably the two could be merged into one task).
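To illustrate why injection restricted to claims/* skips MediaInfo entities, here is a minimal Python sketch of path-based datatype injection. It is not the Wikibase JsonDataTypeInjector (which is PHP); the helper names, the path strings, and the P180 → wikibase-item lookup are hypothetical, chosen only to mirror the JSON layout of statements (mainsnak, qualifiers, references). A `*` segment matches every value of a dict or every element of a list.

```python
from typing import Any, Callable

# Hypothetical property id -> datatype lookup (stands in for a real lookup).
PROPERTY_DATATYPES = {"P180": "wikibase-item"}

# Illustrative paths to every snak under "claims"; not copied from Wikibase.
CLAIMS_PATHS = [
    "claims/*/*/mainsnak",
    "claims/*/*/qualifiers/*/*",
    "claims/*/*/references/*/snaks/*/*",
]
# The same paths rooted at "statements", as used by MediaInfo entities.
STATEMENTS_PATHS = [p.replace("claims", "statements", 1) for p in CLAIMS_PATHS]


def _apply(node: Any, segments: list[str], fn: Callable[[dict], None]) -> None:
    """Walk `node` along `segments` and call `fn` on each dict reached."""
    if not segments:
        if isinstance(node, dict):
            fn(node)
        return
    head, rest = segments[0], segments[1:]
    if head == "*":
        if isinstance(node, dict):
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            children = []
    elif isinstance(node, dict) and head in node:
        children = [node[head]]
    else:
        children = []  # path does not exist in this entity
    for child in children:
        _apply(child, rest, fn)


def inject_datatypes(entity: dict, paths: list[str], lookup: dict[str, str]) -> dict:
    """Add "datatype" in place to every snak reached by `paths`; returns `entity`."""
    def add_datatype(snak: dict) -> None:
        prop = snak.get("property")
        if prop in lookup:
            snak["datatype"] = lookup[prop]

    for path in paths:
        _apply(entity, path.split("/"), add_datatype)
    return entity


mediainfo = {"type": "mediainfo", "statements": {"P180": [
    {"mainsnak": {"snaktype": "value", "property": "P180"}}]}}
inject_datatypes(mediainfo, CLAIMS_PATHS, PROPERTY_DATATYPES)
print("datatype" in mediainfo["statements"]["P180"][0]["mainsnak"])  # False
inject_datatypes(mediainfo, CLAIMS_PATHS + STATEMENTS_PATHS, PROPERTY_DATATYPES)
print(mediainfo["statements"]["P180"][0]["mainsnak"]["datatype"])  # wikibase-item
```

With only the claims/* paths, the MediaInfo entity passes through unchanged; adding the equivalent statements/* paths is enough for its snaks to receive the field.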

Oh, what a sad issue T149410. :-(

Anyway, it looks like this could be fixed simply by adding statements to the globs here? Can I just go ahead and submit a change that adds it?

But I don't think that fix would also fix the wbgetentities API, would it? Isn't JsonDataTypeInjector used only for dumps?