Since the Wikidata RDF ontology is no longer "beta", it's time to remove the BETA marker from the RDF dump file names. The name is currently e.g. wikidata-20190617-all-BETA.ttl.bz2 but should be just wikidata-20190617-all.ttl.bz2.
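For illustration, the intended rename can be sketched as below. This is only a minimal sketch, not the actual change to the dump scripts: the helper name `strip_beta` and the regular expression are assumptions made for this example, based solely on the file name pattern quoted above.

```python
import re

# Hypothetical pattern for names such as wikidata-20190617-all-BETA.ttl.bz2:
# group 1 is the part before the marker, group 2 is the extension(s).
_BETA_RE = re.compile(r"^(wikidata-\d{8}-[a-z]+)-BETA(\..+)$")


def strip_beta(filename: str) -> str:
    """Return the file name without the -BETA marker.

    Names that do not carry the marker are returned unchanged.
    """
    match = _BETA_RE.match(filename)
    if match is None:
        return filename
    return match.group(1) + match.group(2)


print(strip_beta("wikidata-20190617-all-BETA.ttl.bz2"))
# prints: wikidata-20190617-all.ttl.bz2
```

Names without the marker pass through untouched, so the same helper could be applied to a whole directory listing.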
While this issue is supposed to be closed, one can still see "-all-BETA" dumps (in .nt and .ttl formats) alongside an -all.json dump at https://dumps.wikimedia.org/wikidatawiki/entities/20210628/. Is this expected? Can you please confirm that the content of those dumps is the same except for the serialization format?
Hm, I think there are two different things here.
- It looks like we removed the “-BETA” from the name of the latest dumps (e.g. latest-all.ttl.gz), but not from the timestamped ones (e.g. wikidata-20210628-all-BETA.ttl.gz). This wasn’t mentioned in the announcement, so I don’t think it’s intentional, and we probably want to fix it.
- @Rtroncy, I’m not sure what you mean by the same content, but as far as I’m aware, we don’t guarantee any atomicity for those dumps, either within a dump or between them. Since the .nt, .ttl and .json dumps are created independently (as far as I know), they probably don’t contain exactly the same data, because Wikidata edits continue while the dumpers are working. Does that answer your question?
Thanks for the clarifications, this answers my questions perfectly. I would expect the differences between the dump formats to be minor even though the processes are independent, but this is indeed worth highlighting: I don't think many people are aware of it.