Since the Wikidata RDF ontology is no longer "beta", it's time to remove the BETA marker from the RDF dump filenames. A dump is currently named e.g. wikidata-20190617-all-BETA.ttl.bz2, but it should be just wikidata-20190617-all.ttl.bz2.
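For scripts that still carry the old names, the rename amounts to dropping the marker. A minimal Python sketch of that mapping, assuming the marker always appears as `-BETA` directly before the first extension as in the example above; the helper name `strip_beta` is hypothetical and not part of any dump tooling:

```python
import re

# Assumption: the marker is the literal "-BETA" immediately before the first
# extension dot, as in wikidata-20190617-all-BETA.ttl.bz2.
_BETA_MARKER = re.compile(r"-BETA(?=\.)")


def strip_beta(filename: str) -> str:
    """Return the dump filename without the BETA marker (hypothetical helper)."""
    return _BETA_MARKER.sub("", filename, count=1)


print(strip_beta("wikidata-20190617-all-BETA.ttl.bz2"))
# -> wikidata-20190617-all.ttl.bz2
```

Names that carry no marker pass through unchanged, so the helper can be applied to old and new filenames alike.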
Details
Project | Branch | Lines +/- | Subject |
---|---|---|---|
operations/puppet | production | +3 -3 | Remove BETA from RDF dump filenames |
Event Timeline
I think we need to drop a note to wikidata-l, maybe also add something to Weekly notes (@Lea_Lacroix_WMDE ?). Not sure what else.
Because this affects downloaders, might as well blast xmldatadumps-l and tbh I would forward to wikitech-l too.
I'll take care of the Wikidata newsletter and TechNews. Any idea when this change will take place?
What do people think of a July 29 deadline (the start of that run)? Unfortunately we can't really do a 1st of the month change.
Change 518108 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/puppet@production] Remove BETA from RDF dump filenames
Thanks for the ping! I don't use RDF dumps at the moment, and I'm fine with this change.
Oh, July... Somehow I read that as "June". Maybe a bit earlier? A couple of weeks should be enough to prepare the software...
OK, let's go for July 15th then, again between runs. How does that sound? (But let's make sure that date is announced everywhere.)
@ArielGlenn just to be sure, are you going to rename only the new dumps to come, or also the previous ones?
Announced ✅ on Wikidata, on the wikidata, wikidata-tech, wikitech-l, xmldatadumps-l mailing-lists, on Weekly Summary and TechNews.
Change 518108 merged by ArielGlenn:
[operations/puppet@production] Remove BETA from RDF dump filenames
While this issue is supposed to be closed, one can still see at https://dumps.wikimedia.org/wikidatawiki/entities/20210628/ "-all-BETA" dumps (in .nt and .ttl formats) alongside an -all.json dump. Is this expected? Can you please confirm that the content of those dumps is the same except for the serialization format?
Hm, I think there are two different things here.
- It looks like we removed the “-BETA” from the name of the latest dumps (e.g. latest-all.ttl.gz), but not from the timestamped ones (e.g. wikidata-20210628-all-BETA.ttl.gz). This wasn’t mentioned in the announcement, so I don’t think it’s intentional, and we probably want to fix it.
- @Rtroncy, I’m not sure what you mean by the same content, but as far as I’m aware, we don’t guarantee any atomicity for those dumps, neither within a dump nor between them. Since the .nt, .ttl and .json dumps are created independently (as far as I know), they probably don’t quite contain the same data, because Wikidata edits continue while the dumpers are working. Does that answer your question?
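Until the timestamped names are fixed, a downloader may want to try both spellings. A rough Python sketch, assuming the files sit directly in the dated directory linked above; `candidate_urls` and its parameters are hypothetical names, and the code only builds URLs without fetching anything:

```python
# Assumption: timestamped dumps live at <BASE>/<date>/<filename>, based on the
# directory URL quoted earlier in this thread.
BASE = "https://dumps.wikimedia.org/wikidatawiki/entities"


def candidate_urls(date: str, ext: str = "ttl.gz", flavour: str = "all") -> list[str]:
    """List possible URLs for a timestamped RDF dump, new-style name first."""
    stem = f"wikidata-{date}-{flavour}"
    return [
        f"{BASE}/{date}/{stem}.{ext}",       # name without the BETA marker
        f"{BASE}/{date}/{stem}-BETA.{ext}",  # name still carrying the marker
    ]


for url in candidate_urls("20210628"):
    print(url)
```

A caller would request the first URL and fall back to the second on a 404. Trying the unmarked name first keeps the script working once the rename reaches the timestamped dumps.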
Thanks for the clarifications, this perfectly answers my questions. I would still consider the differences between the dump formats to be minor, even though the processes are independent, but this is indeed interesting to highlight: I don't think many people are aware of it.