
Publish WDQS JNL files to dumps.wikimedia.org
Open, Low, Public

Description

As requested, I'm creating this ticket to focus discussion on the possibility of publishing these JNL files to dumps.wikimedia.org.
It is also an appeal to the folks who find these JNL files useful, to see whether it is worth the effort.

Reading from my experiments:

More content to come...

Event Timeline

@dr0ptp4kt and I were looking at this today and it occurred to me that the JNL file is uncompressed.

So I gzipped the main Wikidata JNL file from wdqs1016. This takes ~4 hours using pigz at the maximum compression level with all 32 cores, and we end up with a ~400 GB file compared to 1.2 TB uncompressed.

Leaving this here in case it's relevant to the dumps discussion.
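
For a rough sense of what those figures imply, here is a back-of-the-envelope sketch. It only uses the numbers quoted above (1.2 TB, ~400 GB, ~4 hours) and assumes decimal units; the function name `compression_summary` is made up for illustration.

```python
# Figures quoted in the comment above; decimal units (1 TB = 1e12 bytes) are an assumption.
UNCOMPRESSED_BYTES = 1.2e12   # ~1.2 TB uncompressed JNL file
COMPRESSED_BYTES = 400e9      # ~400 GB after pigz
COMPRESS_SECONDS = 4 * 3600   # ~4 hours wall-clock on 32 cores


def compression_summary(uncompressed, compressed, seconds):
    """Return (size ratio, fraction of space saved, input throughput in MB/s)."""
    ratio = compressed / uncompressed
    saved = 1 - ratio
    throughput_mb_s = uncompressed / seconds / 1e6
    return ratio, saved, throughput_mb_s


ratio, saved, mb_s = compression_summary(
    UNCOMPRESSED_BYTES, COMPRESSED_BYTES, COMPRESS_SECONDS
)
print(f"compressed/uncompressed: {ratio:.2f}")
print(f"space saved: {saved:.0%}")
print(f"input throughput: {mb_s:.0f} MB/s")
```

In other words, the compressed file is about a third of the original size, and the compression run processed the input at roughly 83 MB/s overall.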

Adding some usage numbers to this task.
The JNL files that I am currently hosting on Cloudflare have used 4.28 TB of traffic in the past 30 days, which equates to roughly 3-4 downloads of the file.
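
The download estimate can be checked with simple division. This is a sketch that assumes the hosted file is about the same size as the 1.2 TB uncompressed JNL mentioned earlier in this thread; the actual hosted file size is not stated here, and `estimated_downloads` is a hypothetical helper.

```python
TRAFFIC_TB = 4.28    # traffic over the past 30 days, from the comment above
FILE_SIZE_TB = 1.2   # assumption: same size as the uncompressed JNL quoted earlier


def estimated_downloads(traffic_tb, file_size_tb):
    """Estimate the number of full downloads from total traffic."""
    return traffic_tb / file_size_tb


downloads = estimated_downloads(TRAFFIC_TB, FILE_SIZE_TB)
print(f"approx. full downloads in 30 days: {downloads:.1f}")
```

Under that assumption the result lands between 3 and 4, which is consistent with the "roughly 3-4 downloads" figure.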

If compression is being considered, I think the amount of time it takes to decompress the JNL file on varying hardware should also be taken into account.
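
One way to gather comparable decompression timings across machines is a small script like the following. It is only a sketch: it uses Python's standard `gzip` module (single-threaded, so likely slower than pigz or gunzip), discards the output so that only read and decompress time is measured, and the function name `time_gzip_decompression` is hypothetical.

```python
import gzip
import sys
import time


def time_gzip_decompression(path, chunk_size=1024 * 1024):
    """Stream-decompress a .gz file, discarding the output.

    Returns (decompressed_bytes, elapsed_seconds).
    """
    total = 0
    start = time.perf_counter()
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total, elapsed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python time_gunzip.py /path/to/file.gz (script name is illustrative)
    size, seconds = time_gzip_decompression(sys.argv[1])
    print(f"{size} bytes decompressed in {seconds:.1f} s")
```

Because the output is thrown away, the numbers exclude the cost of writing 1.2 TB back to disk, which may well dominate on slower storage.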

Gehel triaged this task as Low priority. (Oct 11 2023, 8:37 AM)
Gehel moved this task from Incoming to Misc on the Data-Platform-SRE board.

I think the amount of time taken to decompress the JNL file should also be taken into consideration on varying hardware if compression is being considered.

Closing the loop, posted my experience at T347605#9229608.

This needs to be driven by a product need. Removing DPE-SRE from this ticket until things are moving and we are needed.