Request: The JSON dumps are currently compressed with gzip. I propose additionally providing a bzip2-compressed file.
Reason: I've been working with people who would like to process Wikidata dumps in Hadoop/Spark. In that environment, bzip2 is better supported than gzip because of its block compression strategy: a bzip2 file can be split at block boundaries and decompressed by many tasks in parallel, whereas a gzip stream cannot be split and must be read by a single task. Currently, we need to recompress the JSON dumps in order to take full advantage of these distributed processing frameworks. It would be very helpful for us and our workflow if the dumps could be provided in a bzip2-compressed format.
See http://stackoverflow.com/questions/6511255/why-cant-hadoop-split-up-a-large-text-file-and-then-compress-the-splits-using-g and http://stackoverflow.com/questions/14820450/best-splittable-compression-for-hadoop-input-bz2.