
Wikistats: Consider compressing with bzip2 (or even 7zip)
Closed, Declined, Public

Description

Author: Wiki.Melancholie

Description:
Consider compressing logs (dumps at http://dammit.lt/wikistats/) with bzip2 (or even 7zip) instead of gzip.

This would reduce disk space and traffic by up to 25% (with 7z, by about another 5% compared to bzip2)!
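The size claim above can be checked on any sample of the data. Below is a minimal Python sketch that compresses the same bytes with the standard-library `gzip`, `bz2` and `lzma` (xz) modules and reports each result relative to gzip. It assumes a synthetic, pagecounts-like input: the line format, project codes and titles in the hypothetical `sample_pagecounts` helper are made up for illustration, so the percentages it prints will not match real dumps. Feed it the bytes of a real uncompressed dump file for meaningful numbers.

```python
import bz2
import gzip
import lzma
import random


def sample_pagecounts(n_lines: int, seed: int = 0) -> bytes:
    """Hypothetical pagecounts-style lines: 'project title requests bytes'."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    projects = ["en", "de", "fr", "commons.m", "en.b"]
    words = ["Main_Page", "Special:Search", "Wikipedia", "Talk", "Berlin", "Paris"]
    lines = []
    for _ in range(n_lines):
        n_words = rng.randint(1, 3)
        title = "_".join(rng.choice(words) for _word in range(n_words))
        requests = rng.randint(1, 500)
        size = requests * rng.randint(5_000, 50_000)
        lines.append(f"{rng.choice(projects)} {title} {requests} {size}")
    return ("\n".join(lines) + "\n").encode("ascii")


def compressed_sizes(data: bytes) -> dict[str, int]:
    """Compressed size in bytes per codec (gzip/bzip2 at level 9, xz at preset 6)."""
    return {
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
        # xz's default preset 6 keeps compressor memory modest (about 94 MiB)
        "xz": len(lzma.compress(data, preset=6)),
    }


if __name__ == "__main__":
    data = sample_pagecounts(50_000)
    sizes = compressed_sizes(data)
    print(f"raw   {len(data):>10,d} bytes")
    for name, size in sizes.items():
        smaller = 1 - size / sizes["gzip"]
        print(f"{name:5s} {size:>10,d} bytes  ({smaller:.1%} smaller than gzip)")
```

The xz preset is deliberately left at its default: higher presets mainly enlarge the dictionary, which raises memory use sharply, much as the larger `-md` values do in the 7z runs quoted further down.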


Version: unspecified
Severity: enhancement
URL: http://dammit.lt/wikistats/

Details

Reference
bz15623

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 10:23 PM
bzimport set Reference to bz15623.
bzimport added a subscriber: Unknown Object (MLST).

Wiki.Melancholie wrote:

Please announce the change in advance so that Henrik and Erik can be informed.

Command:
tar -cjf
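In GNU tar, `-c` creates an archive, `-j` filters it through bzip2, and `-f` names the output file. For reference, here is a minimal Python sketch of the same kind of output, a bzip2-compressed tar file, built with the standard-library `tarfile` module. The helper name `make_bz2_tarball` is hypothetical, and storing only each file's base name is a simplification; `tar` itself keeps the paths as given on the command line.

```python
import tarfile
from pathlib import Path


def make_bz2_tarball(archive: Path, members: list[Path]) -> None:
    """Write `members` into a bzip2-compressed tar file, similar to `tar -cjf`."""
    # compresslevel=9 corresponds to bzip2's largest (900k) block size
    with tarfile.open(archive, mode="w:bz2", compresslevel=9) as tar:
        for path in members:
            # Store only the base name (a simplification; tar keeps the given path)
            tar.add(str(path), arcname=path.name)
```

A possible call, with a hypothetical archive name and file pattern: `make_bz2_tarball(Path("pagecounts-2011-03.tar.bz2"), sorted(Path(".").glob("pagecounts-201103*")))`.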

[mass-moving wikistats reports from Wikimedia→Statistics to Analytics→Wikistats to have stats issues under one Bugzilla product (see bug 42088) - sorry for the bugspam!]

(In reply to comment #3)

A nice comparison, by the way:

Better comparison (on actual data) copied from https://wiki.toolserver.org/view/Talk:User-store :

A huge portion of the space is taken by visitor stats, although they now have two mirrors (WMF and IA). The oldest ones are compressed with LZMA (xz). Compressing files that are already gz- or xz-compressed is useless; it can only increase their size. I ran some compression tests on a whole month of uncompressed data, 2011-03-pagecounts (184G):

7z a -t7z -m0=BZip2 -mmt=6 -mx9 takes ~27h (6 cores, less than 100M memory) and gives 41G
7z a -t7z -m0=LZMA -mmt=on -mx9 -md=64m -mfb=64 takes ~56h (2 cores, 800M memory) and gives 37G
7z a -t7z -m0=LZMA -mmt=on -mx9 -md=256m -mfb=64 -ms=on takes about 3 days (2 cores, 2700M memory) and gives 35G
tar with xz uses LZMA with standard settings and can only give worse results (I tried it, but the process was killed by mistake; it wasn't getting anywhere anyway)
individual gz files total 51.2G
individual xz files for this month are not yet available for comparison

--Nemo 10:23, 22 March 2012 (UTC)
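The figures in the quoted comment can be turned into ratios directly. A small Python sketch: the labels in `REPORTED_G` are shorthand for the commands listed above, and the sizes are in the same "G" units as the comment (the units cancel out). It computes each result as a fraction of the 184G raw month and as a saving relative to the existing individual gz files.

```python
RAW_G = 184.0  # uncompressed 2011-03 pagecounts, as reported above
REPORTED_G = {
    "individual gz": 51.2,
    "7z BZip2 -mx9": 41.0,
    "7z LZMA -md=64m": 37.0,
    "7z LZMA -md=256m -ms=on": 35.0,
}


def summarize(
    raw: float, results: dict[str, float], baseline: str = "individual gz"
) -> dict[str, tuple[float, float]]:
    """Map each method to (size / raw, fractional saving vs the baseline)."""
    base = results[baseline]
    return {name: (size / raw, 1 - size / base) for name, size in results.items()}


if __name__ == "__main__":
    for name, (ratio, saving) in summarize(RAW_G, REPORTED_G).items():
        print(f"{name:24s} {ratio:6.1%} of raw  {saving:6.1%} smaller than gz")
```

On these numbers, 7z's BZip2 mode comes out about 19.9% smaller than the individual gz files, and the two LZMA runs about 27.7% and 31.6% smaller, at the cost of the much longer run times and higher memory use listed. The 7z runs produce one archive while the gz figure covers individual files, so the comparison mixes codec and packaging effects.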