Provide a compression+decompression library
Open, Medium, Public, Feature

Description

Feature summary:
Compress and decompress stuff.

Use case(s):
Several userscripts of mine store data. As localStorage is limited and MediaWiki preferences even more so, compression is helpful.

Benefits:
Store more with less.

Currently we have mediaWiki.deflate (mw.loader.using('mediawiki.deflate',function(){console.log(mediaWiki.deflate('testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttest'))});), see T236210. But the corresponding decompression function is only available on the server side as far as I can tell.

Could the corresponding inflate function be provided in JS as well? (It's fine if mw.loader.using triggers a download for it.) Another option is Pieroxy's lz-string, which is about 3K but doesn't seem to compress quite as well: when compressing $('body')[0].innerHTML of enwiki's WP:VPT, the output of lz-string in base64 mode is 1.9 times larger (in characters) than that of mediawiki.deflate.

This should be relatively straightforward I'd suspect?

If I had a testing environment I could almost do it myself, emphasis on almost.

Beware that MediaWiki uses the outdated pako 1.0.10 deflate, so either use inflate from 1.0.10 or upgrade both. The current version doesn't work as a drop-in replacement though. They probably changed the parameter format. Update: now uses 2.0.4.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Removing good first task as the task description currently offers several options if I understand correctly, and as this probably should receive some team input first.

That's fair. The team should probably look at it first. I added lz-string just to be comprehensive. It's very tiny: 1.4K after min and gzip versus 14K for pako. I made a little benchmark to test the two:

mw.loader.load('//en.wikipedia.org/w/index.php?title=User:Alexis Jazz/lz-string/bench.js&action=raw&ctype=text/javascript');

lz-string turns out to be several times slower than pako while often having a worse compression ratio. That's still impressive given the library size, which can make it interesting in some cases, but I'm now fairly sure that within the scope of MediaWiki it makes more sense to provide pako's inflate.
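For readers who want to reproduce this kind of comparison, here is a minimal sketch of how speed and compression ratio could be measured. It is not the actual bench.js linked above; `benchCompressor` and its parameters are hypothetical names made up for illustration, and it accepts any function that takes a string and returns something with a `length`.

```javascript
// Hypothetical helper (not the real bench.js): times one compressor on one input
// and reports the output/input length ratio.
function benchCompressor(name, compress, input, runs) {
	runs = runs || 5;
	let output = '';
	const start = performance.now();
	for (let i = 0; i < runs; i++) {
		output = compress(input);
	}
	const msPerRun = (performance.now() - start) / runs;
	return {
		name: name,
		msPerRun: msPerRun,
		inputLength: input.length,
		outputLength: output.length,
		// Lower is better: compressed length relative to the original length
		ratio: output.length / input.length
	};
}
```

On a wiki page this could be called as `benchCompressor('mw.deflate', mw.deflate, html)` after loading mediawiki.deflate, and as `benchCompressor('lz-string', LZString.compressToBase64, html)` once lz-string is loaded, with `html` holding the test input.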

Krinkle moved this task from Inbox to Accepted Enhancement on the MediaWiki-ResourceLoader board.
Krinkle added a subscriber: Krinkle.

I accept this feature request and consider it in-scope and well-fit into the set of built-in modules we offer. Having said that, picking and testing an implementation for this isn't something I can currently squeeze in unless there's an immediate use case that needs this as part of an approved goal.

As a workaround, you can embed pako.js in your gadget directly, or e.g. use the (currently Chrome-only) Compression Streams API which includes both inflate and deflate.
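A minimal sketch of that second workaround, for browsers that ship the Compression Streams API. The function names `compressText` and `decompressToText` are hypothetical. The `format` argument is passed straight to `CompressionStream`/`DecompressionStream`; the sketch defaults to `'deflate'` (zlib-wrapped), and matching the raw deflate output of mw.deflate would presumably need `'deflate-raw'` where the runtime supports it.

```javascript
// Compress a string to bytes using the built-in CompressionStream.
async function compressText(text, format = 'deflate') {
	const input = new Blob([text]).stream(); // Blob encodes the string as UTF-8
	const compressed = input.pipeThrough(new CompressionStream(format));
	return new Uint8Array(await new Response(compressed).arrayBuffer());
}

// Decompress bytes produced by compressText back into a string.
async function decompressToText(bytes, format = 'deflate') {
	const input = new Blob([bytes]).stream();
	const decompressed = input.pipeThrough(new DecompressionStream(format));
	return await new Response(decompressed).text(); // decodes as UTF-8
}
```

Both functions are asynchronous, unlike mw.deflate, so callers would need to await them or chain on the returned promises.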

Alternatively, I would accept a patch to help resolve T235237. E.g. by adding a CompressionStream polyfill ("web2022-polyfills"?) that could be backed by pako.js. It might not be worth it to bundle-split and register inflate/deflate as separate bundles. We can measure how large the two pako files are when combined, minified and gzipped together in one response versus served separately, and thus how much we effectively add to the transfer size in Firefox/Safari until they ship support for the API.

Krinkle triaged this task as Medium priority. Jul 11 2022, 6:56 PM
Krinkle edited projects, added Performance-Team (Radar); removed Performance-Team.
Krinkle moved this task from Limbo to Watching on the Performance-Team (Radar) board.

I accept this feature request and consider it in-scope and well-fit into the set of built-in modules we offer. Having said that, picking and testing an implementation for this isn't something I can currently squeeze in unless there's an immediate use case that needs this as part of an approved goal.

What is an "approved goal" in this context?

By the looks of the ongoing RfC, EditNoticesOnMobile (T312299) is about to get deployed on English Wikipedia as a default Minerva gadget. It could have taken advantage of this; instead it uses lz-string now. If it does switch to pako, it'll include both the inflate and deflate modules from pako to avoid the risk of becoming desynchronized with mw.deflate.

As a workaround, you can embed pako.js in your gadget directly

For my other gadget (Bawl) that could hugely profit from this (its need for good compression is even greater) I'm planning to replace lz-string with pako's inflate while relying on mw.deflate for the compression. This is risky, but Bawl is not (yet) a default gadget. And I'm not thrilled about adding ~8K (after min and gzip) to Bawl. Would be even less thrilled about adding ~14K for both the inflate+deflate modules. I might end up having to do that anyway, need to look into that CompressionStream thing. Oh well.

, or e.g. use the (currently Chrome-only) Compression Streams API which includes both inflate and deflate.

Yeah, Chrome-only, that's not going to be acceptable.

Alternatively, I would accept a patch to help resolve T235237. E.g. by adding a CompressionStream polyfill ("web2022-polyfills"?) that could be backed by pako.js. It might not be worth it to bundle-split and register inflate/deflate as separate bundles. We can measure how large the two pako files are when combined, minified and gzipped together in one response versus served separately, and thus how much we effectively add to the transfer size in Safari/Firefox until they ship support for the API.

What about IE? *takes off clown nose*

For pako the inflate module is nearly 8K after min+gzip, nearly 9K for the deflate module (nearly 17K total) or 14K for the combined module. In its current form the cost of the deflate module is ~11K because MediaWiki doesn't use the pre-minified module as provided by the pako project.

So providing both would cost ~3K more than the current situation or ~5K more if MediaWiki had used the pre-minified version from the start. Whether that's acceptable is hard for me to judge.
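The arithmetic behind those two figures, spelled out as a small sketch. All sizes are the approximate post-min+gzip numbers quoted above, rounded to whole K; the variable names are only for illustration.

```javascript
// Approximate sizes in K after min+gzip, taken from the figures quoted above.
const sizes = {
	inflateOnly: 8,        // pako inflate module
	deflateOnly: 9,        // pako deflate module, pre-minified by the pako project
	combined: 14,          // pako combined inflate+deflate module
	currentMwDeflate: 11   // deflate module in its current MediaWiki form
};

const separateTotal = sizes.inflateOnly + sizes.deflateOnly;      // 17
const extraVsCurrent = sizes.combined - sizes.currentMwDeflate;   // 3
const extraVsPreMinified = sizes.combined - sizes.deflateOnly;    // 5
const savedByCombining = separateTotal - sizes.combined;          // 3
```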

<s>Don't know if this helps at all, but it works.</s> Nope, it's broken: base64 breaks on multi-byte characters.

var pako = require( '../../lib/pako/pako.es5.min.js' );
mw.inflate = function ( data ) { // Written by Alexis Jazz, WTFPL v2
	var binary, bytes, i;
	if ( data.slice( 0, 11 ) !== 'rawdeflate,' ) { // assume uncompressed, return input (IE sux)
		return data;
	}
	// Decode the base64 payload into a binary string, then into raw bytes
	binary = atob( data.slice( 11 ) );
	bytes = new Uint8Array( binary.length );
	for ( i = 0; i < binary.length; i++ ) {
		bytes[ i ] = binary.charCodeAt( i );
	}
	// Inflate the raw deflate stream and decode the result as UTF-8
	return new TextDecoder().decode( pako.inflateRaw( bytes ) );
};

Edit: hum, IE doesn't support TextDecoder. Every other browser does, just not IE. How extremely sad.

Possible solutions:

  1. Screw IE
  2. Add check to mw.deflate to see if TextDecoder is supported. If not, return uncompressed input. Done. Your life will suck, but you're using IE so it already sucks anyway.
  3. Screw IE
  4. Change code to something presumably way more complicated that does work in IE.
  5. Provide lz-string as a fallback
  6. Screw IE
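For option 4, a sketch of one well-known way to decode UTF-8 bytes without TextDecoder, written in ES5 style. This is only an illustration under the assumption that the inflated output is valid UTF-8; `utf8BytesToString` is a hypothetical name, and `escape` is a legacy (Annex B) function that browsers still implement.

```javascript
// Decode UTF-8 bytes to a string without TextDecoder.
// escape() turns each byte-valued character into %XX where needed, and
// decodeURIComponent() then interprets those %XX sequences as UTF-8.
// Throws a URIError if the bytes are not valid UTF-8.
function utf8BytesToString( bytes ) {
	var binary = '';
	var i;
	for ( i = 0; i < bytes.length; i++ ) {
		binary += String.fromCharCode( bytes[ i ] );
	}
	return decodeURIComponent( escape( binary ) );
}
```

In the mw.inflate function above this would stand in for `new TextDecoder().decode( ... )`, at the cost of being slower on large inputs.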

Edit 2:

mw.deflate <s>can be written the same way, far more simple than it is now:</s> also broken

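mw.deflate = function ( data ) { // Written by Alexis Jazz, WTFPL v2
	if ( typeof TextDecoder !== 'function' ) { // IE sux: no TextDecoder, so return the input uncompressed
		return data;
	}
	// Known broken (see the struck text above): the base64 step breaks on multi-byte characters
	return 'rawdeflate,' + btoa( new TextDecoder( 'ISO-8859-1' ).decode( pako.deflateRaw( data ) ) );
};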
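A sketch of a byte-safe base64 step that could replace the TextDecoder trick in the function above. One plausible cause of the breakage (an assumption, not confirmed in this thread) is that the Encoding Standard maps the ISO-8859-1 label to windows-1252, so bytes 0x80 to 0x9F decode to characters above U+00FF, which btoa rejects. Building the binary string with String.fromCharCode keeps every character in the 0 to 255 range. The helper names `bytesToBase64` and `base64ToBytes` are hypothetical.

```javascript
// Convert raw bytes (e.g. a Uint8Array from a deflate call) to base64.
// Each byte becomes exactly one character in the range U+0000..U+00FF,
// which is what btoa requires.
function bytesToBase64(bytes) {
	let binary = '';
	for (let i = 0; i < bytes.length; i++) {
		binary += String.fromCharCode(bytes[i]);
	}
	return btoa(binary);
}

// Inverse of bytesToBase64: base64 text back to raw bytes.
function base64ToBytes(base64) {
	const binary = atob(base64);
	const bytes = new Uint8Array(binary.length);
	for (let i = 0; i < binary.length; i++) {
		bytes[i] = binary.charCodeAt(i);
	}
	return bytes;
}
```

With this, the return line would read `'rawdeflate,' + bytesToBase64( pako.deflateRaw( data ) )`, assuming pako.deflateRaw returns a Uint8Array as it does by default.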

Edit: per https://www.mediawiki.org/wiki/Compatibility/IE11 the code above is actually totally acceptable. It shouldn't even break in IE, it just won't compress anything.

Sidenote: you can hack ~400 bytes (AFTER min+gzip, some 1.1K before) out of pako's inflate module with this. I know, because I did. It contains a bunch of code for browsers without TextEncoder/TextDecoder that gets skipped in any browser that isn't a clunker. Similar code is found in pako's deflate module.

Edit: struck text to indicate this is broken

Pako (combined deflate+inflate version), stripped support for browsers without TextEncoder/TextDecoder:

If you add the mw.inflate/mw.deflate functions I wrote above to this the total is <2K bigger (after min+gzip) than the current mw.deflate module which only provides the deflate side!

Edit: according to https://analytics.wikimedia.org/dashboards/browsers/#desktop-site-by-browser the market share for IE on all sites is just 0.3%. On the desktop site it's 0.7%. That's not an awful lot, and I suspect those are generally not editors, just people looking things up at work and maybe grandpops on his Windows 7 desktop. IMHO dropping compression support within VE (if you're using IE11 to edit Wikimedia, would you REALLY want to use VE?), without even breaking VE, to save a bit on module size is a more than fair tradeoff. You're already in the process of downgrading IE11 to grade C anyway.