There are (at least) two possible and non-exclusive approaches:
- Look at what is stored inside the key and see if it can be made smaller
- Chunk the data into more keys
• Nikerabbit | Jan 15 2019, 9:59 AM
F28375788: serialization-test.zip | Mar 12 2019, 2:36 PM
Status | Assigned | Task |
---|---|---|
Resolved | abi_ | T213802 Investigate ways to reduce the size of translate-groups cache key |
Open | None | T224644 Avoid storing DependecyWrapper objects in the cache |
Open | None | T224645 Convert BannerMessageGroup to the new caching mechanism |
Open | None | T224646 Pre-serialize array to JSON before saving into cache |
Open | None | T224647 Add support for Aggregate groups to specify the subgroups using a pattern |
@Nikerabbit if possible, I'd ask you to prioritize this. In T203786 Aaron keeps patching MWObjectCache, but new corner cases keep emerging. A drastic reduction in the key size would help a lot :(
FYI
echo '$c = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache(); echo serialize( $c->get( $c->makeKey( "translate-groups" ) ) );' | mwscript eval.php metawiki | wc -c
5278803
echo '$c = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache(); var_dump( strlen( gzdeflate( serialize( $c->get( $c->makeKey( "translate-groups" ) ) ), -1 ) ) );' | mwscript eval.php metawiki
int(386936)
The latter number matches the "about 400k" figure for the size of the value on metawiki. This means that compression is already happening behind the scenes (a surprise to me). Serialization is also happening implicitly, which is not a surprise, though it is something that should be avoided (T161647: RFC: Deprecate using php serialization inside MediaWiki).
Change 495633 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[mediawiki/extensions/Translate@master] Reduce amount of data stored in translate-groups cache
The following changes have been made in order to reduce the size of the data saved in the cache:
A few things to note:
A version key is needed to avoid the warnings that appear when unserializing the data stored under the previous key (translate-groups).
The WikiPageMessageGroup class now implements the Serializable interface. The value previously stored in the cache contains an instance of WikiPageMessageGroup that did not implement the Serializable interface. Hence, when unserializing that data, the unserialize function emits a warning, which cannot be handled in a try / catch:
( ! ) Warning: Erroneous data format for unserializing 'WikiPageMessageGroup' in /vagrant/mediawiki/includes/objectcache/SqlBagOStuff.php on line 699
( ! ) Notice: unserialize(): Error at offset 182 of 15812 bytes in /vagrant/mediawiki/includes/objectcache/SqlBagOStuff.php on line 699
I created a minimal, complete and verifiable example showing this problem. Please find the attached zip file. Go through the README.md file inside it first.
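To illustrate why try / catch alone does not help here: unserialize() reports bad input as a PHP warning or notice and returns false, rather than throwing. The following is only a generic sketch of how such diagnostics can be turned into exceptions; it is not what the patch does (the patch sidesteps the problem with a new key), and tryUnserialize is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns the unserialized value, or null if PHP
// emitted a warning/notice while unserializing.
function tryUnserialize( string $blob ) {
	// Temporarily convert PHP warnings/notices into exceptions.
	set_error_handler( static function ( int $severity, string $message ) {
		throw new ErrorException( $message, 0, $severity );
	} );
	try {
		return unserialize( $blob );
	} catch ( ErrorException $e ) {
		// The diagnostic is now catchable like any other exception.
		return null;
	} finally {
		// Always put the previous error handler back.
		restore_error_handler();
	}
}

var_dump( tryUnserialize( 'not serialized data' ) );
var_dump( tryUnserialize( serialize( [ 'a' => 1 ] ) ) );
```

Note that a helper like this cannot tell a failure apart from a legitimately stored null.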
Versioning is supported in WANObjectCache, but the version info is stored inside the cached data, so the data has to be unserialized in order to read the version info.
See documentation here.
Quote from the documentation,
New versions are stored alongside older versions concurrently. *Avoid storing class objects however, as this reduces compatibility (due to serialization)*
We are storing a class object in the cache, so the version parameter doesn't work very well for us, but since the new key is not present in the cache there will be no warnings.
Below are the size improvements after the changes:
 | Compressed | Uncompressed |
---|---|---|
Without change | 1251 | 15760 |
With change | 1040 | 5146 |
These sizes are with 26 Pages and 4 Workflow States. Although the uncompressed space gains are about 67%, the compressed space savings are only around 16.87%. I cannot predict how much this will translate to on the server. Since the data on the server is compressed already, I do not expect a large reduction.
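As a quick check, the savings can be recomputed from the four numbers in the table. This is only a small arithmetic sketch; percentSaved is a hypothetical helper name.

```php
<?php
// Relative saving in percent, rounded to two decimals.
function percentSaved( int $before, int $after ): float {
	return round( ( $before - $after ) / $before * 100, 2 );
}

// Figures taken from the table above.
echo 'Uncompressed: ', percentSaved( 15760, 5146 ), "%\n";
echo 'Compressed: ', percentSaved( 1251, 1040 ), "%\n";
```

With the table's figures this gives roughly 67.35% for the uncompressed value and 16.87% for the compressed one.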
The majority of the testing for this task was done by reverting to the master branch, ensuring that the data is saved in the cache, then switching to the task's branch and reloading the page. This triggers a cache regeneration, and a new key, wiki:translate-groups:v2, is generated using the code:
$cache->makeKey( 'translate-groups', 'v' . 2 ); // 2 is the cache version here
The page should load fine. If you check the data in the cache, it should have the data in the new format.
No warnings will appear since we are saving the data in a new key.
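The idea can be sketched without MediaWiki as follows. Everything here is a stand-in: makeKey only imitates the shape of the key produced by WANObjectCache::makeKey, a plain array plays the role of the cache backend, and the stored values are placeholders.

```php
<?php
// Hypothetical stand-in for WANObjectCache::makeKey(): joins the wiki ID
// and the key components with colons.
function makeKey( string $wikiId, string ...$components ): string {
	return $wikiId . ':' . implode( ':', $components );
}

const CACHE_VERSION = 2;

// A plain array plays the role of the cache backend.
$cache = [
	// Value written earlier by the old code, in the old format.
	makeKey( 'wiki', 'translate-groups' ) => 'value-in-old-format',
];

$key = makeKey( 'wiki', 'translate-groups', 'v' . CACHE_VERSION );

if ( !array_key_exists( $key, $cache ) ) {
	// Miss on the versioned key: regenerate, never reading the old value.
	$cache[$key] = serialize( [ 'page-Dadadad' => [ 'title' => 'Dadadad' ] ] );
}

echo $key, "\n";
```

Because the new code only ever reads the versioned key, the value stored under the old key is never unserialized and simply ages out of the cache.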
In addition to the above,
Updated previous comment with the latest changes made to the code after reviews from @Nikerabbit. (Thanks!)
Change 495633 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[mediawiki/extensions/Translate@master] Reduce amount of data stored in translate-groups cache
On dev.translatewiki.net the size of the compressed value went from 99411 to 98583, which is about a 1% decrease. This isn't very much, but then again probably 99% of the groups there are not of type WikiPageMessageGroup.
I used these commands to get the sizes:
echo '$c = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache(); var_dump( strlen( gzdeflate( serialize( $c->get( $c->makeKey( "translate-groups" ) ) ), -1 ) ) );' | php eval.php
echo '$c = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache(); var_dump( strlen( gzdeflate( serialize( $c->get( $c->makeKey( "translate-groups", "v2" ) ) ), -1 ) ) );' | php eval.php
Random example of the serialized value of a translatable page:
s:12:"page-Dadadad";C:20:"WikiPageMessageGroup":81:{{"title":"Dadadad","id":"page-Dadadad","_v":1,"label":"Dadadad","namespace":1198}}
As one quick win, I think we can move the assignment of $this->namespace to the class property instead of the constructor. Maybe we can also override the getLabel method to default to the page title to avoid storing that as well.
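A rough sketch of that idea, using hypothetical stand-in classes rather than the real WikiPageMessageGroup: values that can be derived (label) or that are constant (namespace) are left out of the serialized form. Note that the sketch uses the __serialize() / __unserialize() magic methods (PHP 7.4+) instead of the Serializable interface used in the patch, since Serializable is deprecated as of PHP 8.1; the resulting string therefore starts with O: rather than C:.

```php
<?php
// Stand-in that serializes every property, including derivable ones.
class FullGroup {
	public $id;
	public $title;
	public $label;
	public $namespace = 1198;

	public function __construct( string $title ) {
		$this->title = $title;
		$this->id = 'page-' . $title;
		$this->label = $title;
	}
}

// Stand-in that stores only the title and derives the rest on demand.
class CompactGroup {
	private const GROUP_NAMESPACE = 1198;
	private $title;

	public function __construct( string $title ) {
		$this->title = $title;
	}

	public function getId(): string {
		return 'page-' . $this->title;
	}

	public function getLabel(): string {
		// Defaults to the page title, so it need not be stored.
		return $this->title;
	}

	public function getNamespace(): int {
		return self::GROUP_NAMESPACE;
	}

	public function __serialize(): array {
		return [ 'title' => $this->title ];
	}

	public function __unserialize( array $data ): void {
		$this->title = $data['title'];
	}
}

$full = serialize( new FullGroup( 'Dadadad' ) );
$compact = serialize( new CompactGroup( 'Dadadad' ) );
echo strlen( $full ), ' vs ', strlen( $compact ), " bytes\n";
```

The restored object still answers getId(), getLabel() and getNamespace(), because those values are recomputed from the title and the class constant.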
Hi Niklas,
Updated the patch as per suggestions. Example of a serialized value now,
s:12:"page-T218777";C:20:"WikiPageMessageGroup":46:{{"title":"T218777","id":"page-T218777","_v":1}}
That should help reduce the size more but, like you said, on translatewiki.net this might not have a major impact.
Question from a very ignorant point of view: would it be feasible/useful to check the serialized values in the metawiki translate-groups key for patterns of strings etc. that could be removed or reduced, in order to cut the memory used?
That's what we are doing indeed. Metawiki consists mainly of WikiPageMessageGroup groups, so the patch under review aims to shrink their size. Unfortunately, due to the compression, removing these repetitive things is also the least likely to have a big impact on the compressed size.
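The effect can be reproduced with synthetic data. The sketch below builds made-up groups (not real metawiki data) with and without the fields that repeat in every entry, and compares the relative saving before and after gzdeflate. It assumes the zlib extension is available; buildGroups and sizes are hypothetical helper names.

```php
<?php
// Build $n made-up groups; $verbose adds fields that repeat in every entry.
function buildGroups( int $n, bool $verbose ): array {
	$groups = [];
	for ( $i = 0; $i < $n; $i++ ) {
		$title = 'Page ' . md5( (string)$i );
		$entry = [ 'title' => $title ];
		if ( $verbose ) {
			$entry += [ 'id' => "page-$title", 'label' => $title, 'namespace' => 1198 ];
		}
		$groups["page-$title"] = $entry;
	}
	return $groups;
}

// Returns [ uncompressed size, compressed size ] of the serialized data.
function sizes( array $data ): array {
	$blob = serialize( $data );
	return [ strlen( $blob ), strlen( gzdeflate( $blob, -1 ) ) ];
}

[ $rawBefore, $zipBefore ] = sizes( buildGroups( 500, true ) );
[ $rawAfter, $zipAfter ] = sizes( buildGroups( 500, false ) );

$rawSaving = ( $rawBefore - $rawAfter ) / $rawBefore;
$zipSaving = ( $zipBefore - $zipAfter ) / $zipBefore;

printf( "uncompressed saving: %.1f%%\n", $rawSaving * 100 );
printf( "compressed saving:   %.1f%%\n", $zipSaving * 100 );
```

The repeated field names and values are exactly what the compressor already encodes cheaply, so dropping them shrinks the uncompressed value much more than the compressed one.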
Can you help us monitor the size of the value when the patch gets deployed? In earlier discussions it was about 400K for meta, but it would also be good to know it for mediawikiwiki and commonswiki at least.
Please feel free to ping me anytime for any question, really glad to help if I can :)
elukey@mw1345:~$ echo "get WANCache:v:mediawikiwiki:translate-groups" | nc localhost 11213 -q 2 > dump.txt
elukey@mw1345:~$ du -hs dump.txt
196K dump.txt
elukey@mw1345:~$ echo "get WANCache:v:commonswiki:translate-groups" | nc localhost 11213 -q 2 > dump.txt
elukey@mw1345:~$ du -hs dump.txt
56K dump.txt
elukey@mw1345:~$ echo "get WANCache:v:metawiki:translate-groups" | nc localhost 11213 -q 2 > dump.txt
elukey@mw1345:~$ du -hs dump.txt
380K dump.txt
I can also get down to what memcached stores internally, like:
ITEM WANCache:v:metawiki:translate-groups [388569 b; 1553362029 s]
The above output uses a special memcached command to dump the contents of the slab. I think that the first method gives a good approximation of the size.
With the updated patch, the compressed value size on dev.translatewiki.net is down to 98245 from 98557 (originally 99411). Of course the change in the uncompressed value is much bigger, though it has limited impact: possibly on memory use and (de)compression speed, and possibly on cache backends that don't compress values (if there are such?). Let's see how this affects WMF production.
@elukey FYI the patch changes the key from translate-groups to translate-groups:v2. I just +2ed it. I'm planning to let it roll out with the next train deployment.
Great! The key will probably be hashed to a different memcached shard by mcrouter, but that shouldn't be an issue. I will check right after group 1 is deployed!
Change 495633 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Reduce amount of data stored in translate-groups cache
nikerabbit@deploy1001:~$ echo '$c = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache(); var_dump( strlen( gzdeflate( serialize( $c->get( $c->makeKey( "translate-groups", "v2" ) ) ), -1 ) ) );' | mwscript eval.php --wiki=metawiki
int(261732)
nikerabbit@deploy1001:~$ echo '$c = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache(); var_dump( strlen( gzdeflate( serialize( $c->get( $c->makeKey( "translate-groups" ) ) ), -1 ) ) );' | mwscript eval.php --wiki=metawiki
int(389077)
That looks like a reduction of roughly 33% in the compressed size.
I was inspecting what the contents look like now, and it seems very weird:
s:25:"page-Article count reform";O:20:"WikiPageMessageGroup":12:{s:8:"