Page MenuHomePhabricator

Investigate ways to reduce the size of translate-groups cache key
Closed, ResolvedPublic

Description

Split from T203786: Mcrouter periodically reports soft TKOs for mc1029 (was mc1035, mc1022) leading to MW Memcached exceptions.

There are (at least) two possible and non-exclusive approaches:

  • Look what is stored inside the key and see if it can be made smaller
  • Chunk the data into more keys

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Nikerabbit moved this task from Backlog to Translate twn on the User-Nikerabbit board.

@Nikerabbit if possible I'd ask to prioritize this, in T203786 Aaron keeps patching MWObjectCache but new corner cases keep emerging. A drastic reduction in the key size would help a lot :(

Nikerabbit triaged this task as High priority.Mar 6 2019, 7:57 AM
elukey awarded a token.Mar 6 2019, 7:57 AM
abi_ moved this task from Backlog to In Progress on the User-abi_ board.Mar 6 2019, 3:01 PM

FYI

echo '$c = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache(); echo serialize( $c->get( $c->makeKey( "translate-groups" ) ) );' | mwscript eval.php metawiki | wc -c
5278803
echo '$c = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache(); var_dump( strlen( gzdeflate( serialize( $c->get( $c->makeKey( "translate-groups" ) ) ), -1 ) ) );' | mwscript eval.php metawiki
int(386936)

Latter number matches the value of "about 400k" for the size of the value in metawiki. This means that compression is already happening behind the scenes (surprise to me). Serialization is also implicitly happening, but that is not a surprise, though it is something that should be avoided (T161647: RFC: Deprecate using php serialization inside MediaWiki).

Change 495633 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[mediawiki/extensions/Translate@master] Reduce amount of data stored in translate-groups cache

https://gerrit.wikimedia.org/r/495633

abi_ moved this task from In Progress to Under Review on the User-abi_ board.EditedMar 11 2019, 6:18 AM
abi_ added a subscriber: abi_.

Following changes have been made in-order to reduce the size of the data saved in the cache,

  1. Save the title property in WikiPageMessageGroup as a string rather than a Title object.
  2. WikiPageMessageGroup class now implements the Serializable interface, and saves only the necessary properties in the cache.
  3. A version string has been added to the translate-groups key. This helps manage incompatible changes between data stored in the cache. The key will now be translate-groups-v2.

Few things to note,

  • Why add version to the key?

A version key is needed to avoid the warnings that appear when un-serialzing the data stored in the previous key (translate-groups).

The WikiPageMessageGroup class now implements the Serializable interface. The value stored before in the cache contains an instance of WikiPageMessageGroup that did not implement the Serializable interface. Hence when un-serializing the data, the unserialize function throws a warning which cannot be handled in a try / catch,

( ! ) Warning: Erroneous data format for unserializing 'WikiPageMessageGroup' in /vagrant/mediawiki/includes/objectcache/SqlBagOStuff.php on line 699
( ! ) Notice: unserialize(): Error at offset 182 of 15812 bytes in /vagrant/mediawiki/includes/objectcache/SqlBagOStuff.php on line 699

I created a minimal, complete and verifiable example showing this problem. Please find the attached zip file. Go through the README.md file inside it first.

Versioning is supported in WANObjectCache, but the version info is stored inside the cached data, hence it un-serializes the data to get the version info.

See documentation here.

Quote from the documentation,

New versions are stored alongside older versions concurrently. *Avoid storing class objects however, as this reduces compatibility (due to serialization)*

We are storing a class object in the cache so the version parameter doesn't work very well for us but since the new key is not present in cache there will be no warnings.


Below are the improvements in size after the changes made,

CompressedUncompressed
Without change125115760
With change10405146

These sizes are with 26 Pages and 4 Workflow States. Although the uncompressed space gains are close to 66%, the compressed space savings are around 16.86%. I cannot predict how much this will translate to on the server. Since the data on the server is compressed already, I do not expect a large reduction.


Testing

Switching between master branch and task branch

The majority of the testing for this task was done by reverting to the master branch, ensuring that the data in the cache is saved and then shifting to the task's branch and reloading the page. This will cause a cache regenerate to trigger and a new key - wiki:translate-groups:v2 will be generated using the code,

$cache->makeKey( 'translate-groups', 'v' . 2 ); // 2 is the cache version here

The page should load fine. If you check the data in the cache, it should have the data in the new format.

No warnings will appear since we are saving the data in a new key.

In addition to the above,

  1. Add some test cases to ensure that the serialization / un-serialization of the WikiPageMessageGroup class happens properly.
  2. Ran all the phpunit test cases under the Translate extension to ensure that they were passing.
abi_ claimed this task.Mar 11 2019, 6:21 AM
abi_ added a comment.Mar 12 2019, 3:07 PM

Updated previous comment with the latest changes made to the code after reviews from @Nikerabbit. (Thanks!)

@abi_ , @Nikerabbit Any plans to merge/deploy this? :)

Yes, I am planning to test and merge (if okay) before next train.

Change 495633 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[mediawiki/extensions/Translate@master] Reduce amount of data stored in translate-groups cache

https://gerrit.wikimedia.org/r/495633

abi_ added a comment.EditedMar 20 2019, 9:47 PM

Updated previous comment with the latest changes made to the code after reviews from @Nikerabbit. (Thanks!)

Updated again as per changes made based on feedback by Niklas.

On dev.translatewiki.net the size of the compressed value went from 99411 to 98583 which is about 1% decrease. This isn't very much, but then again probably 99% of groups there are not of type WikiPageMessageGroup.

Used this commands to get the sizes:

echo '$c = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache(); var_dump( strlen( gzdeflate( serialize( $c->get( $c->makeKey( "translate-groups" ) ) ), -1 ) ) );' | php eval.php

echo '$c = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache(); var_dump( strlen( gzdeflate( serialize( $c->get( $c->makeKey( "translate-groups", "v2" ) ) ), -1 ) ) );' | php eval.php

Random example of the serialized value of a translatable page:
s:12:"page-Dadadad";C:20:"WikiPageMessageGroup":81:{{"title":"Dadadad","id":"page-Dadadad","_v":1,"label":"Dadadad","namespace":1198}}

As one quick win, I think we can move the assignment of $this->namespace to the class property instead of the constructor. Maybe we can also override the getLabel method to default to the page title to avoid storing that as well.

abi_ added a comment.Mar 28 2019, 7:53 AM

Hi Niklas,

Updated the patch as per suggestions. Example of a serialized value now,

s:12:"page-T218777";C:20:"WikiPageMessageGroup":46:{{"title":"T218777","id":"page-T218777","_v":1}}
  • Made namespace a class property, and set its value during declaration.
  • Added a getLabel method in WikiPageMessageGroup and derived its value from the page title.

That should help reduce the size more, but like you said on Translatewiki.net, this might not have a major impact.

On dev.translatewiki.net the size of the compressed value went from 99411 to 98583 which is about 1% decrease. This isn't very much, but then again probably 99% of groups there are not of type WikiPageMessageGroup.

Question from a very ignorant point of view - would it be feasible/useful to check serialized values in the metawiki translate-groups key to see if there are patterns of strings/etc.. that could be removed/reduced to allow reducing memory used?

Question from a very ignorant point of view - would it be feasible/useful to check serialized values in the metawiki translate-groups key to see if there are patterns of strings/etc.. that could be removed/reduced to allow reducing memory used?

That's what we are doing indeed. Metawiki consists of mainly WikiPageMessageGroup groups so the patch under review aims to shrink their size. Unfortunately, due to the compression, removing these repetitive things are also least likely to have a big impact on the compressed size.

Can you help us monitor the size of the value when the patch gets deployed? In earlier discussions it was about 400K for meta, but would also be good to know for mediawikiwiki and commonswiki at least.

elukey added a comment.EditedMar 28 2019, 3:41 PM

Can you help us monitor the size of the value when the patch gets deployed? In earlier discussions it was about 400K for meta, but would also be good to know for mediawikiwiki and commonswiki at least.

Please feel free to ping me anytime for any question, really glad to help if I can :)

elukey@mw1345:~$ echo "get WANCache:v:mediawikiwiki:translate-groups" | nc localhost 11213 -q 2 > dump.txt
elukey@mw1345:~$ du -hs dump.txt
196K	dump.txt

elukey@mw1345:~$ echo "get WANCache:v:commonswiki:translate-groups" | nc localhost 11213 -q 2 > dump.txt
elukey@mw1345:~$ du -hs dump.txt
56K	dump.txt

elukey@mw1345:~$ echo "get WANCache:v:metawiki:translate-groups" | nc localhost 11213 -q 2 > dump.txt
elukey@mw1345:~$ du -hs dump.txt
380K	dump.txt

I can get down also to what memcached internally stores, like:

ITEM WANCache:v:metawiki:translate-groups [388569 b; 1553362029 s]

The above output uses a special command of memcached to dump the contents of the slab. I think that the first method gives a good approximation about the size.

Wit the updated patch the compressed value size is down to 98245 from 98557 (originally 99411) on dev.translatewiki.net. Of course the change in uncompressed value is much bigger, though it has limited impact (possibly on memory use, (de)compression speed) and possibly on cache backends that don't compress values (if there are such?). Let's see how this affects wmf production.

@elukey FYI the patch changes the key from translate-groups to translate-groups:v2. I just +2ed it. I'm planning to let it roll with next train deployment.

@elukey FYI the patch changes the key from translate-groups to translate-groups:v2. I just +2ed it. I'm planning to let it roll with next train deployment.

Great! The key will be probably hashed to a different memcached shard by mcrouter, but it shouldn't be an issue, will check right after group 1 is deployed!

Change 495633 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Reduce amount of data stored in translate-groups cache

https://gerrit.wikimedia.org/r/495633

abi_ moved this task from Under Review to QA on the User-abi_ board.Mar 29 2019, 5:32 AM
abi_ removed a project: Patch-For-Review.

Moving to QA

nikerabbit@deploy1001:~$ echo '$c = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache(); var_dump( strlen( gzdeflate( serialize( $c->get( $c->makeKey( "translate-groups", "v2" ) ) ), -1 ) ) );' | mwscript eval.php --wiki=metawiki
int(261732)


nikerabbit@deploy1001:~$ echo '$c = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache(); var_dump( strlen( gzdeflate( serialize( $c->get( $c->makeKey( "translate-groups" ) ) ), -1 ) ) );' | mwscript eval.php --wiki=metawiki
int(389077)

That looks approximately 30% reduction in the compressed size.

I was inspecting to see how the contents look like now and it seems very weird:

s:25:"page-Article count reform";O:20:"WikiPageMessageGroup":12:{s:8:"*title";N;s:14:"*definitions";N;s:9:"*source";N;s:8:"*label";s:4:"none";s:5:"*id";s:4:"none";s:12:"*namespace";i:8;s:11:"*optional";a:0:{}s:10:"*ignored";a:0:{}s:14:"*description";N;s:7:"*meta";b:0;s:10:"*mangler";N;s:11:"*messages";a:0:{}}

This looks as if the serialize interface is ignored (Type O instead of type C like above). https://3v4l.org/Jehif shows that there shouldn't be an issue with HHVM vs. PHP on this. But this means further savings can still be seen if we figure out why this is happening. Currently the savings come from now having full title objects serialized.

So meta is still on 1.33.0-wmf.23 due to T220037. I am not sure whether the short time it was deployed was enough to create the translate-groups:v2 cache without title values, but wrong serialization format. Feels a bit weird that title is missing. Data on mediawiki.org looks okay though.

Agabi10 added a subscriber: Agabi10.Apr 4 2019, 7:17 AM
elukey added a comment.Apr 4 2019, 9:53 AM
elukey@mw1345:~$ echo "get WANCache:v:mediawikiwiki:translate-groups:v2" | nc localhost 11213 -q 2 > dump.txt
elukey@mw1345:~$ du -hs dump.txt
136K	dump.txt

Nice!

abi_ added a comment.Apr 4 2019, 1:01 PM

@elukey - Would it be possible to share the content of the file? Something similar to what Niklas pasted above?

s:25:"page-Article count reform";O:20:"WikiPageMessageGroup":12:{s:8:"*title";N;s:14:"*definitions";N;s:9:"*source";N;s:8:"*label";s:4:"none";s:5:"*id";s:4:"none";s:12:"*namespace";i:8;s:11:"*optional";a:0:{}s:10:"*ignored";a:0:{}s:14:"*description";N;s:7:"*meta";b:0;s:10:"*mangler";N;s:11:"*messages";a:0:{}}

We are specifically looking for *WikiPageMessageGroup* and the text that follows right after that.

This is an example from mediawiki.org:

s:25:"page-Manual:RunScript.php";C:20:"WikiPageMessageGroup":72:{{"title":"Manual:RunScript.php","id":"page-Manual:RunScript.php","_v":1}}
abi_ added a comment.Apr 4 2019, 5:56 PM

That serialized structure looks good now. Thanks.

abi_ moved this task from QA to Deployed / PROD QA on the User-abi_ board.Apr 6 2019, 3:15 PM

Since this task is deployed, moving it on the board to the appropriate status

The size on meta is now 302570 which is about 25% reduction.

@elukey Any thoughts whether this is sufficient? If not, I see possible follow-ups:

  1. Apply the same serialization format tweaking change to BannerMessageGroup
  2. Split the cache by origin or group type
  3. Explore whether Aaron's changes in core that allow automatic chunking can be used here
jijiki added a subscriber: jijiki.Apr 9 2019, 7:41 AM
elukey added a comment.Apr 9 2019, 7:47 AM

The size on meta is now 302570 which is about 25% reduction.
@elukey Any thoughts whether this is sufficient? If not, I see possible follow-ups:

  1. Apply the same serialization format tweaking change to BannerMessageGroup
  2. Split the cache by origin or group type
  3. Explore whether Aaron's changes in core that allow automatic chunking can be used here

@Nikerabbit let's see how it goes during the next days. First of all, a 25% reduction is an awesome result, thanks a lot for all the work done. Ideally if we could break down the key into smaller pieces rather than keeping a big blob it would be perfect, since we'd resolve the problem entirely (IIUC this is what you were proposing with 2. but I might be wrong).

If 1. is easy to implement I'd also be interested in seeing it applied, 3. seems to be a bit more cumbersome to maintain/debug when things are not working properly in my opinion.

@abi_ @Nikerabbit in T203786#5110373 I've done some analysis on bandwidth usage and the host hold the v2 key is definitely under pressure (less than it was before though). Do you guys have any more ideas about how to reduce the v2 key further? Thanks a lot and sorry for the unplanned work requests :(

abi_ added a comment.Apr 15 2019, 4:19 PM

Hi @elukey - Thanks for following up on this. I think the most straight forward change for now will be to apply the same serialization format to BannerMessageGroup and any MessageGroups that are big. This change will still be useful when we break up the key later on.

In the meantime, I will discuss with @Nikerabbit to identify how we can break up the key into smaller ones.

Thanks a lot!

abi_ moved this task from Deployed / PROD QA to Backlog on the User-abi_ board.Apr 18 2019, 8:46 AM
Elitre added a subscriber: Elitre.Apr 18 2019, 10:16 AM
abi_ moved this task from Backlog to In Progress on the User-abi_ board.Apr 23 2019, 6:22 AM

Change 506896 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[mediawiki/extensions/Translate@master] Allow MessageGroups to be saved in separate cache key

https://gerrit.wikimedia.org/r/506896

abi_ added a comment.Apr 29 2019, 5:35 AM

Added a new interface - CustomCacheMessageGroup. This interface can be implemented by any message group that wishes to use a custom key in the cache. A new hook - TranslateCustomMessageGroup is triggered every time groups are requested from MessageGroups class. Message groups that implement the CustomCacheMessageGroup can register to this hook. Whenever a message group is requested, and the cache is valid, the cache key and version is fetched from the message group, to retrieve the data from the cache.

If there is no data in the cache, the cache re-generator calls methods in the message group to fetch the data, and then store it in the cache using the cache key and version.

This way each message group is in-charge of what data is stored in the cache, and the version of the data in the cache. If the class changes, and the cache needs to be invalidated, this can be done easily via the message group itself by updating the version. It also allows us to save different types of message groups in different keys which is probably the most sane way to break up the translate-groups key. Any new message group implemented by other extensions can use the same interface to have its data saved in a separate key.

Backward Compatibility


Why can't we serialize BannerMessageGroup with existing structure?

One of the things that we wanted to do was to manually serialize and save the BannerMessageGroup in the cache. Unfortunately this will not be possible to do using the same cache key, because the existing data in the cache is in a non-serialized format. This will cause an error to be thrown similar to what we were having with WikiPageMessageGroup

( ! ) Warning: Erroneous data format for unserializing 'WikiPageMessageGroup' in /vagrant/mediawiki/includes/objectcache/SqlBagOStuff.php on line 699

The best course of action would be to also update BannerMessageGroup to implement the new CustomCacheMessageGroup and implement the serialization also in the same extension update. This way the data will be stored in a new cache key. Following are a few scenarios,

1. New central notice extension with old Translate extension

With the old translate extension, the TranslateCustomMessageGroup hook will not trigger, while the old TranslatePostInit hook will be triggered retrieving Banner messages from the cache. The same cache key (translate-groups-v2) will be used to save data.

2. New Translate extension and old central notice extension

With the new translate extension and the old central notice extensions, the Banner messages will be returned on the TranslatePostInit hook since the BannerMessageGroup does not still implement the CustomCacheMessageGroup.

3. Upgrading from the old Translate extension to the new one

In this case we will have to add a cache version update on the new version, since we don't want the cache to have the WikiPageMessageGroup anymore which the old cache will have. WikiPageMessageGroup will end up being on a separate key.

4. Downgrading from the new Translate extension to the new one

In this case the extension will go back to using the old cache key. If the cache has not been invalidated yet, the newly created WikiPageMessageGroup's will not be present in the cache. I think this is an acceptable sacrifice as this should fix itself when the cache is regenerated.

abi_ moved this task from In Progress to Under Review on the User-abi_ board.Apr 29 2019, 5:46 AM
elukey added a comment.EditedApr 29 2019, 6:13 AM

@abi_ thanks a lot for working on this! I tried to check your code review but my PHP knowledge is limited :D Would you mind to make an example of how the metawiki translategroups key would be broken down into multiple keys? And also, what name/size would the new keys have?

abi_ added a comment.EditedApr 29 2019, 7:04 AM

Hey @elukey - Niklas will take a look at the patch when he returns.

The keys that we will now have will be something similar to,

  1. wiki:translate-groups:v3 - Similar to earlier, but version # has changed. Will store everything other than Wiki pages related info.
  2. wiki:translate-mg-wikipage:v1 - Will store all the Wiki pages related info.

These are the keys from my local, the prefix will be different on the MediaWiki server, probably - WANCache:v:mediawikiwiki:. Also do note that since the patch has to be reviewed the keys might change again.

Regarding the size, I cannot say. The wiki:translate-groups:v3 key should see a significant reduction, but I fear that we might end up making wiki:translate-mg-wikipage:v1 big. Relatively after change following should be the sizes,

  1. wiki:translate-groups:v3 - Entire key size - current size taken by Wiki page message group.
  2. wiki:translate-mg-wikipage:v1 - Current size taken by Wiki page message group.

I'm not sure how to calculate - Current size taken by Wiki page message group.

We might also end up having a separate key for Banner messages depending on what we observe once these changes have been deployed.

Thanks for the explanation. If I have understood correctly the Wiki page message groups will still be "packed" together in one big key, that will be separate from the main wiki:translate-groups. I think it could be a good next step forward, let's see how the new keys will look like. Ideally, long term, we should strive to have more "smaller" keys of fixed size rather than big ones that potentially keep growing and growing, but one step at the time :)

abi_ moved this task from Under Review to In Progress on the User-abi_ board.Apr 30 2019, 2:59 PM
abi_ added a comment.May 2 2019, 7:26 AM

Hi @elukey, Niklas has raised some valid arguments regarding my current implementation, as such we are going back to the drawing board to come up with a better approach that fixes the issue and reduces technical debt while not being too drastic. We will also have to think of an interface for other extensions such as CentralNotice and TranslateSvg to easily use the new improvements that we make. As such a fix for this will take some more time.

Change 508112 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[mediawiki/extensions/Translate@master] Add loaders to load and cache MessageGroups separately

https://gerrit.wikimedia.org/r/508112

Change 506896 abandoned by Abijeet Patro:
Allow MessageGroups to be saved in separate cache key

Reason:
Added another patch based on feedback - I9dd37c92a7ab6d9c48334ac5a729824e9b2f77ab

https://gerrit.wikimedia.org/r/506896

abi_ moved this task from In Progress to Under Review on the User-abi_ board.May 12 2019, 5:25 PM

@abi_ Hi! Any news? :)

abi_ added a comment.May 13 2019, 9:04 AM

Hey @elukey - Yes, we've an updated patch for review which should offer significant reductions in size. We've broken up the cache based on the message group. We're no longer storing objects in the cache, instead storing the data that is needed to recreate the objects. For example, each WikiPageMessageGroup will only have its title stored in the cache (as string) rather than the object. Same has been done for other message groups.

The patch took longer as I fell sick for a couple of days last week but we're trying to have this read for the train tomorrow. I'll have further details for you by evening.

@abi_ you guys rock, thanks a lot for this effort!

abi_ added a comment.May 13 2019, 7:35 PM

Hey @elukey, I'd say we just finish adding the final touches. We'll review the patch once again tomorrow and decide if we can make it for the train tomorrow. I'll have more details regarding the number, names and content of keys for you tomorrow morning.

abi_ added a comment.May 14 2019, 8:53 AM

With the latest patch, the translate extension will use the following keys,

  1. wiki:translate-groups:v3 - Contains message groups implemented by other third party plugins such as the BannerMessageGroup by the central notice extension.
  2. wiki:translate-mg:aggregate:v1 - Contains the array based configuration for the AggregateMessageGroups in the instance.
  3. wiki:translate-mg:filebased:v1 - Contains the array based configuration for the FileBasedMessageGroups in the instance.
  4. wiki:translate-mg:wikipage:v1 - Contains an array of strings representing the titles used to recreate the WikiPageMessageGroups in the instance.

Earlier 2, 3 and 4 were part of the 1st key's cache. In addition, they cached serialized class objects, which have now been updated to store either array based configurations or strings.

Note that the wiki prefix maybe different based on the instance setup.

abi_ added a comment.May 17 2019, 3:42 PM

@elukey - We could not meet the train this week, but are confident of having the changes ready for the train next week.

I'm testing the patch (PS13) on translatewiki.net. If there are serious problems I'd expect to get reports of them before next train.

Thanks a lot for the feedback, and don't worry about the delay, better to be sure and test the code! One week more doesn't make a lot of difference :)

Change 508112 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Add loaders to load and cache MessageGroups separately

https://gerrit.wikimedia.org/r/508112

What results are expected after this change gets deployed? (just to summarize what to look for / check)

abi_ added a comment.May 20 2019, 3:43 PM

With the latest patch, the translate extension will use the following keys,

  1. wiki:translate-groups:v3 - Contains message groups implemented by other third party plugins such as the BannerMessageGroup by the central notice extension.
  2. wiki:translate-mg:aggregate:v1 - Contains the array based configuration for the AggregateMessageGroups in the instance.
  3. wiki:translate-mg:filebased:v1 - Contains the array based configuration for the FileBasedMessageGroups in the instance.
  4. wiki:translate-mg:wikipage:v1 - Contains an array of strings representing the titles used to recreate the WikiPageMessageGroups in the instance.

Earlier 2, 3 and 4 were part of the 1st key's cache. In addition, they cached serialized class objects, which have now been updated to store either array based configurations or strings.
Note that the wiki prefix maybe different based on the instance setup.

@elukey - My comment above summarizes the changes that have been made.

@abi_ sorry I should have been more precise - I meant to ask if we have an idea about the size of those keys etc.. (you know these ops people always complaining about network usage :P). If we don't have a clear idea yet nevermind, I was only curious :)

Metawiki contains a large number of BannerMessageGroups (which will stay in the old aggregate key, just new version again), so there the splitting should actually have a considerable impact. For other wikis the most interesting question will be the size of the new wikipage key. It will be the new biggest key, even though we changed the format yet again to be simpler, but it's possible that due to compression this won't have much effect.

abi_ moved this task from Under Review to QA on the User-abi_ board.May 21 2019, 4:53 AM

Very crude initial check:

nikerabbit@deploy1001:~$ echo '$c = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache(); $keys = [ "mediawikiwiki:translate-groups:v3", "mediawikiwiki:translate-mg:aggregate:v1", "mediawikiwiki:translate-mg:wikipage:v1" ]; foreach ( $keys as $key ) { var_dump( $key, strlen( gzdeflate( serialize( $c->get( $key ) ), -1 ) ) ); }' | mwscript eval.php mediawikiwiki
string(33) "mediawikiwiki:translate-groups:v3"
int(164)
string(39) "mediawikiwiki:translate-mg:aggregate:v1"
int(49926)
string(38) "mediawikiwiki:translate-mg:wikipage:v1"
int(50581)

I didn't check filebased because no such groups are configured on Wikimedia. This looks promising to me, given we started from about 200k. I am quite surprised that the aggregate key on mediawikiwiki is so big.

Same numbers for metawiki:

nikerabbit@deploy1001:~$ echo '$c = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache(); $keys = [ "metawiki:translate-groups:v3", "metawiki:translate-mg:aggregate:v1", "metawiki:translate-mg:wikipage:v1" ]; foreach ( $keys as $key ) { var_dump( $key, strlen( gzdeflate( serialize( $c->get( $key ) ), -1 ) ) ); }' | mwscript eval.php metawiki
string(28) "metawiki:translate-groups:v3"
int(101169)
string(34) "metawiki:translate-mg:aggregate:v1"
int(77073)
string(33) "metawiki:translate-mg:wikipage:v1"
int(72466)

The largest key is now almost one fourth of the original size.

Nikerabbit added a comment.EditedMay 23 2019, 10:38 AM

Doing some general measurements:

  • We have 166 aggregate groups, with 77073 compressed bytes, making efficiency around 464 compressed bytes per group.
  • We have 5308 translatable pages, with 72466 compressed bytes, making efficiency around 14(!) compressed bytes per group
  • We have 2681 (1 aggregate, rest banners) using the old caching mechanism, with 101169 compressed bytes, making efficiency around 38 compressed bytes per group

My thoughts are:

  • We have succeeded in reducing the cache key sizes sufficiently (organic growth can happen for multiple years before issues surface) to consider this resolved. Please share your thoughts.
  • Converting BannerMessageGroup to the new caching mechanism would reduce the biggest key size (see notes below). This should be done regardless to stop serializing objects. I suggest to file a follow-up task to be prioritized separately with maintainers of that codebase.
  • We are still serializing DependencyWrapper objects. I suggest to file a follow-up task to stop doing that.
  • We can file a follow-up task for aggregate groups to support specifying the subgroups using a pattern. This is already possible for aggregate groups configured through yaml, but not for those configured through the wiki. This would both reduce key size and make their management easier.
  • Pre-serializing to a JSON string should be even more space efficient than the normal PHP serialization syntax for arrays. No idea about whether it is significantly slower or faster. A good thing would be that we wouldn't accidentally serialize objects, as they would not round-trip. But maybe it is not worth it.
  • We still invalidate all keys when any change happens. I suggest a follow-up task for more granular invalidation and regeneration of caches.

Notes:

Dumping

echo '$c = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache(); var_dump( $key, serialize( $c->get( "metawiki:translate-groups:v3" ) ) );' | mwscript eval.php metawiki > dump-groups

echo '$c = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache(); echo json_encode( $c->get( "metawiki:translate-mg:aggregate:v1" ), JSON_PRETTY_PRINT );' | mwscript eval.php metawiki > dump-agg

echo '$c = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache(); echo serialize( $c->get( "metawiki:translate-mg:wikipage:v1" ) );' | mwscript eval.php metawiki > dump-wikipage

BannerMessageGroup is currently serialized as:

O:18:"BannerMessageGroup":11:{s:13:"^@*^@bannerName";s:24:"Genericmaintenancenotice";s:12:"^@*^@namespace";i:866;s:9:"^@*^@source";N;s:8:"^@*^@label";s:57:"MediaWiki:Centralnotice-template-Genericmaintenancenotice";s:5:"^@*^@id";s:45:"Centralnotice-tgroup-Genericmaintenancenotice";s:11:"^@*^@optional";a:0:{}s:10:"^@*^@ignored";a:0:{}s:14:"^@*^@description";N;s:7:"^@*^@meta";b:0;s:10:"^@*^@mangler";N;s:11:"^@*^@messages";a:0:{}}s:34:"Centralnotice-tgroup-newtestbanner";

Compare this with wikipage serialization:

i:52;s:30:"Category:Wikimedia Deutschland"

I think we can get close to efficiency of translatable pages – since all those property values can either use default or derived from single name – maybe even halving the size.

abi_ added a comment.May 23 2019, 11:41 AM

@Nikerabbit - Thanks for that assessment.

Converting BannerMessageGroup to the new caching mechanism would reduce the biggest key size (see notes below). This should be done regardless to stop serializing objects. I suggest to file a follow-up task to be prioritized separately with maintainers of that codebase.

Agreed, I will do that.

We are still serializing DependecyWrapper objects. I suggest to file a follow-up task to stop doing that.

So in this case we would store the dependencies (Globals, FileDependency) and the values in the cache, and rebuild the DependencyWrapper object after fetching the data from the cache?

We can file a follow-up task for aggregate groups to support specifying the subgroups using a pattern. This is already possible for aggregate groups configured through yaml, but not for those configured through the wiki. This would both reduce key size and make their management easier.

I don't understand what this means. Currently on my local, aggregate message group is stored as following,

>>> $c->get ( $c->makeKey( 'translate-mg', 'aggregate', 'v' . 1 ) )
=> [
     "agg-hello" => [
       "BASIC" => [
         "id" => "agg-hello",
         "label" => "hello",
         "description" => "hello",
         "meta" => 1,
         "class" => "AggregateMessageGroup",
         "namespace" => 1198,
       ],
       "GROUPS" => [
         "page-T138619 2",
         "page-T204568",
       ],
     ],
   ]

The subroups are already stored as an array of strings.

Pre-serializing to a JSON string should be even more space efficient than the normal PHP serialization syntax for arrays. No idea about whether it is significantly slower or faster. A good thing would be that we wouldn't accidentally serialize objects, as they would not round-trip. But maybe it is not worth it.

True. I think this might shave of a 7-10 characters for every page (in-case of wikipage), which in turn will further the amount of time after which we have to re-look at this issue. I found this StackOverflow thread and this test on 3v4l.org that goes into some performance related concerns that we might have.


I will create 4 tasks,

  1. Convert BannerMessageGroup to the new caching mechanism
  2. Avoid storing DependecyWrapper objects in the cache
  3. Add support for Aggregate groups to specify the subgroups using a pattern. (Not yet sure what this entails)
  4. Pre-serialize array to JSON before saving into cache

The subroups are already stored as an array of strings.

My idea was that we would store in GROUPS or another key:

=> [
     "agg-hello" => [
       "BASIC" => [
         "id" => "agg-hello",
         "label" => "hello",
         "description" => "hello",
         "meta" => 1,
         "class" => "AggregateMessageGroup",
         "namespace" => 1198,
       ],
       "GROUPS" => [
         "page-T*,
       ],
     ],
   ]

This would mean that the subgroups of the aggregate group would be dynamic, evaluated during runtime. This means that the aggregate group cache needs to be purged when new pages are marked for translation or removed from translation. It also simplifies the job of a translation admin who doesn't need to go add every subpage to the aggregate group separately.

Results are amazing, thanks a lot for this work! I checked the size of the new keys and the maximum one is ~100K now for metawiki!

I am still seeing timeouts to memcached though (sigh), will need a bit more time to check if they are related to a new source of problem (not language related). From a quick analysis it seems so, but I'd like to wait a bit before calling this done done done if you guys don't mind.

I am currently at the Analytics offsite so my answer may lag a bit, apologies :(

elukey added a comment.Jun 5 2019, 7:00 AM

I tried to check the source of the timeouts in mcrouter's graph but they are not anymore concentrated to one shard, like it was when the translate groups 400K big key was still used. Until I have more data about the source of the remaining timeouts, please go ahead and declare this task resolved. There are a lot of follow ups already created as tasks, plus you guys already did an incredible job in reducing the key size.

Thanks again for all the patience and hours spent on this!

abi_ closed this task as Resolved.Jun 12 2019, 3:44 AM
abi_ moved this task from QA to Done on the User-abi_ board.

Thanks @elukey

As per the comments above, we're going to mark this as done. Subsequent improvements will be carried out on the child tasks created.