Use static php array files for l10n cache instead of CDB
Open, High (Public)

Description

Facebook's Fred Emmott works on benchmarking HHVM's performance when running various open-source PHP frameworks. This puts him in contact with MediaWiki's codebase. He wrote in to suggest that we experiment with using plain PHP files instead of CDB for the l10n cache. We should try that and see whether it improves performance.


Deployment plan, as recycled from 2015 (T99740#5165753 by @Krinkle):

  1. Enable array format on testwiki on Beta Cluster. (We used test2wiki in 2015, but I recommend we use testwiki this time, so that Beta Cluster gets it as well.) Similar to 7237148affb4 / https://gerrit.wikimedia.org/r/217702.
  2. Have Scap include testwiki as an extra element in its "pick a wiki per wiki version" loop when it invokes _call_rebuildLocalisationCache. See also T105683, and https://gerrit.wikimedia.org/r/#/c/mediawiki/tools/scap/+/224520/5/scap/tasks.py for related changes that might be needed.
  3. Confirm it works in Beta Cluster.
  4. Enable array format for all wikis in Beta, except testwiki (inverse of before), and run that for a week or two to confirm there are no major issues with HHVM (or PHP 7).
  5. Enable array format for testwiki in production.
  6. (Temporary) Add a magic switch to wmf-config to use array on any wiki, if and only if the array cache file is found locally. This is for use during the transition. See also c9ff83c234f53 / https://gerrit.wikimedia.org/r/224562.
  7. Enable array for all wikis.
  8. Remove temporary switch.
  9. Remove extra element from Scap loop for rebuildLocalisationCache.
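
Step 2 can be pictured roughly as follows. This is a minimal Python sketch, not Scap's actual code: the function name `wikis_to_rebuild`, the `wiki_versions` mapping, the version strings, and the choice of the alphabetically first wiki as the representative of each version are all assumptions made for illustration.

```python
def wikis_to_rebuild(wiki_versions, extra_wikis=("testwiki",)):
    """Pick one representative wiki per MediaWiki version, plus extras.

    wiki_versions: hypothetical mapping of wiki name -> version string.
    extra_wikis: wikis that must also be rebuilt even when another wiki
    already represents their version (here: the wiki on the new format).
    """
    representative = {}
    for wiki in sorted(wiki_versions):
        # Keep the first wiki seen for each version.
        representative.setdefault(wiki_versions[wiki], wiki)

    result = [(wiki, version) for version, wiki in representative.items()]
    for wiki in extra_wikis:
        if wiki in wiki_versions:
            pair = (wiki, wiki_versions[wiki])
            if pair not in result:
                result.append(pair)
    return result


# Hypothetical input: two versions deployed, testwiki on the newer one.
example = {
    "dewiki": "1.34.0-wmf.1",
    "enwiki": "1.34.0-wmf.1",
    "test2wiki": "1.34.0-wmf.2",
    "testwiki": "1.34.0-wmf.2",
}
for wiki, version in wikis_to_rebuild(example):
    print(wiki, version)
```

The extra element matters because the representative wiki for a version may still be on CDB, in which case no array-format cache would be built for testwiki.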

Event Timeline

fred added a comment. Jul 8 2015, 9:26 PM

Is there any data yet from wikipedia deployments?

Reedy added a comment. Jul 8 2015, 11:49 PM

> Is there any data yet from wikipedia deployments?

Nope... I believe @ori tested it and had to revert it because of the php l10n "cache" not being in place

I was talking to him about it before, and we need to do a few short term hacks to our localisation building code to have it generate both versions

greg added a subscriber: greg. Aug 10 2015, 9:47 PM

Another potentially interesting alternative to cdb might be http://symas.com/mdb/, an embedded B+ tree database with zero-copy lock-free reads. There is a patch for an HHVM extension at https://github.com/yakaz/hhvm/commit/6941d790afd626c6d53b38cf173b2221bfc371f6.

thcipriani moved this task from To Triage to In-progress on the Deployments board. Aug 12 2015, 4:05 PM
Krinkle closed this task as Resolved. Sep 4 2015, 2:49 AM
Krinkle claimed this task.
Krinkle added a subscriber: Krinkle.

Per T108638, this seems resolved.

EBernhardson added a subscriber: EBernhardson. (Edited) Dec 19 2015, 4:29 AM

Since this is resolved, what was the result of the experiment? The current CDB's look to be ~2.7GB so blowing out the translation cache sounds likely?

> Since this is resolved, what was the result of the experiment? The current CDB's look to be ~2.7GB so blowing out the translation cache sounds likely?

T103886

And as such, we're waiting on T119637 for a newer HHVM version to try again

Krinkle removed Krinkle as the assignee of this task. Jan 6 2016, 7:58 PM
hashar added a subscriber: hashar. Feb 16 2016, 7:48 PM
Gilles raised the priority of this task from Low to High. Dec 7 2016, 7:09 PM
Gilles moved this task from Backlog: Small & Maintenance to Doing on the Performance-Team board.
Gilles moved this task from Doing to Blocked or Needs-CR on the Performance-Team board.

>> Since this is resolved, what was the result of the experiment? The current CDB's look to be ~2.7GB so blowing out the translation cache sounds likely?
>
> T103886: Translation cache exhaustion caused by changes to PHP code in file scope
> And as such, we're waiting on T119637 for a newer HHVM version to try again

This issue was fixed in July 2016, released in HHVM 3.15.0 - which we've already upgraded to, and beyond (currently: HHVM 3.18)

Let's try this again?

demon added a subscriber: demon. Jun 27 2017, 10:24 PM

Yes, please. Let's.

Change 414865 had a related patch set uploaded (by Chad; owner: Chad):
[operations/mediawiki-config@master] Beta: Attempt using LCStoreStaticArray

https://gerrit.wikimedia.org/r/414865

Change 414865 merged by jenkins-bot:
[operations/mediawiki-config@master] Beta: Attempt using LCStoreStaticArray

https://gerrit.wikimedia.org/r/414865

hashar removed a subscriber: hashar. Feb 27 2018, 9:31 AM
Seb35 added a subscriber: Seb35. Sep 23 2018, 10:39 PM

I tried this option. I didn’t benchmark but I noticed the files are quite big – as noted in rMWc403d4838dc86.

I was wondering why the serialisation format encodes scalar values as [ 0 => 'v', 1 => 'value' ]; it does not seem necessary. I tried encoding scalar values as the value alone. It works equally well, and the file sizes dropped to slightly below those of the CDB files: for English, 874 KiB for this slim PHP version, 1.3 MiB for the current PHP version, and 980 KiB for the CDB version.
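
To make the difference concrete, here is a small Python sketch of the two encodings. It is only an illustration, not the MediaWiki code: the function names, the sample messages, and the 'a' tag for nested arrays are assumptions, and Python's `repr` stands in for the exported PHP source, so the byte counts it prints are not comparable to the figures above.

```python
def encode_tagged(value):
    """Wrap every value in a [tag, payload] pair ('v' scalar, 'a' array)."""
    if isinstance(value, dict):
        return ["a", {key: encode_tagged(item) for key, item in value.items()}]
    return ["v", value]


def decode_tagged(encoded):
    """Undo encode_tagged by unwrapping each [tag, payload] pair."""
    tag, payload = encoded
    if tag == "a":
        return {key: decode_tagged(item) for key, item in payload.items()}
    return payload


def encode_slim(value):
    """Store scalars bare and only recurse into nested arrays."""
    if isinstance(value, dict):
        return {key: encode_slim(item) for key, item in value.items()}
    return value


# Hypothetical sample data shaped like a tiny localisation cache.
messages = {"mainpage": "Main Page", "namespaceNames": {"0": "", "1": "Talk"}}

tagged = encode_tagged(messages)
slim = encode_slim(messages)
print("tagged:", len(repr(tagged)), "characters")
print("slim:  ", len(repr(slim)), "characters")
```

Each scalar in the tagged form carries a fixed wrapper overhead, so a cache made mostly of short strings grows noticeably, which is consistent with the size drop reported above.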

Change 462326 had a related patch set uploaded (by Seb35; owner: Seb35):
[mediawiki/core@master] Slimer PHP localisation cache files

https://gerrit.wikimedia.org/r/462326

Seb35 added a comment. Oct 1 2018, 10:05 AM

I’ve created some statistics with this proposed method. Summary:

  • generation time: 5x quicker than CDB, 2x quicker than current PHP arrays
  • size: 10% smaller than CDB, 30% smaller than current PHP arrays

Stats across the 440 languages (414 cache files):

| Method | CDB | current PHP arrays | proposed PHP arrays |
| --- | --- | --- | --- |
| Time: mean (s) | 0.7849 | 0.2970 | 0.1456 |
| Time: variance (s^2) | 1.585E-3 | 1.641E-4 | 9.381E-5 |
| Time: stddev (s) | 0.0398 | 0.0128 | 0.0096 |
| Size: total (bytes) | 472M | 611M | 426M |
| Size: mean (bytes) | 1.14M | 1.48M | 1.03M |

See P7605 for methodological details.
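
The summary figures can be cross-checked against the table with a few lines of Python. This is only a sanity check on the published numbers, not the script from P7605; the dictionary keys and function names are made up for the example.

```python
import math

# Figures copied from the table above.
mean_time_s = {"cdb": 0.7849, "current": 0.2970, "proposed": 0.1456}
variance_s2 = {"cdb": 1.585e-3, "current": 1.641e-4, "proposed": 9.381e-5}
total_size_m = {"cdb": 472, "current": 611, "proposed": 426}
cache_files = 414


def speedup(baseline, candidate):
    """How many times faster the candidate is generated than the baseline."""
    return mean_time_s[baseline] / mean_time_s[candidate]


def size_saving(baseline, candidate):
    """Fraction of total size saved by the candidate relative to the baseline."""
    return 1 - total_size_m[candidate] / total_size_m[baseline]


for baseline in ("cdb", "current"):
    print(
        f"proposed vs {baseline}: "
        f"{speedup(baseline, 'proposed'):.2f}x faster, "
        f"{size_saving(baseline, 'proposed'):.1%} smaller"
    )

for method, variance in variance_s2.items():
    # The standard deviation is the square root of the variance.
    print(f"{method}: stddev {math.sqrt(variance):.4f} s, "
          f"mean size {total_size_m[method] / cache_files:.2f}M")
```

The ratios come out at roughly 5.4x and 2.0x for generation time, and a little under 10% and a little over 30% for size, which matches the rounded summary.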

Krinkle renamed this task from Experiment with plain .php files for l10n cache instead of CDB to Use static php array files for l10n cache instead of CDB. Dec 20 2018, 9:51 PM
Krinkle edited projects, added Performance-Team (Radar); removed Performance-Team.

Change 462326 merged by jenkins-bot:
[mediawiki/core@master] localisation: Make PHP cache files slimmer

https://gerrit.wikimedia.org/r/462326

Deployment plan, as recycled from 2015:

  • Enable array format on testwiki on Beta Cluster. (We used test2wiki in 2015, but I recommend we use testwiki this time, which Beta also has. Similar to 7237148affb4 / https://gerrit.wikimedia.org/r/217702.)
  • Have Scap include testwiki as an extra element in its "pick a wiki per wiki version" loop when it invokes _call_rebuildLocalisationCache. See also T105683, and https://gerrit.wikimedia.org/r/#/c/mediawiki/tools/scap/+/224520/5/scap/tasks.py for related changes that might be needed.
  • Confirm it works in beta.
  • Enable array format for all wikis in Beta, except testwiki (inverse of before), and run that for a week or two to confirm there are no major issues with HHVM.
  • Enable array format for testwiki in production.
  • (Temporary) Add a magic switch to wmf-config to use array on any wiki, if and only if the array cache file is found locally. This is for use during the transition. See also c9ff83c234f53 / https://gerrit.wikimedia.org/r/224562.
  • Enable array for all wikis.
  • Remove temporary switch.
  • Remove extra element from Scap loop for rebuildLocalisationCache.
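
The temporary switch in the plan above boils down to a file-existence check. The Python sketch below shows only that decision logic; the real switch would live in the PHP wmf-config, and the function name, the cache file naming pattern, and the use of LCStoreCDB as the fallback class name are assumptions for illustration.

```python
import os
import tempfile


def pick_store_class(cache_dir, lang_code):
    """Choose the l10n store for one language during the transition.

    Returns the static array store if and only if its cache file is
    already present locally; otherwise falls back to CDB.
    """
    # Hypothetical file name pattern for the array-format cache.
    array_file = os.path.join(cache_dir, f"{lang_code}.l10n.php")
    if os.path.isfile(array_file):
        return "LCStoreStaticArray"
    return "LCStoreCDB"


with tempfile.TemporaryDirectory() as cache_dir:
    before = pick_store_class(cache_dir, "en")
    # Simulate a deployment that has synced the array-format file.
    with open(os.path.join(cache_dir, "en.l10n.php"), "w") as handle:
        handle.write("<?php return [];\n")
    after = pick_store_class(cache_dir, "en")
    print(before, "->", after)
```

Checking per host and per language means a server that has not yet received the new files keeps serving from CDB instead of failing.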

Change 508724 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[operations/mediawiki-config@master] [BETA] Enable array format on testwiki

https://gerrit.wikimedia.org/r/508724

Change 508726 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] Set wgLocalisationCacheConf['storeClass'] explicitly

https://gerrit.wikimedia.org/r/508726

Change 508726 merged by jenkins-bot:
[operations/mediawiki-config@master] Set wgLocalisationCacheConf['storeClass'] explicitly

https://gerrit.wikimedia.org/r/508726

Mentioned in SAL (#wikimedia-operations) [2019-05-10T14:54:23Z] <krinkle@deploy1001> Synchronized wmf-config/CommonSettings.php: T99740 / d9dbecad9c7b (duration: 00m 51s)

Krinkle updated the task description. Jul 13 2019, 1:38 AM

Change 528903 had a related patch set uploaded (by Krinkle; owner: Ladsgroup):
[mediawiki/core@master] localisation: Add process cache to LCStoreDB

https://gerrit.wikimedia.org/r/528903