Paste P7605

Statistics for proposed LocalisationCache with slimmer PHP arrays

Authored by Seb35 on Oct 1 2018, 9:52 AM.
This is an annex to T99740 and Iaa5e32830dc1bb710b9e0f1a681afe91e521ece9.
Here are some statistics for the three methods (CDB, current PHP arrays, proposed PHP arrays). Summary:
* generation time: about 5x faster than CDB and 2x faster than the current PHP arrays
* size: about 10% smaller than CDB and 30% smaller than the current PHP arrays
Statistics across the 440 languages (414 cache files), given as CDB / current PHP arrays / proposed PHP arrays:
* time (s), mean: 0.78493354060433 / 0.29702293276787 / 0.14556954828176
* time (s²), variance: 0.001585028865205 / 0.00016412325920965 / 9.3818872110887E-5
* time (s), stddev: 0.039812420991508 / 0.012811060034581 / 0.0096860142530809
* size, total: 472M / 611M / 426M
* size, mean per cache file: 1.14M / 1.48M / 1.03M
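As a quick cross-check of the summary, the speed-up and size-reduction figures can be recomputed from the means and totals listed above. This is only an illustrative sketch: the helper names `speedup()` and `sizeReduction()` are hypothetical, and the totals are used in whatever unit `du -h` printed (the unit cancels out in the ratio).

```php
<?php
// Figures copied from the statistics above: mean times in seconds, total sizes as printed by du -h.
$meanTime = ['cdb' => 0.78493354060433, 'current' => 0.29702293276787, 'proposed' => 0.14556954828176];
$totalSize = ['cdb' => 472, 'current' => 611, 'proposed' => 426];

// Hypothetical helper: how many times faster the proposed store is than another store.
function speedup(float $other, float $proposed): float {
    return $other / $proposed;
}

// Hypothetical helper: size reduction of the proposed store, as a percentage of the other store's size.
function sizeReduction(float $other, float $proposed): float {
    return 100.0 * (1.0 - $proposed / $other);
}

printf("speed-up vs CDB: %.2fx\n", speedup($meanTime['cdb'], $meanTime['proposed']));
printf("speed-up vs current PHP arrays: %.2fx\n", speedup($meanTime['current'], $meanTime['proposed']));
printf("smaller than CDB: %.1f%%\n", sizeReduction($totalSize['cdb'], $totalSize['proposed']));
printf("smaller than current PHP arrays: %.1f%%\n", sizeReduction($totalSize['current'], $totalSize['proposed']));
```

This prints speed-ups of 5.39x and 2.04x and reductions of 9.7% and 30.3%, which round to the "5x / 2x" and "10% / 30%" of the summary.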
Methodological notes:
* MediaWiki version: 757b54b for CDB and current PHP arrays, b481d81 for proposed PHP arrays
* times are in seconds per language, averaged over 5 consecutive runs
* measured on my laptop (Debian) on an ext4 filesystem
* tested code (run in `maintenance/eval.php`): `$lc = new LocalisationCache( $conf ); $lc->getItem( $code, 'namespaceNames' );` for each `$code` in `MediaWiki\Languages\Data\Names::$names`
* variance is the unbiased sample variance (divisor N-1 rather than N)
* stddev is sqrt( unbiased sample variance ), which is itself a biased estimator of the standard deviation
* size computed with `du -hs`
* a new LocalisationCache object is created on each run and the generated cache files are deleted afterwards; otherwise the already computed language would be kept in memory and not regenerated
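The variance and standard deviation conventions in these notes can be isolated in a small reusable function. This is a sketch, not code from the patch: `describeTimes()` is a hypothetical name, and it performs the same computation as the end of the benchmark script (mean, variance with divisor N-1, square root of that variance).

```php
<?php
/**
 * Hypothetical helper: mean, unbiased sample variance (divisor N-1) and
 * sample standard deviation (square root of that variance) of a list of timings.
 *
 * @param float[] $xs At least two samples.
 * @return array{mean: float, variance: float, stddev: float}
 */
function describeTimes(array $xs): array {
    $n = count($xs);
    if ($n < 2) {
        throw new InvalidArgumentException('Need at least two samples.');
    }
    $mean = array_sum($xs) / $n;
    $sumSq = 0.0;
    foreach ($xs as $x) {
        $d = $x - $mean;
        $sumSq += $d * $d;
    }
    // Bessel's correction: dividing by N-1 makes this an unbiased estimator of the variance.
    $variance = $sumSq / ($n - 1);
    return [
        'mean' => $mean,
        'variance' => $variance,
        // sqrt() is concave, so E[sqrt(s^2)] <= sigma: this estimator is biased low.
        'stddev' => sqrt($variance),
    ];
}

print_r(describeTimes([1.0, 2.0, 3.0, 4.0]));
```

For the samples 1, 2, 3, 4 this gives a mean of 2.5, a variance of 5/3 and a standard deviation of about 1.291.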
Code to run with `php maintenance/eval.php`:
```
# 'array' selects the PHP-array store; for CDB, adapt 'store' and the file pattern below
$names = MediaWiki\Languages\Data\Names::$names;
$conf = [ 'store' => 'array', 'storeDirectory' => '/tmp/files' ];
$internal_N = 5; # runs per language
$times = [];
foreach ( $names as $code => $name ) {
	$time = microtime( true );
	for ( $i = 0; $i < $internal_N; $i++ ) {
		# New object each run, otherwise the language stays cached in memory
		$lc = new LocalisationCache( $conf );
		$lc->getItem( $code, 'namespaceNames' );
		# Delete the generated cache files so that the next run rebuilds them
		array_map( 'unlink', glob( '/tmp/files/*.php' ) ?: [] );
	}
	$time = microtime( true ) - $time;
	$times[$code] = $time / $internal_N; # mean time per run, in seconds
	echo "$code - {$times[$code]} - $name\n";
}
$n = count( $times );
$mean = array_sum( $times ) / $n;
var_dump( $mean ); # Mean
$carry = 0.0;
foreach ( $times as $time ) { $d = $time - $mean; $carry += $d * $d; }
var_dump( $carry / ( $n - 1 ) ); # Unbiased sample variance
var_dump( sqrt( $carry / ( $n - 1 ) ) ); # (Biased) sample standard deviation
```
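The repeat-and-average timing pattern used in the loop above can also be factored into a helper. This is a sketch under stated assumptions: `averageRuntime()` is a hypothetical name, and it swaps `microtime(true)` for PHP's monotonic `hrtime(true)` (available since PHP 7.3), which is not what the original measurements used.

```php
<?php
/**
 * Hypothetical helper: average wall time in seconds of $runs calls to $fn,
 * measured with the monotonic hrtime() clock instead of microtime(true).
 */
function averageRuntime(callable $fn, int $runs = 5): float {
    if ($runs < 1) {
        throw new InvalidArgumentException('$runs must be at least 1.');
    }
    $start = hrtime(true); // nanoseconds, not affected by system clock adjustments
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9 / $runs;
}

// Possible use inside eval.php, with $conf and $code as in the script above:
// $t = averageRuntime(function () use ($conf, $code) {
//     $lc = new LocalisationCache($conf);
//     $lc->getItem($code, 'namespaceNames');
//     array_map('unlink', glob('/tmp/files/*.php') ?: []);
// });

// Self-contained demonstration with a dummy workload of about 1 ms per call.
$calls = 0;
$t = averageRuntime(function () use (&$calls) { $calls++; usleep(1000); }, 3);
printf("%d calls, %.4f s per call\n", $calls, $t);
```

A monotonic clock avoids skewed timings if the system clock is adjusted during a long run over all 440 languages; for runs of a few seconds, `microtime(true)` is usually adequate.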