
Statistics for proposed LocalisationCache with slimmer PHP arrays

Authored by Seb35, Oct 1 2018, 9:52 AM (2 KB)

This is an annex to Phabricator task T99740 and Gerrit change Iaa5e32830dc1bb710b9e0f1a681afe91e521ece9.
Below are statistics for the three storage methods (CDB, current PHP arrays, proposed PHP arrays). Summary for the proposed PHP arrays:
* generation time: about 5x faster than CDB and about 2x faster than the current PHP arrays
* size: about 10% smaller than CDB and about 30% smaller than the current PHP arrays
Statistics across the 440 languages (414 cache files):

| Metric | CDB | current PHP arrays | proposed PHP arrays |
| ------ | --- | ------------------ | ------------------- |
| time, mean (s) | 0.78493354060433 | 0.29702293276787 | 0.14556954828176 |
| time, variance (s²) | 0.001585028865205 | 0.00016412325920965 | 9.3818872110887E-5 |
| time, stddev (s) | 0.039812420991508 | 0.012811060034581 | 0.0096860142530809 |
| size, total | 472M | 611M | 426M |
| size, mean per cache file | 1.14M | 1.48M | 1.03M |
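As a quick cross-check of the rounded summary, the following PHP sketch recomputes the speed-up and size-reduction ratios from the mean times and total sizes listed above. The values are copied by hand, and the variable names are illustrative only.

```php
<?php
// Mean generation time per language in seconds, copied from the statistics above.
$meanTime = [
	'cdb'      => 0.78493354060433,
	'current'  => 0.29702293276787,
	'proposed' => 0.14556954828176,
];
// Total cache size in megabytes (as reported by `du -hs`), copied from the statistics above.
$totalSizeMB = [ 'cdb' => 472, 'current' => 611, 'proposed' => 426 ];

// Speed-up of the proposed PHP arrays relative to each existing store.
$speedupVsCdb     = $meanTime['cdb'] / $meanTime['proposed'];
$speedupVsCurrent = $meanTime['current'] / $meanTime['proposed'];

// Relative size reduction: 1 - proposed / other.
$shrinkVsCdb     = 1 - $totalSizeMB['proposed'] / $totalSizeMB['cdb'];
$shrinkVsCurrent = 1 - $totalSizeMB['proposed'] / $totalSizeMB['current'];

printf( "speed-up vs CDB: %.2fx, vs current arrays: %.2fx\n", $speedupVsCdb, $speedupVsCurrent );
printf( "size reduction vs CDB: %.1f%%, vs current arrays: %.1f%%\n",
	100 * $shrinkVsCdb, 100 * $shrinkVsCurrent );
```

Run with `php`, this prints speed-ups of about 5.39x and 2.04x and size reductions of about 9.7% and 30.3%, which matches the rounded "5x / 2x" and "10% / 30%" figures in the summary.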
Methodology notes:
* MediaWiki commit: 757b54b for CDB and current PHP arrays, b481d81 for proposed PHP arrays
* times are in seconds, averaged over 5 consecutive runs for each language
* measured on my laptop (Debian) on an ext4 filesystem
* tested code (in `maintenance/eval.php`): `$lc = new LocalisationCache( $conf ); $lc->getItem( $code, 'namespaceNames' );` for each `$code` in `MediaWiki\Languages\Data\Names::$names`
* variance is the unbiased sample variance (divides by N-1 rather than N)
* stddev is the square root of the unbiased sample variance, so it is itself slightly biased
* sizes measured with `du -hs`
* a new LocalisationCache object is created for each run; otherwise the already computed language would stay in memory and would not be regenerated
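For reference, the estimators named in the notes above can be written out as follows, where t_i is the averaged time for language i and N is the number of languages measured (`$n` in the script):

```latex
\bar{t} = \frac{1}{N} \sum_{i=1}^{N} t_i, \qquad
s^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left( t_i - \bar{t} \right)^2, \qquad
s = \sqrt{s^2}
```

Since the square root is concave, Jensen's inequality gives E[s] ≤ sqrt(E[s²]) = σ. This is why s is described as biased even though s² is an unbiased estimator of the variance.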
Code to run with `php maintenance/eval.php`:
```
$names = MediaWiki\Languages\Data\Names::$names;
# 'array' selects the PHP-array store; change 'store' to benchmark the CDB backend
$conf = [ 'store' => 'array', 'storeDirectory' => '/tmp/files' ];
$internal_N = 5; # Runs per language
$times = [];
foreach( $names as $code => $name ) {
	$time = microtime( true );
	for( $i = 0; $i < $internal_N; $i++ ) {
		# Fresh object, so no language data is kept in memory between runs
		$lc = new LocalisationCache( $conf );
		$lc->getItem( $code, 'namespaceNames' );
		# Delete the generated cache files so the next run regenerates them
		# (adjust the pattern to the file extension of the store being tested)
		array_map( 'unlink', glob( "/tmp/files/*.php" ) );
	}
	$time = microtime( true ) - $time;
	$times[$code] = $time / $internal_N; # Mean time per run for this language
	echo "$code - {$times[$code]} - $name\n";
}
$n = count( $times );
$mean = array_sum( $times ) / $n;
var_dump( $mean ); # Mean
$carry = 0.0;
foreach( $times as $time ) { $d = $time - $mean; $carry += $d * $d; }
var_dump( $carry / ( $n - 1 ) ); # Unbiased sample variance
var_dump( sqrt( $carry / ( $n - 1 ) ) ); # (Biased) sample standard deviation
```