While investigating the strange behaviour of `Babel::getCachedUserLanguageInfo()` as a followup for an incident, in which to slow responses from the backend we ended up having a much higher rate of cache misses, I went to dig deeper in what was wrong with the code path in question.
From reading extensively the documentation and a couple conversations with @Krinkle I was convinced that `WanObjectCache::getWithSetCallback` as used in the babel extension (see https://gerrit.wikimedia.org/g/mediawiki/extensions/Babel/+/4606c64ee5f42ee841ffb4882fbc52df808da30a/includes/Babel.php#323) should cache the result as an empty array in case no result is returned within 10 seconds (the timeout). Data from past outages suggested the contrary.
So to test this hypothesis in our production setting I ran a small http server written in golang that would listen on port 9099 and respond "ok" after 15 seconds. If curious, you can see the code at P10386.
I wrote the follwing code, to be run with shell.php:
```lang=php
use MediaWiki\MediaWikiServices;
$cache = MediaWikiServices::getInstance()->getMainWANObjectCache();
$key = "joe::test";
// Delete any leftover key, and do not produce a tombstone.
$cache->delete($key, 0);
// Now let's simulate a situation where we have a remote responding only after 15 seconds.
function get_set( $cache, $key) {
$curtime = time();
$val = $cache->getWithSetCallback(
$key,
100,
function ( $oldValue, &$ttl, array &$setOpts ) use ( $curtime ) {
$req = MWHttpRequest::factory( 'http://localhost:9099', [ 'timeout' => 10], __METHOD__ );
$status = Status::wrap( $req->execute() );
if ( !$status->isOK() ) {
echo "Timeout! $curtime\n";
return $curtime;
} else {
return 0;
}
},
['checkKeys' => [ $key ]]
);
if ( $val === 0 ) {
echo "$curtime - No timeout\n";
} elseif ($val == $curtime) {
echo "$curtime - cache miss\n";
} else {
echo "$curtime - cache hit\n";
}
}
while (true) {
get_set($cache, $key);
sleep(1);
}
```
the result I expected was to get a first cache miss, then for 100 seconds to get cache hits.
To my surprise, the result was very different:
```
mwmaint1002:~$ mwscript shell.php --wiki enwiki
Psy Shell v0.9.12 (PHP 7.2.26-1+0~20191218.33+debian9~1.gbpb5a340+wmf1 — cli) by Justin Hileman
[CUT]
Timeout! 1581435967
1581435967 - cache miss
Timeout! 1581435978
1581435978 - cache miss
Timeout! 1581435989
1581435989 - cache miss
Timeout! 1581436000
1581436000 - cache miss
Timeout! 1581436011
1581436011 - cache miss
Timeout! 1581436022
1581436022 - cache miss
Timeout! 1581436033
1581436033 - cache miss
Timeout! 1581436044
1581436044 - cache miss
Timeout! 1581436055
1581436055 - cache miss
Timeout! 1581436066
1581436066 - cache miss
Timeout! 1581436077
1581436077 - cache miss
...
```
To add more mistery, if I reduce the timeout to 2 seconds, the results are quite different:
```
mwmaint1002:~$ mwscript shell.php --wiki enwiki
Psy Shell v0.9.12 (PHP 7.2.26-1+0~20191218.33+debian9~1.gbpb5a340+wmf1 — cli) by Justin Hileman
[CUT]
Timeout! 1581436203
1581436203 - cache miss
Timeout! 1581436206
1581436206 - cache miss
Timeout! 1581436209
1581436209 - cache miss
Timeout! 1581436212
1581436212 - cache miss
1581436215 - cache hit
1581436216 - cache hit
1581436217 - cache hit
1581436218 - cache hit
1581436219 - cache hit
1581436220 - cache hit
1581436221 - cache hit
1581436222 - cache hit
1581436223 - cache hit
1581436224 - cache hit
1581436225 - cache hit
1581436226 - cache hit
1581436227 - cache hit
1581436228 - cache hit
1581436229 - cache hit
1581436230 - cache hit
1581436231 - cache hit
1581436232 - cache hit
1581436233 - cache hit
1581436234 - cache hit
1581436235 - cache hit
...
```
So it seems that the slower the fetching function is, the least probability its result has to be cached. This is the opposite of what I would consider intuitively important to do.
Please forgive me if I did some basic mistake, my PHP is rusty and my knowledge of MediaWiki is relatively scarce.