See the parent task (T203625: mwdebug1001 and mwdebug1002 are reliably the last two hosts to finish scap-cdb-rebuild) for reasoning. tl;dr: these hosts are reliably the last to finish the scap-cdb-rebuild step (CPU and/or memory intensive; I haven't profiled it), which in turn causes timeouts during our health checks.
|Open||None||T203625 mwdebug1001 and mwdebug1002 are reliably the last two hosts to finish scap-cdb-rebuild|
|Duplicate||None||T203664 scap timeout checking index.php/api.php mwdebug1001 / mwdebug1002|
|Open||None||T215368 First request after a MediaWiki sync times out on mwdebug|
|Resolved||akosiaris||T212955 Increase mwdebugXXXX hosts CPU|
- Mentioned In
  - T203625: mwdebug1001 and mwdebug1002 are reliably the last two hosts to finish scap-cdb-rebuild
  - T215368: First request after a MediaWiki sync times out on mwdebug
- Mentioned Here
  - T191921: mwscript rebuildLocalisationCache.php takes 40 minutes on HHVM (rather than ~5 on PHP 5)
  - T203625: mwdebug1001 and mwdebug1002 are reliably the last two hosts to finish scap-cdb-rebuild
IIRC the cdb files are generated by rebuildLocalisationCache.php, which is CPU-bound and runs with up to 30 parallel tasks.
I previously found it was bound to a single CPU under HHVM because hhvm.stats.enable_hot_profiler was enabled and enforced CPU affinity, so all 30 threads ran on the same CPU (T191921#4557854). We then moved it to php7.0, since we want to migrate away from HHVM anyway.
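For anyone wanting to check whether a process is pinned like this, a minimal sketch follows. It is not what was run for T191921; the helper names `usable_cpus` and `is_pinned` are made up for illustration, and `os.sched_getaffinity` is only available on Linux, so the code falls back to `os.cpu_count()` elsewhere.

```python
import os


def usable_cpus():
    """Return the set of CPU indices the current process may be scheduled on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux only: pid 0 means "the calling process".
        return set(os.sched_getaffinity(0))
    # Fallback (assumption): no affinity information, treat every CPU as usable.
    return set(range(os.cpu_count() or 1))


def is_pinned(allowed, total):
    """True if the process is restricted to fewer CPUs than the host has."""
    return len(allowed) < total


if __name__ == "__main__":
    allowed = usable_cpus()
    total = os.cpu_count() or 1
    print(f"allowed CPUs: {sorted(allowed)} of {total} total")
    if is_pinned(allowed, total):
        print("process is pinned: parallel tasks will share the allowed CPUs")
```

Child processes inherit the affinity mask, so if the parent is pinned to one CPU, parallel workers forked from it are pinned too.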
To speed up the cdb rebuild, we thus need more cores on the mwdebug hosts. Currently /proc/cpuinfo reports a single core. The exact number of cores is to be determined; I guess we can use at least 4?
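As a rough illustration of the arithmetic, the sketch below counts the `processor` entries in /proc/cpuinfo and shows how many of the 30 parallel tasks would share each core. The function names are hypothetical, and the even split across cores is a simplifying assumption, not a measurement.

```python
def count_processors(cpuinfo_text):
    """Count 'processor' entries in /proc/cpuinfo-style text."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _value = line.partition(":")
        if sep and key.strip() == "processor":
            count += 1
    return count


def tasks_per_core(parallel_tasks, cores):
    """Average number of tasks competing for each core (assumes an even split)."""
    return parallel_tasks / max(cores, 1)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            cores = count_processors(fh.read())
    except OSError:
        cores = 0  # not on Linux, or the file is unreadable
    print(f"cores reported: {cores}")
    # 30 is the maximum number of parallel tasks mentioned above.
    print(f"tasks per core at 30 parallel tasks: {tasks_per_core(30, cores):.1f}")
```

With a single core all 30 tasks contend for it; with 4 cores that drops to 7.5 tasks per core on average.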
I've removed the memory part because https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&orgId=1&var-server=mwdebug1001&var-datasource=eqiad%20prometheus%2Fops&var-cluster=appserver&from=now-7d&to=now shows that mwdebug1002 is never pressed for memory. I've also bumped the vCPU count to 4. I'll resolve this for now; if we need more resources, feel free to reopen.