I've scapped twice today, and both times the mwdebug1001 and mwdebug1002 hosts were the last two hosts to finish the scap-cdb-rebuild step. I don't think this is expected, and I'm assuming that something must not be right with those hosts for them to consistently be the slowest.
Description
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | None | T203625 mwdebug1001 and mwdebug1002 are reliably the last two hosts to finish scap-cdb-rebuild |
| Resolved | | akosiaris | T212955 Increase mwdebugXXXX hosts CPU |
Event Timeline
Unlike the rest, mwdebug* are VMs. How large was the difference compared to the other servers you were seeing?
IIRC it was about 3-4 minutes. But that was also with HHVM, it's probably worth checking again with PHP 7 to see if the performance difference is noticeable.
Random question from this task and T203664: should the debug hosts also be the same hardware/setup as the main cluster?
Yeah, that seems sensible unless there's some significant reason (e.g. hardware cost) not to.
Well that's quite a significant reason, and that's why we're not sinking ~ 16k USD
The debug hosts sit idle all the time. Having 4 physical servers sitting idle in the datacenters is a huge waste of resources (and rack space). We can beef up the VMs a bit, especially in terms of CPU, if they're slowing the deploys significantly, but I see no other good reason to waste resources on these.
The hosts mwdebug1001, mwdebug1002, mwdebug2001, mwdebug2002 now have four vCPUs allocated (see T212955). That should make the cdb rebuild roughly four times faster, which is hopefully enough.
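As a rough sanity check on the "four times faster" estimate, here is a minimal Amdahl's-law sketch. It assumes the hosts previously had a single vCPU (not stated above), and the parallel fractions are hypothetical values rather than measurements of scap-cdb-rebuild. The function name `amdahl_speedup` is made up for illustration.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work scales with workers."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical parallel fractions; the real value for the rebuild is unknown here.
    for fraction in (1.0, 0.9, 0.5):
        speedup = amdahl_speedup(fraction, workers=4)
        print(f"parallel fraction {fraction:.0%}: {speedup:.2f}x with 4 vCPUs")
```

The full 4x only appears if the rebuild is entirely parallel. Any serial portion (I/O, startup, final merge) pulls the figure down, so "roughly four times" is best read as an upper bound.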
Note that the rebuildLocalisationCache.php script has to be invoked with Zend PHP. HHVM on our setup ties all forked processes to the same CPU.
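One way to check for this kind of pinning on a Linux host is to fork a few children and compare their CPU affinity masks. This is only a sketch in Python, not how HHVM or scap does it: it shows the affinity inherited from the calling process, and the helper names `forked_affinities` and `all_pinned_to_one_cpu` are hypothetical.

```python
import os


def forked_affinities(n_children: int = 4) -> list[set[int]]:
    """Fork children one at a time and collect each child's CPU affinity mask (Linux only)."""
    results = []
    for _ in range(n_children):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: report the CPUs we are allowed to run on, then exit immediately.
            os.close(read_fd)
            cpus = sorted(os.sched_getaffinity(0))
            os.write(write_fd, ",".join(map(str, cpus)).encode())
            os.close(write_fd)
            os._exit(0)
        # Parent: read the child's report and reap it.
        os.close(write_fd)
        with os.fdopen(read_fd) as reader:
            data = reader.read()
        os.waitpid(pid, 0)
        results.append({int(cpu) for cpu in data.split(",") if cpu})
    return results


def all_pinned_to_one_cpu(affinities: list[set[int]]) -> bool:
    """True if every process is restricted to one CPU and it is the same CPU for all."""
    if not affinities:
        return False
    return all(len(a) == 1 for a in affinities) and len(set.union(*affinities)) == 1


if __name__ == "__main__":
    masks = forked_affinities()
    print("child affinity masks:", masks)
    print("all pinned to one CPU:", all_pinned_to_one_cpu(masks))
```

If every child reports the same single CPU, extra vCPUs will not help the forked workers, which matches the behaviour described for HHVM above.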
I've scapped once or twice recently and didn't notice it being egregiously slow. We can call it resolved.