Due to issues on the host lead, hardware request task T147596 was created.
Lead experienced hardware issues, as documented on https://etherpad.wikimedia.org/p/gerrit-outage-20161006. The contents of that pad have been copied below:
Responders: Chad, Alex, Brandon, Daniel

UTC on Oct 5
23:06 < yurik> gerrit is really unhappy today :(
23:22 <+ greg-g> yurik: first I heard, can you say more? otherwise I'll just have to ignore the comment
23:52 < Krenair> seems fine to me yurik
23:52 < yurik> greg-g, Krenair, sorry, just saw your replies. For some reason it took ~4-5 min for "git review" to go throuw
23:52 < yurik> though
23:53 < yurik> might have been just my connection, but IRC and other sites seem to be okayish
23:54 < yurik> actually just checked - git pull takes considerable time, even though gerrit.wikimedia.org opens pretty fast

* Starting at 17:49UTC on 6 Oct gerrit started becoming unresponsive. CPU usage was through the roof
** Puppet halted, default error page for Gerrit being shown
* Other symptoms:
** Apache using far too much CPU
** IO appears fine
** Too much sys cpu?
** Network traffic seems normal
* Restarting gerrit & apache had little effect
** Gerrit suffering from unacceptably slow startup times (logging module?)
* Rebooted server once, did not help
** No hardware errors appeared on reboot
* Rebooting a second time with older kernel
** Also did not seem to help

[22:59:10] <bblack> the cpu cores are all running at like 200mhz right now
[23:00:12] <bblack> root@lead:~# cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
[23:00:14] <bblack> 185253
[23:00:22] <bblack> yeah they're all running at ~200Mhz
-- end etherpad --
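
The frequency check quoted in the pad reads a single core by hand. A minimal sketch of the same check across all cores might look like the following. The function names, the `sysfs_root` parameter, and the 500 MHz floor are hypothetical choices for illustration, not something the responders used; the sysfs path and the kHz unit of `cpuinfo_cur_freq` are standard Linux cpufreq behaviour.

```python
from pathlib import Path


def read_cpu_freqs_mhz(sysfs_root="/sys/devices/system/cpu"):
    """Return {core name: current frequency in MHz} for cores exposing cpufreq.

    cpuinfo_cur_freq is reported in kHz and is usually readable only by root,
    so unreadable or malformed files are skipped rather than treated as errors.
    """
    freqs = {}
    pattern = "cpu[0-9]*/cpufreq/cpuinfo_cur_freq"
    for freq_file in sorted(Path(sysfs_root).glob(pattern)):
        try:
            khz = int(freq_file.read_text().strip())
        except (OSError, ValueError):
            continue
        core = freq_file.parent.parent.name  # e.g. "cpu0"
        freqs[core] = khz / 1000.0
    return freqs


def throttled_cores(freqs_mhz, floor_mhz=500.0):
    """Return the sorted names of cores running below floor_mhz.

    floor_mhz is an arbitrary illustrative threshold, not a documented limit.
    """
    return sorted(name for name, mhz in freqs_mhz.items() if mhz < floor_mhz)


if __name__ == "__main__":
    freqs = read_cpu_freqs_mhz()
    for core, mhz in freqs.items():
        print(f"{core}: {mhz:.0f} MHz")
    print("below floor:", throttled_cores(freqs))
```

Read this way, the value 185253 from the pad is in kHz, which is about 185 MHz and matches the "~200Mhz" observation in the log.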