Times are in UTC-5 (Central Daylight Time, CDT):
[23:15:53] <icinga-wm> PROBLEM - ORES web node labs ores-web-03 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:19:27] <icinga-wm> PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:24:14] <halfak> ARG
[23:24:16] <halfak> WHY
[23:24:19] <icinga-wm> RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 443 bytes in 0.628 second response time
[23:24:58] <halfak> Hmm... looks like I can't even get the homepage to load
[23:25:27] <halfak> https://ores.wmflabs.org/node/ores-web-05/
[23:25:29] <halfak> Is up
[23:25:33] <halfak> but https://ores.wmflabs.org/node/ores-web-03/ is down
[23:25:44] <halfak> I can ssh to ores-web-03
[23:26:07] <halfak> On ores-web-03, there's one big python process.
[23:26:28] <halfak> Top line: 3338 www-data 20 0 4020768 3.123g 5600 S 7.3 80.8 1296:19 python
[23:26:32] <halfak> 80% of memory!
[23:26:49] <halfak> It hovers around 4-8% cpu
[23:27:59] * Amir1 (uid102662@gateway/web/irccloud.com/x-duqukzbslvjcdzgc) has joined
[23:27:59] * ChanServ gives voice to Amir1
[23:28:05] <halfak> o/ Amir1
[23:28:15] <Amir1> halfak: hey
[23:28:26] <Amir1> it's morning here, why are you awake? :D
[23:28:33] <halfak> Been looking into the icinga notification.
[23:28:39] <halfak> Will get you a paste of my notes shortly.
[23:28:49] <halfak> TL;DR: ores-web-03 got into a weird state
[23:29:49] <halfak> Service restart did nothing. Executed without error.
[23:30:11] <Amir1> okay We should look into that why our instances suddenly gets crazy
[23:30:24] <halfak> This one is really weird.
[23:30:38] <halfak> uwsgi seems to have died and been replaced with a python process.
[23:30:43] <halfak> Usually the "command" is uwsgi
[23:30:57] <halfak> https://ores.wmflabs.org/node/ores-web-03/ is back online
[23:31:11] <icinga-wm> RECOVERY - ORES web node labs ores-web-03 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 457 bytes in 0.860 second response time
[23:31:36] <halfak> And we're back!
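
Note: the two symptoms spotted by hand in the log (a process holding about 80% of memory, and a command name of python where uwsgi was expected) could be checked mechanically. The following is only a minimal sketch, not anything that was run during the incident. It assumes the pasted line uses top's default column order (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND); the names TopRow, parse_top_line and is_suspicious, and the 50% threshold, are hypothetical choices made for illustration.

```python
from typing import NamedTuple


class TopRow(NamedTuple):
    """Subset of one process row from top (hypothetical helper type)."""
    pid: int
    user: str
    cpu_percent: float
    mem_percent: float
    command: str


def parse_top_line(line: str) -> TopRow:
    """Parse one process row, assuming top's default column order:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    # Split into at most 12 fields so a command containing spaces stays whole.
    fields = line.split(None, 11)
    if len(fields) < 12:
        raise ValueError(f"expected 12 columns, got {len(fields)}: {line!r}")
    return TopRow(
        pid=int(fields[0]),
        user=fields[1],
        cpu_percent=float(fields[8]),
        mem_percent=float(fields[9]),
        command=fields[11].strip(),
    )


def is_suspicious(row: TopRow, expected_command: str = "uwsgi",
                  mem_threshold: float = 50.0) -> bool:
    """Flag a row that uses a lot of memory or is not the expected command.

    The 50% threshold is an arbitrary illustrative value, not a tuned limit.
    """
    wrong_command = not row.command.startswith(expected_command)
    return row.mem_percent >= mem_threshold or wrong_command


# The row pasted in the log at 23:26:28.
row = parse_top_line(
    "3338 www-data 20 0 4020768 3.123g 5600 S 7.3 80.8 1296:19 python"
)
print(row.command, row.mem_percent, is_suspicious(row))
```

Both conditions fire for the row from the log: memory is above the threshold and the command is python rather than uwsgi.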