2:40 PM <•stashbot> https://tools.wmflabs.org/stashbot/ Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2:40 PM <•icinga-wm> IRC echo bot PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
2:40 PM <•icinga-wm> IRC echo bot PROBLEM - PHP7 rendering on mw1262 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:40 PM PROBLEM - PHP7 rendering on mw1321 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:40 PM PROBLEM - Apache HTTP on mw1275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:40 PM PROBLEM - PHP7 rendering on mw1271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:40 PM PROBLEM - PHP7 rendering on mw1267 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:40 PM PROBLEM - Nginx local proxy to apache on mw1322 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:40 PM PROBLEM - PHP7 rendering on mw1270 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:40 PM PROBLEM - Nginx local proxy to apache on mw1327 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:40 PM PROBLEM - Nginx local proxy to apache on mw1316 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1250 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Nginx local proxy to apache on mw1249 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Nginx local proxy to apache on mw1256 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1241 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1330 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1320 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1261 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - PHP7 rendering on mw1328 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:41 PM <hoo> How do I force the deploy
2:41 PM <•icinga-wm> IRC echo bot PROBLEM - Apache HTTP on mw1324 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1319 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Nginx local proxy to apache on mw1262 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM <hoo> it's a revert
2:41 PM <•icinga-wm> IRC echo bot PROBLEM - PHP7 rendering on mw1323 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:41 PM <hauskatze> devnull somebody unplugged the wrong cable
2:41 PM <•icinga-wm> IRC echo bot PROBLEM - Apache HTTP on mw1246 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Nginx local proxy to apache on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1254 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1243 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - PHP7 rendering on mw1255 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:41 PM PROBLEM - Apache HTTP on mw1234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM <hoo> not sure why but my last change broke it
2:41 PM <•icinga-wm> IRC echo bot PROBLEM - Apache HTTP on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Nginx local proxy to apache on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Nginx local proxy to apache on mw1234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1344 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1313 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1348 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:45 PM <_joe_> Giuseppe Lavagetto also it's friday :)
2:45 PM → AmandaNP joined (uid1203@wikipedia/DeltaQuad)
2:45 PM <hoo> Yes, that calls for bad luck :S
2:45 PM → icinga-wm joined (~icinga-wm@wikimedia/bot/icinga-wm)
2:45 PM <icinga-wm> IRC echo bot RECOVERY - Nginx local proxy to apache on mw1266 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 1.714 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:45 PM RECOVERY - phpfpm_up reduced availability on icinga1001 is OK: (C)0.8 le (W)0.9 le 0.9534 https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_exporters_%22up%22_metrics_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
2:45 PM •icinga-wm was voiced (+v) by •ChanServ
2:45 PM <•icinga-wm> IRC echo bot RECOVERY - Apache HTTP on mw1266 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 1.679 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:45 PM RECOVERY - Apache HTTP on mw1272 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.054 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:45 PM RECOVERY - PHP7 rendering on mw1319 is OK: HTTP OK: HTTP/1.1 200 OK - 79818 bytes in 0.139 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:45 PM RECOVERY - restbase endpoints health on restbase1019 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
2:45 PM RECOVERY - restbase endpoints health on restbase-dev1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
2:45 PM RECOVERY - restbase endpoints health on restbase2023 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
2:45 PM RECOVERY - restbase endpoints health on restbase1018 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
2:45 PM RECOVERY - restbase endpoints health on restbase1026 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
2:45 PM RECOVERY - Apache HTTP on mw1274 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.315 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:45 PM RECOVERY - Nginx local proxy to apache on mw1271 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 0.412 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:45 PM RECOVERY - Nginx local proxy to apache on mw1270 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 1.132 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:45 PM RECOVERY - Apache HTTP on mw1270 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 1.232 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:45 PM RECOVERY - High average POST latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=POST
2:45 PM RECOVERY - PHP7 rendering on mw1242 is OK: HTTP OK: HTTP/1.1 200 OK - 79818 bytes in 0.182 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:45 PM RECOVERY - PHP7 rendering on mw1238 is OK: HTTP OK: HTTP/1.1 200 OK - 79818 bytes in 0.191 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:46 PM RECOVERY - PHP7 rendering on mw1261 is OK: HTTP OK: HTTP/1.1 200 OK - 79818 bytes in 0.264 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:46 PM RECOVERY - Nginx local proxy to apache on mw1275 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 0.613 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:46 PM RECOVERY - Nginx local proxy to apache on mw1328 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 0.175 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:46 PM RECOVERY - PHP7 rendering on mw1272 is OK: HTTP OK: HTTP/1.1 200 OK - 79818 bytes in 0.182 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:46 PM RECOVERY - Nginx local proxy to apache on mw1267 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 0.572 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:46 PM PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code={200,204,205} handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-met
2:46 PM RECOVERY - PHP7 rendering on mw1327 is OK: HTTP OK: HTTP/1.1 200 OK - 79818 bytes in 0.143 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:46 PM RECOVERY - PHP7 rendering on mw1332 is OK: HTTP OK: HTTP/1.1 200 OK - 79818 bytes in 0.190 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:46 PM RECOVERY - Nginx local proxy to apache on mw1242 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.048 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:46 PM RECOVERY - Varnish traffic drop between 30min ago and now at esams on icinga1001 is OK: (C)60 le (W)70 le 71.45 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
2:46 PM RECOVERY - Apache HTTP on mw1326 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.049 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:46 PM RECOVERY - Apache HTTP on mw1256 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.032 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:46 PM RECOVERY - Nginx local proxy to apache on mw1258 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.049 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:46 PM RECOVERY - PHP7 rendering on mw1269 is OK: HTTP OK: HTTP/1.1 200 OK - 79818 bytes in 0.125 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:46 PM RECOVERY - Nginx local proxy to apache on mw1263 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 0.716 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:46 PM RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2:46 PM RECOVERY - PyBal backends health check on lvs1015 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2:46 PM RECOVERY - PHP7 rendering on mw1330 is OK: HTTP OK: HTTP/1.1 200 OK - 79818 bytes in 0.134 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:46 PM ⇐ hauskatze and kevinbazira quit
2:47 PM <•icinga-wm> IRC echo bot RECOVERY - Logstash Elasticsearch indexing errors on icinga1001 is OK: (C)8 ge (W)1 ge 0.6208 https://wikitech.wikimedia.org/wiki/Logstash%23Indexing_errors https://logstash.wikimedia.org/goto/1cee1f1b5d4e6c5e06edb3353a2a4b83 https://grafana.wikimedia.org/dashboard/db/logstash
2:48 PM <•wikibugs> Wikibugs v2.1, https://tools.wmflabs.org/wikibugs/ (PS3) Muehlenhoff: Switch logstash hosts to standard Partman recipe [puppet] - https://gerrit.wikimedia.org/r/570600 (https://phabricator.wikimedia.org/T156955)
2:48 PM <•icinga-wm> IRC echo bot RECOVERY - ATS TLS has reduced HTTP availability #page on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=13&fullscreen&refresh=1m&orgId=1
2:48 PM RECOVERY - High average POST latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=POST
2:48 PM RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
2:48 PM RECOVERY - PyBal IPVS diff check on lvs1016 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
2:48 PM RECOVERY - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
2:49 PM PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v2/translate/{from}/{to}{/provider} (Machine translate an HTML fragment using TestClient, adapt the links to target language wiki.) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
2:49 PM RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
2:49 PM RECOVERY - wikifeeds eqiad on wikifeeds.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Wikifeeds
2:49 PM RECOVERY - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
2:49 PM RECOVERY - restbase endpoints health on restbase1024 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
2:49 PM RECOVERY - restbase endpoints health on restbase1025 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
2:49 PM RECOVERY - proton endpoints health on proton2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
2:49 PM RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
2:49 PM RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
2:49 PM RECOVERY - proton endpoints health on proton2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
2:50 PM RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
2:50 PM RECOVERY - Restbase edge codfw on text-lb.codfw.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
2:50 PM RECOVERY - Restbase edge eqsin on text-lb.eqsin.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
2:51 PM PROBLEM - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is CRITICAL: /v2/translate/{from}/{to}{/provider} (Machine translate an HTML fragment using TestClient, adapt the links to target language wiki.) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
2:52 PM RECOVERY - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX