Paste P10365

-operations IRC log during deploy of 570892

Authored by Tarrow on Feb 10 2020, 9:44 AM.
(CR) Hoo man: [C: +2] Wikibase Client: Fix setting name typo [mediawiki-config] - https://gerrit.wikimedia.org/r/570892 (https://phabricator.wikimedia.org/T244529) (owner: Hoo man)
2:36 PM (Merged) jenkins-bot: Wikibase Client: Fix setting name typo [mediawiki-config] - https://gerrit.wikimedia.org/r/570892 (https://phabricator.wikimedia.org/T244529) (owner: Hoo man)
2:38 PM <•logmsgbot> !log hoo@deploy1001 Synchronized wmf-config/Wikibase.php: Wikibase Client: Fix setting name typo (T244529) (duration: 01m 20s)
2:38 PM <•wikibugs> (CR) Jhedden: [C: +2] icinga: update sms contact for jhedden [puppet] - https://gerrit.wikimedia.org/r/570896 (owner: Jhedden)
2:38 PM <•stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2:38 PM T244529: mw.wikibase.getLabelByLang not return item label for some items - https://phabricator.wikimedia.org/T244529
2:40 PM <•wikibugs> (PS1) Elukey: Add presto_clusters_secrets in common.yaml [labs/private] - https://gerrit.wikimedia.org/r/570900
2:40 PM <•logmsgbot> !log hoo@deploy1001 Scap failed!: 9/11 canaries failed their endpoint checks(http://en.wikipedia.org)
2:40 PM <•stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2:40 PM <•icinga-wm> PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
2:40 PM <•wikibugs> (CR) Elukey: [V: +2 C: +2] Add presto_clusters_secrets in common.yaml [labs/private] - https://gerrit.wikimedia.org/r/570900 (owner: Elukey)
2:40 PM <•icinga-wm> PROBLEM - PHP7 rendering on mw1262 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:40 PM PROBLEM - PHP7 rendering on mw1321 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:40 PM PROBLEM - Apache HTTP on mw1275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:40 PM PROBLEM - PHP7 rendering on mw1271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:40 PM PROBLEM - PHP7 rendering on mw1267 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:40 PM PROBLEM - Nginx local proxy to apache on mw1322 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:40 PM PROBLEM - PHP7 rendering on mw1270 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:40 PM PROBLEM - Nginx local proxy to apache on mw1327 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:40 PM PROBLEM - Nginx local proxy to apache on mw1316 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1250 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Nginx local proxy to apache on mw1249 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Nginx local proxy to apache on mw1256 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1241 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1330 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1320 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1261 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - PHP7 rendering on mw1328 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:41 PM <hoo> How do I force the deploy
2:41 PM <•icinga-wm> PROBLEM - Apache HTTP on mw1324 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1319 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Nginx local proxy to apache on mw1262 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM <hoo> it's a revert
2:41 PM <•icinga-wm> PROBLEM - PHP7 rendering on mw1323 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:41 PM <hauskatze> somebody unplugged the wrong cable
2:41 PM <•icinga-wm> PROBLEM - Apache HTTP on mw1246 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Nginx local proxy to apache on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1254 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1243 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - PHP7 rendering on mw1255 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:41 PM PROBLEM - Apache HTTP on mw1234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM <hoo> not sure why but my last change broke it
2:41 PM <•icinga-wm> PROBLEM - Apache HTTP on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Nginx local proxy to apache on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Nginx local proxy to apache on mw1234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1344 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1313 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM PROBLEM - Apache HTTP on mw1348 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
2:41 PM ⇐ •icinga-wm quit (~icinga-wm@wikimedia/bot/icinga-wm) Excess Flood
2:41 PM <hoo> I guess
2:41 PM <Cohaf> sorry if wrong channel, I can't visit all wikis now
2:41 PM <elukey> Cohaf: yep it is, we are working on it :)
2:42 PM <hauskatze> Cohaf: that's what happens when the Apache server crashes :)
2:42 PM <hoo> Got it, reverting with --force now
2:42 PM <Cohaf> thanks, I had 502 all round
2:42 PM <_joe_> hoo: damnit yes
2:42 PM <elukey> hoo: there was an occurrence of the same problem before, it is probably not your change
2:42 PM <Cohaf> I recalled then I was able to access
2:42 PM from Singapore
2:42 PM <elukey> but let's revert in any case
2:43 PM <•wikibugs> (CR) Jcrespo: [C: +1] "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/570792 (https://phabricator.wikimedia.org/T240094) (owner: Marostegui)
2:43 PM <Praxidicae> #rip
2:43 PM <Amir1> !log ladsgroup@mwmaint1002:~$ mwscript createAndPromote.php --wiki=zhwiki --force "Amir Sarabadani (WMDE)" --sysop (T244578)
2:43 PM <godog> hoo: how's the revert ?
2:43 PM <•stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2:43 PM T244578: Tracking task: 2020-02-07 MW API server outage(s) - https://phabricator.wikimedia.org/T244578
2:43 PM <•wikibugs> (PS1) Hoo man: Revert "Wikibase Client: Fix setting name typo" [mediawiki-config] - https://gerrit.wikimedia.org/r/570901
2:43 PM <•logmsgbot> !log hoo@deploy1001 Synchronized wmf-config/Wikibase.php: REVERT: Wikibase Client: Fix setting name typo (T244529) (duration: 01m 40s)
2:43 PM <•stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2:43 PM T244529: mw.wikibase.getLabelByLang not return item label for some items - https://phabricator.wikimedia.org/T244529
2:43 PM <hoo> godog: Done
2:44 PM <•wikibugs> (CR) Hoo man: [C: +2] "For consistency" [mediawiki-config] - https://gerrit.wikimedia.org/r/570901 (owner: Hoo man)
2:44 PM <_joe_> we're back
2:44 PM <godog> hoo: thank you
2:44 PM <Cohaf> thanks
2:44 PM <hoo> Seems that typo actually hid a very nasty bug :S
2:44 PM <•wikibugs> (Merged) jenkins-bot: Revert "Wikibase Client: Fix setting name typo" [mediawiki-config] - https://gerrit.wikimedia.org/r/570901 (owner: Hoo man)
2:45 PM <_joe_> also it's friday :)
2:45 PM → AmandaNP joined (uid1203@wikipedia/DeltaQuad)
2:45 PM <hoo> Yes, that calls for bad luck :S
2:45 PM → icinga-wm joined (~icinga-wm@wikimedia/bot/icinga-wm)
2:45 PM <icinga-wm> RECOVERY - Nginx local proxy to apache on mw1266 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 1.714 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:45 PM RECOVERY - phpfpm_up reduced availability on icinga1001 is OK: (C)0.8 le (W)0.9 le 0.9534 https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_exporters_%22up%22_metrics_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
2:45 PM •icinga-wm was voiced (+v) by •ChanServ
2:45 PM <•icinga-wm> RECOVERY - Apache HTTP on mw1266 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 1.679 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:45 PM RECOVERY - Apache HTTP on mw1272 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.054 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:45 PM RECOVERY - PHP7 rendering on mw1319 is OK: HTTP OK: HTTP/1.1 200 OK - 79818 bytes in 0.139 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:45 PM RECOVERY - restbase endpoints health on restbase1019 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
2:45 PM RECOVERY - restbase endpoints health on restbase-dev1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
2:45 PM RECOVERY - restbase endpoints health on restbase2023 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
2:45 PM RECOVERY - restbase endpoints health on restbase1018 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
2:45 PM RECOVERY - restbase endpoints health on restbase1026 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
2:45 PM RECOVERY - Apache HTTP on mw1274 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.315 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:45 PM RECOVERY - Nginx local proxy to apache on mw1271 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 0.412 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:45 PM RECOVERY - Nginx local proxy to apache on mw1270 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 1.132 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:45 PM RECOVERY - Apache HTTP on mw1270 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 1.232 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:45 PM RECOVERY - High average POST latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=POST
2:45 PM RECOVERY - PHP7 rendering on mw1242 is OK: HTTP OK: HTTP/1.1 200 OK - 79818 bytes in 0.182 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:45 PM RECOVERY - PHP7 rendering on mw1238 is OK: HTTP OK: HTTP/1.1 200 OK - 79818 bytes in 0.191 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:46 PM RECOVERY - PHP7 rendering on mw1261 is OK: HTTP OK: HTTP/1.1 200 OK - 79818 bytes in 0.264 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:46 PM RECOVERY - Nginx local proxy to apache on mw1275 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 0.613 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:46 PM RECOVERY - Nginx local proxy to apache on mw1328 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 0.175 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:46 PM RECOVERY - PHP7 rendering on mw1272 is OK: HTTP OK: HTTP/1.1 200 OK - 79818 bytes in 0.182 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:46 PM RECOVERY - Nginx local proxy to apache on mw1267 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 0.572 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:46 PM PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code={200,204,205} handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-met
2:46 PM RECOVERY - PHP7 rendering on mw1327 is OK: HTTP OK: HTTP/1.1 200 OK - 79818 bytes in 0.143 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:46 PM RECOVERY - PHP7 rendering on mw1332 is OK: HTTP OK: HTTP/1.1 200 OK - 79818 bytes in 0.190 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:46 PM RECOVERY - Nginx local proxy to apache on mw1242 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.048 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:46 PM RECOVERY - Varnish traffic drop between 30min ago and now at esams on icinga1001 is OK: (C)60 le (W)70 le 71.45 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
2:46 PM RECOVERY - Apache HTTP on mw1326 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.049 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:46 PM RECOVERY - Apache HTTP on mw1256 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.032 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:46 PM RECOVERY - Nginx local proxy to apache on mw1258 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.049 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:46 PM RECOVERY - PHP7 rendering on mw1269 is OK: HTTP OK: HTTP/1.1 200 OK - 79818 bytes in 0.125 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:46 PM RECOVERY - Nginx local proxy to apache on mw1263 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 0.716 second response time https://wikitech.wikimedia.org/wiki/Application_servers
2:46 PM RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2:46 PM RECOVERY - PyBal backends health check on lvs1015 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2:46 PM RECOVERY - PHP7 rendering on mw1330 is OK: HTTP OK: HTTP/1.1 200 OK - 79818 bytes in 0.134 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
2:46 PM ⇐ hauskatze and kevinbazira quit
2:47 PM <•icinga-wm> RECOVERY - Logstash Elasticsearch indexing errors on icinga1001 is OK: (C)8 ge (W)1 ge 0.6208 https://wikitech.wikimedia.org/wiki/Logstash%23Indexing_errors https://logstash.wikimedia.org/goto/1cee1f1b5d4e6c5e06edb3353a2a4b83 https://grafana.wikimedia.org/dashboard/db/logstash
2:48 PM <•wikibugs> (PS3) Muehlenhoff: Switch logstash hosts to standard Partman recipe [puppet] - https://gerrit.wikimedia.org/r/570600 (https://phabricator.wikimedia.org/T156955)
2:48 PM <•icinga-wm> RECOVERY - ATS TLS has reduced HTTP availability #page on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=13&fullscreen&refresh=1m&orgId=1
2:48 PM RECOVERY - High average POST latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=POST
2:48 PM RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
2:48 PM RECOVERY - PyBal IPVS diff check on lvs1016 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
2:48 PM RECOVERY - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
2:49 PM PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v2/translate/{from}/{to}{/provider} (Machine translate an HTML fragment using TestClient, adapt the links to target language wiki.) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
2:49 PM RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
2:49 PM RECOVERY - wikifeeds eqiad on wikifeeds.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Wikifeeds
2:49 PM RECOVERY - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
2:49 PM RECOVERY - restbase endpoints health on restbase1024 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
2:49 PM RECOVERY - restbase endpoints health on restbase1025 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
2:49 PM RECOVERY - proton endpoints health on proton2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
2:49 PM RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
2:49 PM RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
2:49 PM RECOVERY - proton endpoints health on proton2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
2:50 PM RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
2:50 PM RECOVERY - Restbase edge codfw on text-lb.codfw.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
2:50 PM RECOVERY - Restbase edge eqsin on text-lb.eqsin.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
2:51 PM PROBLEM - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is CRITICAL: /v2/translate/{from}/{to}{/provider} (Machine translate an HTML fragment using TestClient, adapt the links to target language wiki.) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
2:52 PM RECOVERY - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX