
Investigate major HTTP 500 spike since 2016-09-23
Closed, Resolved (Public)

Description

https://grafana.wikimedia.org/dashboard/file/varnish-http-errors.json?from=1470009600000&to=1479463200000


https://wikitech.wikimedia.org/w/index.php?title=Server_Admin_Log&oldid=993285#2016-09-22 cherry-picked entries from around this date:

2016-09-22
07:33 elukey: rebooting stat1004 for kernel upgrades
07:40 moritzm: rolling restart of trusty swift frontend servers in codfw for kernel security update
07:52 elukey: rebooted stat100[23] for kernel upgrades
07:58 elukey: uploaded varnishkafka 1.0.12-1 to reprepro
08:35 elukey: restarted varnishkafka on cp1099 (log abandoned )
08:40 elukey: installed varnishkafka 1.0.12 on cp1099
08:43 elukey: installing varnishkafka 1.0.12 on cache:upload esams
09:02 elukey: installing varnishkafka 1.0.12 on cache:upload codfw
12:25 elukey: installing varnishkafka 1.0.12 on cache:upload ulsfo and eqiad
15:02 bblack: upgrading openssl on cp*
18:38 logmsgbot: aaron@tin Synchronized php-1.28.0-wmf.20/includes/libs/rdbms/database/Database.php: rMW844cfd568a7c & rMW014a420b4525 (duration: 00m 49s)
18:47 logmsgbot: thcipriani@tin Synchronized php-1.28.0-wmf.20/extensions/CentralNotice: SWAT: Update extensions/CentralNotice submodule (T144952) (duration: 00m 52s)
19:09 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.28.0-wmf.20
20:08 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.28.0-wmf.20
22:49 logmsgbot: aaron@tin Synchronized php-1.28.0-wmf.20/includes/libs/rdbms/loadbalancer/LoadBalancer.php: rMWa73a7ef92862 (duration: 01m 04s)

HTTP 5xx matches:

Total request count did not significantly change, so it's probably not caused by overall traffic being higher, but rather by something on our side.
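
To make that reasoning concrete, here is a minimal sketch of the comparison: if total requests stay roughly flat while the 5xx share jumps, traffic growth alone cannot explain the spike. The counts below are made-up placeholders, not values read from the Grafana dashboard, and the helper names `error_rate` and `relative_change` are hypothetical.

```python
def error_rate(errors: int, total: int) -> float:
    """Return the fraction of requests that ended in a 5xx response."""
    if total <= 0:
        raise ValueError("total request count must be positive")
    return errors / total


def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a fraction of `before`."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before


# Hypothetical per-interval counts (placeholders, NOT real dashboard data).
before = {"total": 1_000_000, "errors": 100}
after = {"total": 1_020_000, "errors": 500}

traffic_change = relative_change(before["total"], after["total"])
rate_change = relative_change(
    error_rate(before["errors"], before["total"]),
    error_rate(after["errors"], after["total"]),
)

print(f"traffic change:    {traffic_change:+.1%}")
print(f"error-rate change: {rate_change:+.1%}")
```

With these placeholder numbers traffic grows by about 2% while the error rate nearly quintuples, which is the pattern that points at something on our side rather than at load.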

Event Timeline

Pretty sure what you're looking at here is T147648 (also related: T147784)

Krinkle claimed this task.

Looks like that was it. It's coming back down now:

Might take a while to return fully as it depends on the iOS app update being rolled out to users (and users may not want to update right away). Closing this task in favour of the others.