
cp3050 depooled due to explosion in CPU usage and inuse sockets
Closed, Resolved · Public

Event Timeline

I've added a Grafana annotation with various tags for alignment in dashboards.

The partial esams outage lasted about 22 minutes.

Screenshot 2019-12-18 at 00.54.41.png (992×1 px, 94 KB)

ema triaged this task as Medium priority. Dec 18 2019, 9:43 AM

The host was the only one in esams running with the xdebug plugin enabled, which we had turned on to debug the TTFB regression reported by the Performance-Team: T238494.

Suspecting that it might be the cause of this crash, @Vgutierrez and I disabled the plugin, restarted ats-be at 08:59 and repooled the host at 09:18.

Change 559457 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] Revert "ATS: enable xdebug plugin on 3 hosts"

Change 559457 merged by Ema:
[operations/puppet@production] Revert "ATS: enable xdebug plugin on 3 hosts"

Mentioned in SAL (#wikimedia-operations) [2019-12-19T14:41:10Z] <ema> cp1075, cp4028: ats-backend-restart to disable xdebug plugin T241001

text@esams has had no similar issues since we disabled xdebug in December. Closing.