
Restart HHVM on API appservers about every 48 hours
Closed, Resolved, Public

Description

We have a permanent memory leak (see also T133674, T146451) that we have no way of investigating in a timely manner until memory profiling in HHVM is fixed (I guess in 3.15, we can hope). Given that, and given that Facebook itself never lets HHVM run uninterrupted for very long, we need to set up a reliable way to restart HHVM periodically.

Before we can do this, though, I'd like to have a script that can safely depool / repool a node from a cluster; that work is tracked in T145518.
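
As a rough illustration of the flow described above (depool the node, restart HHVM, repool), here is a minimal Python sketch. It is not the script from T145518 or the merged Puppet change. The command names `depool` and `pool`, the use of `service hhvm restart`, the 48-hour threshold constant, and the choice to leave a node depooled when the restart fails are all assumptions made for the example.

```python
import subprocess
from typing import Callable, Sequence

# All three commands are assumptions for illustration; the real
# depool/repool tooling is the subject of T145518.
DEPOOL_CMD = ["depool"]
RESTART_CMD = ["service", "hhvm", "restart"]
REPOOL_CMD = ["pool"]

# "About every 48 hours", expressed in seconds (assumed threshold).
MAX_UPTIME_SECONDS = 48 * 3600

Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.call(list(cmd))


def due_for_restart(uptime_seconds: float,
                    max_uptime: float = MAX_UPTIME_SECONDS) -> bool:
    """Return True once HHVM has been running for at least max_uptime."""
    return uptime_seconds >= max_uptime


def restart_with_depool(runner: Runner = run) -> bool:
    """Depool, restart HHVM, then repool. True only if every step succeeded."""
    if runner(DEPOOL_CMD) != 0:
        # Never restart a node that may still be serving traffic.
        return False
    if runner(RESTART_CMD) != 0:
        # Assumed policy: leave the node depooled so it can be inspected.
        return False
    return runner(REPOOL_CMD) == 0
```

Passing the command runner in as a parameter keeps the sequencing logic testable without touching a real load balancer or service.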

Event Timeline

Joe removed Joe as the assignee of this task. Oct 11 2016, 3:27 PM
Joe added a project: User-Joe.

Change 315938 had a related patch set uploaded (by Giuseppe Lavagetto):
role::mediawiki::webserver: restart hhvm routinely

https://gerrit.wikimedia.org/r/315938

Change 315938 merged by Giuseppe Lavagetto:
role::mediawiki::webserver: restart hhvm routinely

https://gerrit.wikimedia.org/r/315938

Change 320361 had a related patch set uploaded (by Elukey):
Raise nagios retry_interval to avoid false alarms for HHVM restarts

https://gerrit.wikimedia.org/r/320361

Change 320361 merged by Elukey:
Raise nagios retry_interval to avoid false alarms for HHVM restarts

https://gerrit.wikimedia.org/r/320361