Make sure gridengine-exec starts on boot
Closed, Resolved · Public

Description

gridengine-exec does not always start after a reboot. This means the host's queues stay unavailable until someone starts the service manually.

More info: {T109412: Enable tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs and build more webgrid hosts} and https://wikitech.wikimedia.org/wiki/Incident_documentation/20150817-ToolLabs-WebgridOutage

Event Timeline

valhallasw raised the priority of this task from to Needs Triage.
valhallasw updated the task description.
valhallasw added a project: Toolforge.
valhallasw added a subscriber: valhallasw.
Restricted Application added a subscriber: Aklapper.

I guess we should put in a service {} stanza.

Sounds like a good way to make sure it stays online, but I'm not sure we should rely on Puppet alone for starting services (after all, it might take ~20 mins before the next run brings the service up). On the other hand, we have good Puppet monitoring (and no init.d/upstart monitoring), so we might as well rely on it.
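
For reference, the service {} stanza suggested above might look roughly like the following. This is a minimal sketch and not the contents of the actual patch; the `enable` attribute is an assumption added for illustration, since the comments only discuss keeping the service running.

```puppet
# Hypothetical sketch of a service resource for gridengine-exec.
service { 'gridengine-exec':
  # On every Puppet run, start the service if it is found stopped.
  ensure => running,
  # Assumption: also ask the init system to start it at boot.
  enable => true,
}
```

With `ensure => running`, a host that comes up without the service would be corrected on its next Puppet run, which is where the delay mentioned above comes from.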

Change 233087 had a related patch set uploaded (by Tim Landscheidt):
gridengine: Ensure that service gridengine-exec is running

https://gerrit.wikimedia.org/r/233087

Change 233087 merged by Yuvipanda:
gridengine: Ensure that service gridengine-exec is running

https://gerrit.wikimedia.org/r/233087