
Fix the init script for gridmaster in sonofgridengine
Closed, Resolved (Public)

Description

Currently, the Stretch package for the grid master creates its service via a somewhat broken SysV init script. After package install, that script needs to be stripped out and replaced with a systemd unit like the one I added for the shadow master in the module.

The most important part is making sure the old script is removed from all runlevels and that the new unit uses the correct pidfile, /var/spool/gridengine/qmaster/qmaster.pid (needed because these are forking systemd units).

The way I've set it up does work for now; it just doesn't reliably shut down or restart the service every time. Until this is fixed, killall is a more reliable way to stop the sge_qmaster process on the master. Note: I removed the disabling of status checks because that occasionally caused systemd to break the grid by starting the service as root. I'm not exactly sure why, but the current puppet at least keeps it up.
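
For illustration only, here is a minimal sketch of how one might check that the pidfile named above points at a live process before trusting it for a stop or restart. The helper name `pidfile_is_live` is hypothetical and not part of the puppet module; it only assumes the pidfile holds a single numeric PID on its first line.

```shell
#!/bin/sh
# Hypothetical helper, not part of the puppet module:
# succeed only if PIDFILE holds the PID of a process we can signal.
pidfile_is_live() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    # Read the first line; tolerate a missing trailing newline.
    read -r pid < "$pidfile" || [ -n "$pid" ] || return 1
    # Reject empty or non-numeric content.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 tests for existence without delivering a signal.
    # It also fails if we lack permission to signal the process,
    # so run this as root or as the daemon's own user.
    kill -0 "$pid" 2>/dev/null
}
```

Usage, assuming the path from the description: `pidfile_is_live /var/spool/gridengine/qmaster/qmaster.pid && echo "qmaster is running"`.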

Event Timeline

Bstorm renamed this task from Fix the terrible init script for gridmaster in sonofgridengine to Fix the init script for gridmaster in sonofgridengine. Dec 3 2018, 8:39 PM
Bstorm triaged this task as Normal priority.
Bstorm created this task.
aborrero claimed this task. Dec 4 2018, 10:37 AM

I can handle this if you want :-)

Change 477531 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] toolforge: grid: base: extract variables into hiera keys

https://gerrit.wikimedia.org/r/477531

Change 477554 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] toolforge: grid: introduce systemd service file for sge_qmaster

https://gerrit.wikimedia.org/r/477554

Change 477531 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] toolforge: grid: base: extract variables into hiera keys

https://gerrit.wikimedia.org/r/477531

Change 477554 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] toolforge: grid: introduce systemd service file for sge_qmaster

https://gerrit.wikimedia.org/r/477554

From the systemd point of view, the daemon seems to be working just fine. I think we can close the task and reopen if we see anything else.

root@toolsbeta-sgegrid-master:~# systemctl status gridengine-master
● gridengine-master.service - SGE Master daemon
   Loaded: loaded (/lib/systemd/system/gridengine-master.service; static; vendor preset: enabled)
   Active: active (running) since Tue 2018-12-04 14:13:53 UTC; 21h ago
 Main PID: 20871 (sge_qmaster)
    Tasks: 13 (limit: 4915)
   CGroup: /system.slice/gridengine-master.service
           └─20871 /usr/lib/gridengine/sge_qmaster

Dec 05 11:22:55 toolsbeta-sgegrid-master sge_qmaster[20871]: Q:4, AQ:4 J:0(0), H:4(4), C:53, A:2, D:1, P:0, CKPT:1, US:1, PR:0, RQS:0, AR:0, S:nd:0/lf:0
Dec 05 11:22:55 toolsbeta-sgegrid-master sge_qmaster[20871]: --------------STOP-SCHEDULER-RUN-------------
Dec 05 11:22:55 toolsbeta-sgegrid-master sge_qmaster[20871]: Q:4, AQ:4 J:0(0), H:4(4), C:53, A:2, D:1, P:0, CKPT:1, US:1, PR:0, RQS:0, AR:0, S:nd:0/lf:0
Dec 05 11:22:55 toolsbeta-sgegrid-master sge_qmaster[20871]: --------------STOP-SCHEDULER-RUN-------------
Dec 05 11:22:55 toolsbeta-sgegrid-master sge_qmaster[20871]: Q:4, AQ:4 J:0(0), H:4(4), C:53, A:2, D:1, P:0, CKPT:1, US:1, PR:0, RQS:0, AR:0, S:nd:0/lf:0
Dec 05 11:22:55 toolsbeta-sgegrid-master sge_qmaster[20871]: --------------STOP-SCHEDULER-RUN-------------
Dec 05 11:22:55 toolsbeta-sgegrid-master sge_qmaster[20871]: Q:4, AQ:4 J:0(0), H:4(4), C:53, A:2, D:1, P:0, CKPT:1, US:1, PR:0, RQS:0, AR:0, S:nd:0/lf:0
Dec 05 11:22:55 toolsbeta-sgegrid-master sge_qmaster[20871]: --------------STOP-SCHEDULER-RUN-------------
Dec 05 11:22:55 toolsbeta-sgegrid-master sge_qmaster[20871]: Q:4, AQ:4 J:0(0), H:4(4), C:53, A:2, D:1, P:0, CKPT:1, US:1, PR:0, RQS:0, AR:0, S:nd:0/lf:0
Dec 05 11:22:55 toolsbeta-sgegrid-master sge_qmaster[20871]: --------------STOP-SCHEDULER-RUN-------------

root@toolsbeta-sgegrid-master:~# systemctl cat gridengine-master
# /lib/systemd/system/gridengine-master.service
# managed by puppet!

[Unit]
Description=SGE Master daemon
Before=multi-user.target
Before=graphical.target
After=remote-fs.target

[Service]
Restart=always
# Don't play PID guessing games
GuessMainPID=no
Type=simple
# don't fork, let systemd be the direct parent of the proc
Environment="SGE_ND=true"
# We could just place all env vars here...
EnvironmentFile=/etc/default/gridengine
ExecStart=/usr/sbin/sge_qmaster
# some security measures
ProtectSystem=full
ProtectHome=true
PrivateTmp=true
ProtectControlGroups=true

aborrero closed this task as Resolved. Dec 5 2018, 11:40 AM
Bstorm reopened this task as Open. Dec 5 2018, 3:33 PM
Bstorm claimed this task.

Awesome!
I want to remove the symlinks from rc* so they don't cause problems on "runlevel changes". I'll just add a couple of things here instead of creating a new task.
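
A minimal sketch of that kind of cleanup, assuming the old init script's links follow the usual `S??name` / `K??name` pattern under `/etc/rc?.d` and that the script is named `gridengine-master` (both are assumptions; the actual puppet change may do this differently). The helper name `remove_rc_links` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: remove SysV start/kill symlinks for SERVICE
# from every rc?.d directory under ROOT (normally /etc).
remove_rc_links() {
    root=$1
    service=$2
    for link in "$root"/rc?.d/[SK][0-9][0-9]"$service"; do
        # Skip the literal pattern when nothing matched, and
        # anything that is not a symlink.
        [ -L "$link" ] || continue
        rm -f -- "$link"
        echo "removed $link"
    done
}
```

Usage: `remove_rc_links /etc gridengine-master`. On Debian, `update-rc.d -f gridengine-master remove` is the usual tool for the same job.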

Bstorm lowered the priority of this task from Normal to Low. Dec 5 2018, 3:33 PM

Change 477833 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: remove the rc links for the old sysV init as well

https://gerrit.wikimedia.org/r/477833

Change 477833 merged by Bstorm:
[operations/puppet@production] sonofgridengine: remove the rc links for the old sysV init as well

https://gerrit.wikimedia.org/r/477833

Change 477909 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: also make the service enable-able

https://gerrit.wikimedia.org/r/477909

Change 477909 merged by Bstorm:
[operations/puppet@production] sonofgridengine: also make the service enable-able

https://gerrit.wikimedia.org/r/477909
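
For context, the `static` state in the `systemctl status` output above is what systemd reports for a unit with no `[Install]` section, so `systemctl enable` has nothing to act on. Making a unit enable-able typically means adding such a section, often with a `WantedBy=` line naming a target; what change 477909 actually added is not shown in this task. The sketch below is a hypothetical check (the function name is an assumption) for whether a unit file has that section.

```shell
#!/bin/sh
# Hypothetical check: does the unit file at $1 carry an [Install]
# section? Without one, systemd reports the unit as "static".
unit_has_install_section() {
    grep -q '^\[Install\]' "$1"
}
```

Usage: `unit_has_install_section /lib/systemd/system/gridengine-master.service || echo "unit is static"`.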

Change 477919 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: make shadowd enable-able as well

https://gerrit.wikimedia.org/r/477919

Change 477919 merged by Bstorm:
[operations/puppet@production] sonofgridengine: make shadowd enable-able as well

https://gerrit.wikimedia.org/r/477919

Bstorm closed this task as Resolved. Dec 5 2018, 11:00 PM

Looks good now!

Change 498973 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: SGE_DEBUG_LEVEL required for sge shadow to run

https://gerrit.wikimedia.org/r/498973

Change 498973 merged by Bstorm:
[operations/puppet@production] sonofgridengine: SGE_DEBUG_LEVEL required for sge shadow to run

https://gerrit.wikimedia.org/r/498973

GTirloni removed a subscriber: GTirloni. Mar 25 2019, 7:31 PM