The aphlict systemd unit needs to be rewritten from scratch
Closed, Resolved · Public

Description

Following an alert this morning, I took a quick look at the aphlict systemd unit: it is completely broken and needs to be rewritten.

Namely:

  • systemctl start aphlict.service starts the service, but because the program forks to call php, which in turn calls nodejs, systemd has no way to track which pid belongs to the service and thinks the service is dead.
  • As a consequence, the ExecReload command is wrong as well.
  • Restarting the service doesn't work.
  • This unit specifies KillMode=process, and I'd like to know why: I cannot see a good reason to leave stray processes behind if the main pid is killed with SIGKILL. KillMode=mixed is probably what you want.
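
For illustration only, the points above could translate into a unit along the following lines. This is a rough sketch under stated assumptions, not the unit that was eventually merged: the path to bin/aphlict and the PIDFile location are hypothetical, and the PIDFile value has to match whatever pid file path is configured in /srv/phab/aphlict/config.json.

```ini
[Unit]
Description=Aphlict notification server for Phabricator (illustrative sketch)
After=network.target

[Service]
# The start command forks (php launches nodejs) and the parent exits,
# so declare the service as forking.
Type=forking
# Hypothetical path: must match the pid file path set in config.json.
# With a pid file, systemd can tell which pid is the main process.
PIDFile=/run/aphlict/aphlict.pid
# Hypothetical install path for the aphlict launcher script.
ExecStart=/srv/phab/phabricator/bin/aphlict start --config /srv/phab/aphlict/config.json
# mixed: SIGTERM goes to the main process only, and a later SIGKILL goes
# to every remaining process in the unit's control group, so nothing is left stray.
KillMode=mixed
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

With Type=forking and a PIDFile that systemd can read, the main pid is known after startup, which is what status reporting, restarts and any ExecReload command that refers to $MAINPID depend on.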

Event Timeline

Joe created this task. Sep 21 2017, 7:49 AM
Restricted Application added a subscriber: Aklapper. Sep 21 2017, 7:49 AM
Joe triaged this task as High priority. Sep 21 2017, 7:49 AM
This comment was removed by Paladox.
Dzahn added a comment. Sep 21 2017, 2:17 PM

What alert was it? I don't think there is any Icinga monitoring for it yet, and it wasn't even expected to be in use: the service was added, but Phab wasn't configured to use it yet. The merge was just supposed to be "add the config so we can start working on making it available".

It does seem to work. It is starting on port 22280; see /srv/phab/aphlict/config.json.

root@phabricator:/home/paladox# telnet localhost 22280
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

Also, it does use a pid file; see /srv/phab/aphlict/config.json.

Change 379560 had a related patch set (by Paladox) published:
[operations/puppet@production] Phabricator: Fix aphlict systemd script

https://gerrit.wikimedia.org/r/379560

Indeed, it shouldn't be enabled or alerting. Hmm.

I think systemd sent the alert, per the recovery at 8:35 AM this morning:

[08:35:48] <+icinga-wm> RECOVERY - Check systemd state on phab1001 is OK: OK - running: The system is fully operational

Is there a way to have a systemd unit installed but not auto-started/monitored/expected? That'd be ideal for aphlict since it's currently experimental.

@mmodell Yep, I think it's because we are using base::service, so I guess we should just tell it not to run, though I am not sure how to do that yet.

This comment was removed by Paladox.

Change 379560 had a related patch set uploaded (by Paladox; owner: Paladox):
[operations/puppet@production] Phabricator: Fix aphlict systemd script

https://gerrit.wikimedia.org/r/379560

Change 379560 merged by Giuseppe Lavagetto:
[operations/puppet@production] Phabricator: Fix aphlict systemd script

https://gerrit.wikimedia.org/r/379560

Joe added a comment. Sep 22 2017, 10:48 AM

Thanks to @Paladox's work on this, the aphlict service unit now handles the software correctly.

I am going to make a couple more tweaks if I can drop the Ubuntu compatibility, but I think we can close this ticket.

Joe closed this task as Resolved. Sep 22 2017, 10:49 AM
Joe assigned this task to Paladox.

@Joe thanks :)

Yeah, we can remove Ubuntu / upstart support.

mmodell rescinded a token.
Dzahn awarded a token. Sep 22 2017, 5:18 PM