
SystemdUnitDownForLong labstore1007:9100 - ferm.service
Closed, Resolved (Public)

Description

Common information

  • alertname: SystemdUnitDownForLong
  • cluster: wmcs
  • instance: labstore1007:9100
  • job: node
  • prometheus: ops
  • severity: task
  • site: eqiad
  • source: prometheus
  • state: failed
  • team: wmcs

Firing alerts



Event Timeline

dcaro claimed this task.
dcaro added a subscriber: Andrew.

This actually was not related (I think).

Ferm failed to start because it could not resolve a hostname:

root@labstore1007:~# systemctl status ferm.service
● ferm.service - ferm firewall configuration
   Loaded: loaded (/lib/systemd/system/ferm.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2022-09-12 18:09:48 UTC; 13h ago
  Process: 15607 ExecStart=/etc/init.d/ferm start (code=exited, status=255/EXCEPTION)
 Main PID: 15607 (code=exited, status=255/EXCEPTION)

Sep 12 18:09:48 labstore1007 ferm[15607]:         saddr
Sep 12 18:09:48 labstore1007 ferm[15607]:         (
Sep 12 18:09:48 labstore1007 ferm[15607]:             deferred=ARRAY(0x5619ff0744f0)
Sep 12 18:09:48 labstore1007 ferm[15607]:         )
Sep 12 18:09:48 labstore1007 ferm[15607]:         <--
Sep 12 18:09:48 labstore1007 ferm[15607]: DNS query for 'sagres.c3sl.ufpr.br' failed: SERVFAIL
Sep 12 18:09:48 labstore1007 ferm[15607]:  (warning).
Sep 12 18:09:48 labstore1007 systemd[1]: ferm.service: Main process exited, code=exited, status=255/EXCEPTION
Sep 12 18:09:48 labstore1007 systemd[1]: ferm.service: Failed with result 'exit-code'.
Sep 12 18:09:48 labstore1007 systemd[1]: Failed to start ferm firewall configuration.

But it seemed resolvable just now:

root@labstore1007:~# dig +short sagres.c3sl.ufpr.br
200.236.31.1

So I ran puppet (just in case) and started ferm manually, and it worked this time. It was probably just a temporary name resolution issue:

root@labstore1007:~# systemctl start ferm.service
root@labstore1007:~# systemctl status ferm.service
● ferm.service - ferm firewall configuration
   Loaded: loaded (/lib/systemd/system/ferm.service; enabled; vendor preset: enabled)
   Active: active (exited) since Tue 2022-09-13 07:24:11 UTC; 2s ago
  Process: 3484 ExecStart=/etc/init.d/ferm start (code=exited, status=0/SUCCESS)
 Main PID: 3484 (code=exited, status=0/SUCCESS)

Sep 13 07:24:09 labstore1007 systemd[1]: Starting ferm firewall configuration...
Sep 13 07:24:11 labstore1007 ferm[3484]: Starting Firewall: ferm.
Sep 13 07:24:11 labstore1007 systemd[1]: Started ferm firewall configuration.
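
If this recurs, retrying the start a few times with a pause in between might ride out a short DNS failure like the SERVFAIL above. A minimal sketch, assuming POSIX sh; the `retry` function, its arguments, and the retry counts are hypothetical and were not used or deployed as part of this task:

```shell
#!/bin/sh
# retry: run a command up to MAX_TRIES times, sleeping DELAY seconds
# between failed attempts. Hypothetical helper, for illustration only.
# Usage: retry MAX_TRIES DELAY command [args...]
retry() {
    max_tries=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Run the command exactly as given; stop at the first success.
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_tries" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s: $*" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Possible usage on the affected host (assumed values, not run here):
#   retry 5 10 systemctl start ferm.service
```

This only helps with short-lived resolution failures; if the name stays unresolvable, the wrapper gives up and the unit stays failed, so the alert would still fire.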

dcaro renamed this task from "SystemdUnitDownForLong labstore1007:9100" to "SystemdUnitDownForLong labstore1007:9100 - ferm.service". Sep 13 2022, 7:27 AM