In some cases ferm fails to start at boot because a DNS resolution fails, for example:
Oct 18 15:53:04 db2042 ferm[837]: DNS query for 'prometheus2003.codfw.wmnet' failed: query timed out
Oct 18 15:53:04 db2042 ferm[837]: (warning).
Oct 18 15:53:04 db2042 systemd[1]: ferm.service: Main process exited, code=exited, status=255/n/a
Oct 18 15:53:04 db2042 systemd[1]: Failed to start ferm firewall configuration.
Oct 18 15:53:04 db2042 systemd[1]: ferm.service: Unit entered failed state.
This was on db2042 right after a reboot, but I've already seen this happening on other hosts too.
The subsequent puppet run didn't start the service either, although a simple start would fix the issue.
So we should investigate our puppet+systemd integration for ferm to make sure that ferm is able to resolve the hosts in its configuration when it starts. On failure (at least at reboot) the start should be retried a couple of times, and/or puppet should ensure that the service is running by starting it.
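
To make the "retry a couple of times" idea concrete, here is a rough Python sketch of what a wrapper could look like. It is only an illustration under assumptions, not a proposal for the final fix: the helper names (`can_resolve`, `retry`, `start_ferm`), the number of attempts and the delay are all made up, and probing a single hostname from the log above is assumed to be a good enough signal that DNS is usable.

```python
import socket
import subprocess
import time


def can_resolve(hostname):
    """Return True if the hostname resolves right now."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def retry(action, attempts=3, delay=5.0, sleep=time.sleep):
    """Call action() up to `attempts` times, sleeping `delay` seconds
    between tries. Return True as soon as one call succeeds."""
    for attempt in range(1, attempts + 1):
        if action():
            return True
        if attempt < attempts:
            sleep(delay)
    return False


def start_ferm():
    """Hypothetical start step: only try to start ferm once DNS works."""
    # Assumption: this host (taken from the log) is representative of
    # the names ferm needs to resolve in its configuration.
    if not can_resolve("prometheus2003.codfw.wmnet"):
        return False
    result = subprocess.run(["systemctl", "start", "ferm.service"])
    return result.returncode == 0


if __name__ == "__main__":
    # Exit non-zero if ferm still could not be started after all tries.
    raise SystemExit(0 if retry(start_ferm) else 1)
```

The same effect could probably be achieved natively in the systemd unit or in the puppet service resource, which would be preferable to a wrapper script if it turns out to be feasible.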