We should create an Icinga check to detect failing ferm restarts after a Puppet run:
The "notrack" ferm rules in Puppet were broken: they were added to the wrong table. When the change was applied to helium, this was noticed because the PoolCounter service stopped working (caused by an overflowing connection tracking table). On the poolcounters running in codfw, however, the broken change had already caused a failed ferm restart during an earlier Puppet run (e.g. logged in syslog on Aug 4 10:24:57).
While the notrack errors were introduced during the initial ferm setup of a host, such errors can also occur during day-to-day operation, e.g. if resolve() fails (this has happened before with the list of snapshot dump mirrors).
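A possible starting point for such a check is sketched below. It is a minimal sketch, not a settled design, and it rests on two assumptions that would need verifying on the hosts: that ferm runs as a systemd unit named `ferm`, and that a failed restart (e.g. iptables rejecting a rule, or resolve() failing) leaves that unit in the `failed` state. The script maps the output of `systemctl is-active` to the standard Icinga/Nagios exit codes; `evaluate` and `main` are hypothetical names.

```python
import subprocess
from typing import Tuple

# Standard Icinga/Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(state: str, unit: str = "ferm") -> Tuple[int, str]:
    """Map a systemd unit state (as printed by `systemctl is-active`) to an Icinga result.

    Assumption: a failed ferm restart puts the unit into the "failed" state.
    """
    state = state.strip()
    if state == "active":
        return OK, f"OK: {unit} is active"
    if state == "failed":
        return CRITICAL, f"CRITICAL: {unit} is failed (last (re)start did not load the ruleset)"
    if state == "inactive":
        return CRITICAL, f"CRITICAL: {unit} is inactive"
    if state in ("activating", "deactivating", "reloading"):
        return WARNING, f"WARNING: {unit} is {state}"
    # Empty or unexpected output: we cannot tell what the firewall state is.
    return UNKNOWN, f"UNKNOWN: unexpected state {state!r} for {unit}"


def main(unit: str = "ferm") -> int:
    """Query systemd for the unit state, print the check message, return the exit code."""
    try:
        # `systemctl is-active` exits non-zero for non-active units but still
        # prints the state on stdout, so the exit status is not treated as an error.
        result = subprocess.run(
            ["systemctl", "is-active", unit],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        print(f"UNKNOWN: could not query {unit}: {exc}")
        return UNKNOWN
    code, message = evaluate(result.stdout, unit)
    print(message)
    return code
```

As a plugin entry point it would be invoked as `sys.exit(main())`. Note that this only catches restarts that make the unit fail; a ruleset that loads successfully but is semantically wrong would still pass, so a comparison of the live ruleset against the generated one might be worth considering as a follow-up.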