
ferm broken in stretch
Closed, Resolved, Public

Description

ferm in stretch now ships a systemd unit. That's great, but it breaks any rule using @resolve (i.e. every host, since 10_prometheus-node-exporter is deployed everywhere), because the unit runs before name resolution is available.

The unit uses

Wants=network-pre.target
Before=network-pre.target shutdown.target

network-pre.target is explicitly defined to run prior to network setup:

This passive target unit may be pulled in by services that want to run before any network is set up, for example for the purpose of setting up a firewall. All network management software orders itself after this target, but does not pull it in.

The correct fix would be to set up an early service which installs the default rules, and a second service with "Wants=nss-lookup.target" which parses the locally configured rules.
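
For illustration, the ordering difference can be sketched in unit-file terms. This is only a sketch, not the shipped unit or any patch. The first fragment repeats the two lines quoted above. The second is an assumption about how the later service could be ordered: it adds After=, because Wants= alone pulls a unit in without ordering against it.

```ini
# --- ferm.service as shipped in stretch (the two lines quoted above) ---
# Ordered before any network setup, so @resolve lookups cannot succeed yet.
[Unit]
Wants=network-pre.target
Before=network-pre.target shutdown.target

# --- assumed ordering for a service that parses rules using @resolve ---
# After= provides the actual ordering; Wants= alone would not delay startup.
[Unit]
Wants=nss-lookup.target
After=nss-lookup.target
Before=shutdown.target
```

Whether nss-lookup.target alone is a late enough synchronization point depends on how the resolver is configured on the host, so treat the second fragment as a starting point only.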

Event Timeline

Restricted Application added a subscriber: Aklapper. May 31 2017, 11:17 AM

Should this be reported upstream and fixed there, or is this a WMF-only problem?

It's a Debian-specific problem, not a WMF-specific one. I'm currently writing up a bug report for the Debian BTS.

faidon added a subscriber: faidon. May 31 2017, 11:45 AM

Ouch. I would argue that we should probably get rid of those @resolve calls in most (if not all) of those cases, as they are problematic in general: they can often result in unpredictable behavior, and a DNS glitch can cascade into no firewall rules being applied at all. However, a quick grep shows that we have 106 of those in our puppet tree right now, so that's probably infeasible in a reasonable timeframe :/

The following patch should fix this (it works fine for me on ms-be2001, which is currently running a rebuilt package):
https://phabricator.wikimedia.org/P5534

In addition, I also tried to create a fix which adds a second configuration hierarchy (so that base rules which are independent of name lookups can be configured first, with the other rules added on top in a second unit):
https://phabricator.wikimedia.org/P5535

Unfortunately that does not work as expected: the second ferm execution, done by ferm-nss, overrides the previously configured rules, and I haven't found a way to bypass that. I think it's fine to switch to P5534 for now and work with upstream towards a solution which supports both config hierarchies in the future.

Mentioned in SAL (#wikimedia-operations) [2017-06-06T11:16:58Z] <moritzm> uploaded ferm 2.3.2+wmf1 to apt.wikimedia.org/stretch-wikimedia (T166653)

I've uploaded ferm 2.3-2+wmf1 to stretch-wikimedia, which unbreaks ferm by waiting on nss-lookup.target. This makes ferm start 1-1.5 seconds later than the default stretch unit using network-pre.target, which isn't great, but it's also in line with what we've been using on trusty/jessie so far. One possibility to remove that time window would be to add a separate ferm-pre.service which configures a default policy (what we currently ship via 00_main); it would be loaded in network-pre and then replaced by ferm.service, with a different config file/directory, upon reaching nss-lookup. Those changes are ideally made in a generic manner via the Debian bug, though: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=863802
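
A rough sketch of what such a ferm-pre.service could look like follows. Nothing here comes from an existing package: the config path /etc/ferm/ferm-pre.conf is hypothetical, and the [Service] and [Install] settings are assumptions about a typical oneshot firewall unit rather than the contents of the Debian or WMF unit.

```ini
# ferm-pre.service: hypothetical early unit, not shipped anywhere
[Unit]
Description=ferm default policy, loaded before network setup
# Assumption: opt out of default dependencies so the unit can run this early.
DefaultDependencies=no
Wants=network-pre.target
Before=network-pre.target shutdown.target
Conflicts=shutdown.target

[Service]
Type=oneshot
RemainAfterExit=yes
# Hypothetical config file holding only rules that need no name lookups
# (the default policy currently shipped via 00_main).
ExecStart=/usr/sbin/ferm /etc/ferm/ferm-pre.conf

[Install]
WantedBy=multi-user.target
```

ferm.service would then load the full rule set once nss-lookup.target is reached, replacing the early rules as described above.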

MoritzMuehlenhoff closed this task as Resolved. Jun 13 2017, 11:20 AM
MoritzMuehlenhoff claimed this task.

This is fixed in the stretch-wikimedia package.