
confd setup left without configuration doesn't stop confd
Open, Medium, Public

Description

I found by chance that we have 34 hosts where confd no longer has any configuration files or templates, but the process is still running and logs the following every 3 seconds:

/usr/bin/confd[3971103]: WARNING Found no templates

The 34 hosts with that setup are:

arclamp2001.codfw.wmnet,arclamp1001.eqiad.wmnet,cloudgw[2002-2003]-dev.codfw.wmnet,cloudgw[1001-1002].eqiad.wmnet,cloudlb[2001-2003]-dev.codfw.wmnet,cuminunpriv1001.eqiad.wmnet,ganeti[2033-2034].codfw.wmnet,ganeti-test[2001-2003].codfw.wmnet,idm-test1001.wikimedia.org,moscovium.eqiad.wmnet,netbox-dev2002.codfw.wmnet,netboxdb2002.codfw.wmnet,netboxdb1002.eqiad.wmnet,netmon[1003,2002].wikimedia.org,people2003.codfw.wmnet,people1004.eqiad.wmnet,pybal-test2003.codfw.wmnet,sretest[2003-2005].codfw.wmnet,sretest[1002-1003].eqiad.wmnet,testreduce1002.eqiad.wmnet,testvm[2002,2004-2005].codfw.wmnet

Maybe we should do something on the Puppet side to ensure confd is stopped when no config or template is present. On these hosts /etc/confd contains only empty directories:

/etc/confd
├── conf.d
└── templates
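To spot affected hosts, here is a minimal Python sketch. It assumes the standard confd layout shown above, with conf.d holding the template resources and templates holding the templates. The function names are hypothetical and not part of any existing tooling. A missing directory counts as empty.

```python
import shutil
import subprocess
from pathlib import Path


def confd_has_no_config(root: str = "/etc/confd") -> bool:
    """Return True if neither conf.d nor templates under `root` contains a file.

    Hypothetical helper; a missing directory is treated like an empty one.
    """
    base = Path(root)
    for sub in ("conf.d", "templates"):
        directory = base / sub
        # rglob("*") walks the whole subtree; any regular file counts as config.
        if directory.is_dir() and any(p.is_file() for p in directory.rglob("*")):
            return False
    return True


def confd_is_active() -> bool:
    """Return True if systemd reports the confd unit as active (hypothetical helper)."""
    if shutil.which("systemctl") is None:
        return False
    # `systemctl is-active --quiet UNIT` prints nothing and exits 0 only if active.
    result = subprocess.run(["systemctl", "is-active", "--quiet", "confd"], check=False)
    return result.returncode == 0
```

A host matches this task when confd_is_active() and confd_has_no_config() are both True.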

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2024-02-07T19:19:52Z] <mutante> people1004 systemctl stop confd; running puppet; checking to remove confd remnants from people* hosts - T356296

It seems to me this has to do with the profile::firewall migration from iptables to nftables.

What these hosts have in common is profile::firewall::provider: nftables in hieradata.

And confd is pulled in by profile::firewall for request-ipblocks/abuse.

From inside profile::firewall:

    case $provider {
        'ferm': {
            if $defs_from_etcd {
                # unmanaged files under /etc/ferm/conf.d are purged
                # so we define the file to stop it being deleted
                file { '/etc/ferm/conf.d/00_defs_requestctl':
                    ensure => file,
..

        'nftables': {
..
            if $defs_from_etcd and $defs_from_etcd_nft {
                confd::file { '/etc/nftables/sets/requestctl.nft':

See what happens when I pretend to change the firewall provider back to ferm for the people hosts:

https://gerrit.wikimedia.org/r/c/operations/puppet/+/998532/4/hieradata/role/common/microsites/peopleweb.yaml

-->

https://puppet-compiler.wmflabs.org/output/998532/1334/people1004.eqiad.wmnet/index.html

Mentioned in SAL (#wikimedia-operations) [2024-06-26T23:26:07Z] <mutante> people1004 - stopped confd which logs every 3 seconds that it can't find any templates (T356296)

Change #1050080 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] peopleweb: set profile::firewall::defs_from_etcd to false

https://gerrit.wikimedia.org/r/1050080

Change #1050080 abandoned by Dzahn:

[operations/puppet@production] peopleweb: set profile::firewall::defs_from_etcd to false

Reason:

per Moritz' comment

https://gerrit.wikimedia.org/r/1050080

I'm a little confused about this one. We have defs_from_etcd_nft set to false by default in Hiera for the firewall profile:

cmooney@wikilap:~/repos/puppet$ grep defs_from_etcd_nft hieradata/common/profile/firewall.yaml 
profile::firewall::defs_from_etcd_nft: false

Because of a separate issue, nftables fails completely if this is set to true: mixed IPv4 and IPv6 networks get included, which isn't compatible with how the rules are defined. So we actually need to make sure this does not evaluate to true on any production host right now.
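To illustrate the family mismatch: an nftables set has a single element type (ipv4_addr or ipv6_addr), so one set cannot hold a mix of IPv4 and IPv6 networks. The following is a hypothetical Python sketch that renders separate per-family sets from a mixed list. The set names and the splitting approach are assumptions for illustration, not necessarily how the requestctl data is rendered today. The output is a set fragment meant to sit inside a table block.

```python
import ipaddress
from typing import Iterable


def render_family_sets(networks: Iterable[str], name: str) -> str:
    """Render one nftables set per address family from a mixed list of networks.

    The `<name>_v4` / `<name>_v6` naming is a hypothetical convention.
    """
    v4: list[str] = []
    v6: list[str] = []
    for net in networks:
        # strict=False accepts host addresses and networks with host bits set.
        parsed = ipaddress.ip_network(net, strict=False)
        (v4 if parsed.version == 4 else v6).append(str(parsed))

    blocks = []
    for suffix, nft_type, members in (("v4", "ipv4_addr", v4), ("v6", "ipv6_addr", v6)):
        lines = [f"set {name}_{suffix} {{", f"    type {nft_type}", "    flags interval"]
        if members:  # only emit an elements line when this family has members
            lines.append(f"    elements = {{ {', '.join(members)} }}")
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n".join(blocks)


print(render_family_sets(["192.0.2.0/24", "2001:db8::/32", "198.51.100.7"], "requestctl"))
```

Each family's rules would then match against its own set, so neither set ever receives an element of the wrong type.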

Taking arclamp2001 as an example, I don't see the /etc/nftables/sets/requestctl.nft file, which would be created if defs_from_etcd_nft were true. But at the same time I do see that confd.service is defined and is running (and logging the warnings). Perhaps a change was made, but our current Puppet config doesn't properly remove the confd service?

@cmooney

In profile::firewall there is an if $defs_from_etcd and $defs_from_etcd_nft block. If both are true, it declares confd::file { '/etc/nftables/sets/requestctl.nft': ..., as you say.

There is also a second if $defs_from_etcd { block, which declares confd::file { '/etc/ferm/conf.d/00_defs_requestctl': ..., and $defs_from_etcd is true by default. This part is outside the case $provider stanza, but it has ensure => stdlib::ensure($provider == 'ferm').

What I think happens is that this second confd::file pulls in confd regardless of what the provider is set to. Then, since the provider is no longer ferm, the file /etc/ferm/conf.d/00_defs_requestctl is ensured absent.

But merely ensuring that a confd file is absent does not remove the confd service and package.

So the result is what we see: none of the requestctl files exist, but confd is still installed and running without any config.

Yeah, so this:

if $defs_from_etcd {
    confd::file { '/etc/ferm/conf.d/00_defs_requestctl':
        ensure          => stdlib::ensure($provider == 'ferm'),

$defs_from_etcd is true (from common/profile/firewall.yaml), so confd gets pulled in.

But the provider is not ferm, so the config file gets removed.

But there is nothing in confd::file that would remove confd the service. All it does is remove the config file.
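The interaction described above can be modelled as a small truth function. This is a Python sketch that assumes, as the comments above suggest, that declaring any confd::file (even with ensure => absent) pulls in confd itself. The class and function names are hypothetical, and only the requestctl-related declarations are modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirewallHiera:
    provider: str             # profile::firewall::provider, e.g. 'ferm' or 'nftables'
    defs_from_etcd: bool      # true by default in common/profile/firewall.yaml
    defs_from_etcd_nft: bool  # false by default in the same file


def confd_included(cfg: FirewallHiera) -> bool:
    """Assumption: any declared confd::file, present or absent, pulls in confd."""
    # The ferm confd::file sits outside `case $provider`, so it is declared
    # whenever defs_from_etcd is true, whatever the provider is.
    ferm_file_declared = cfg.defs_from_etcd
    nft_file_declared = (
        cfg.provider == "nftables" and cfg.defs_from_etcd and cfg.defs_from_etcd_nft
    )
    return ferm_file_declared or nft_file_declared


def requestctl_files_present(cfg: FirewallHiera) -> list[str]:
    """Files that end up on disk: the ferm one only has ensure => present under ferm."""
    files = []
    if cfg.defs_from_etcd and cfg.provider == "ferm":
        files.append("/etc/ferm/conf.d/00_defs_requestctl")
    if cfg.provider == "nftables" and cfg.defs_from_etcd and cfg.defs_from_etcd_nft:
        files.append("/etc/nftables/sets/requestctl.nft")
    return files
```

With provider nftables and the defaults (defs_from_etcd true, defs_from_etcd_nft false), confd_included() is True while requestctl_files_present() is empty. That is the state seen on the 34 hosts.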

Change #1057264 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] firewal: if provider is nft and not pulling requestctl, remove confd

https://gerrit.wikimedia.org/r/1057264

This is an attempt to fix it with the logic: if the provider is nftables and we do NOT pull requestctl data, then remove confd.
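The proposed condition is written out below as a hypothetical Python predicate, so that its truth table is easy to inspect. It assumes that "pulling requestctl data" under nftables means both defs_from_etcd and defs_from_etcd_nft are true, which mirrors the if in the nftables branch of profile::firewall.

```python
from itertools import product


def should_remove_confd(provider: str, defs_from_etcd: bool, defs_from_etcd_nft: bool) -> bool:
    """Hypothetical predicate: nftables provider and requestctl data not pulled."""
    pulls_requestctl = defs_from_etcd and defs_from_etcd_nft
    return provider == "nftables" and not pulls_requestctl


if __name__ == "__main__":
    # Print the full truth table for both providers.
    for provider, etcd, etcd_nft in product(("ferm", "nftables"), (False, True), (False, True)):
        decision = should_remove_confd(provider, etcd, etcd_nft)
        print(f"{provider:9} defs_from_etcd={etcd!s:5} "
              f"defs_from_etcd_nft={etcd_nft!s:5} -> remove confd: {decision}")
```

With the Hiera defaults this returns True for every nftables host, which is consistent with the compiler result for arclamp2001. It never removes confd on ferm hosts.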

https://gerrit.wikimedia.org/r/c/operations/puppet/+/1057264

The Puppet compiler shows that on arclamp2001 it would remove confd:

https://puppet-compiler.wmflabs.org/output/1057264/3431/arclamp2001.codfw.wmnet/index.html

Here is what happens when compiling it on all hosts that use profile::firewall:

https://puppet-compiler.wmflabs.org/output/1057264/3432/ (still running)

Change #1057264 abandoned by Dzahn:

[operations/puppet@production] firewall: if provider is nft and not pulling requestctl, remove confd

https://gerrit.wikimedia.org/r/1057264

JMeybohm subscribed.

This came up during T374366: Race condition in iptables rules during puppet runs on k8s nodes. Currently it is not possible to disable defs_from_etcd cleanly.