Update confd package
To get confd working in beta the way it does in prod, the logic that makes hostnames safe needs to match puppet's, so that the '-' characters in our hostnames are handled (production hostnames just happen not to contain this character). That, however, requires a confd version that includes the 'replace' template function.

Joe raised the priority of this task from Low to Medium.
Joe added a project: serviceops.

Mentioned in SAL (#wikimedia-operations) [2019-10-04T05:50:33Z] <_joe_> uploading confd 0.16.0 on stretch T147204

Mentioned in SAL (#wikimedia-operations) [2019-10-04T05:53:15Z] <_joe_> upgrading confd on puppetmaster1001 T147204

I had to roll back confd in reprepro and on puppetmaster1001 because I found a regression (or rather, a change in behaviour): the prefix key in the confd files isn't respected by confd 0.16.0 if a prefix is declared on the command line.

The new behaviour makes sense, as the previous one was somewhat confusing to reason about, but it means we have to convert all of our confd files to a format compatible with both versions.

My line of thinking is as follows:

  • Remove -prefix from the command line
  • Add it back as default_prefix in confd::file, and prepend it to any prefix
  • Leave everything else as-is
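
To illustrate the second bullet, here is a minimal shell sketch of how a default prefix could be prepended to an optional per-file prefix. The helper name join_prefix and the /pools example value are hypothetical; the real logic would live in the confd::file puppet code, not in a shell script.

```shell
# Hypothetical helper (not part of confd or puppet): prepend a default
# prefix to an optional per-file prefix, avoiding a doubled slash.
join_prefix() {
    default_prefix="${1%/}"   # strip one trailing slash, if present
    file_prefix="${2#/}"      # strip one leading slash, if present
    if [ -n "$file_prefix" ]; then
        printf '%s/%s\n' "$default_prefix" "$file_prefix"
    else
        # No per-file prefix: the default prefix is used on its own.
        printf '%s\n' "$default_prefix"
    fi
}

# Example values are assumptions for illustration only.
join_prefix /conftool/v1 /pools
join_prefix /conftool/v1 ""
```

With this scheme, a file that declares no prefix still ends up with /conftool/v1, which is what the command-line flag used to provide.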

However, this makes it necessary to carefully check all of our confd templates. I will do so and report the results on this ticket.

Change 540868 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] confd: move all prefix declarations to the files

Templates to verify:

  • Redis replication
  • Authdns
  • Varnish
  • Puppetmasters (config-master, mostly unused in production)
  • dsh groups

Confirmed that redis::multidc_instance works as expected with the change of prefix, and the output is the same between the two versions.

Same for authdns - with the prefix change we obtain the same results with the two versions of confd.

For posterity, what I am doing to run these tests is as follows:

  • Import the etcd data into a local daemon from a production backup (that will typically mean downloading the backup directory, running etcd migrate, and then starting etcd, given you're probably using etcd v3)
  • Make a tarball of the files in production, copy it locally to /some/dir
  • unpack the tarball there, and create a directory tree relative to that position for all the directories confd will write to
  • modify the destination in all confd files and remove check_cmd and reload_cmd from them:
perl -i"" -pe 's#^dest = "#dest = "/some/dir#' etc/confd/conf.d/*.toml
perl -i"" -pe 's/^(check|reload)_cmd.*$//' etc/confd/conf.d/*.toml
  • Run the old confd binary with -prefix=/conftool/v1
./confd_0.9.0 -backend etcd -node=  -onetime=true -confdir=/some/dir/etc/confd -prefix /conftool/v1
  • Modify the confd .toml files so that each one declares the whole prefix itself:
for file in etc/confd/conf.d/*.toml; do if grep -q '^prefix' "$file"; then perl -i"" -pe 's#^prefix = "#prefix = "/conftool/v1#' "$file"; else echo 'prefix = "/conftool/v1"' >> "$file"; fi; done
  • Run 0.9.0 again, without -prefix on the command line, and verify the output doesn't change:
./confd_0.9.0 -backend etcd -node=  -onetime=true -confdir=/some/dir/etc/confd
  • Run 0.16.0 and verify the output doesn't change:
./confd_0.16.0 -backend etcd -node=  -onetime=true -confdir=/some/dir/etc/confd
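
The "verify it doesn't change anything" steps can be made mechanical by checksumming the output tree after each run and comparing the results. The following is only a sketch of one way to do that, assuming GNU findutils and coreutils (sort -z, xargs -0 -r, sha256sum); the snapshot function and the .sums file names are hypothetical. The etc/confd directory is excluded because the .toml files there are deliberately edited between runs.

```shell
# Hypothetical helper: record a checksum for every generated file under a tree.
# Usage: snapshot <root-dir> <output-file>   (keep the output file outside root)
snapshot() {
    root="$1"
    out="$2"
    # Skip etc/confd itself: its .toml files are modified between runs.
    ( cd "$root" && find . -type f ! -path './etc/confd/*' -print0 \
        | sort -z | xargs -0 -r sha256sum ) > "$out"
}

# Example flow (paths are placeholders, confd arguments as in the steps above):
#   ./confd_0.9.0 ...  && snapshot /some/dir /tmp/confd-0.9.0.sums
#   ./confd_0.16.0 ... && snapshot /some/dir /tmp/confd-0.16.0.sums
#   diff -u /tmp/confd-0.9.0.sums /tmp/confd-0.16.0.sums && echo "outputs identical"
```

Sorting the file list before hashing keeps the snapshots comparable line by line, so diff points directly at any file whose content changed between versions.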

Varnish caches are a special case: they already don't declare a prefix on the command line, so they apparently need no modification and work out of the box.

Change 540868 merged by Giuseppe Lavagetto:
[operations/puppet@production] confd: move all prefix declarations to the files

Mentioned in SAL (#wikimedia-operations) [2019-10-16T10:48:34Z] <_joe_> upgrading confd to 0.16.0 across the cluster. T147204. confd will be restarted on the next puppet run

All stretch+ servers in production have been updated to the newer version. Jessie hosts should go away soon.