To get confd working in beta as it does in production, the logic that makes hostnames safe must match Puppet's, so that '-' characters are handled correctly (production hostnames happen not to contain this character). That, however, requires a confd version that includes the 'replace' template function.
Description
Details
Project | Branch | Lines +/- | Subject |
---|---|---|---|
operations/puppet | production | +4 -4 | confd: move all prefix declarations to the files |
Event Timeline
Specifically version 0.11.0 or higher: https://github.com/kelseyhightower/confd/commit/27056b9389519e9f1ebf7244f2825a8e008082d6
Current version in our apt repo is 0.9.0-2
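As a side note, comparing these version strings as plain text is misleading ("0.9.0" sorts after "0.11.0" lexically). A minimal sketch of a version check, assuming GNU `sort -V` is available; `version_ge` is a hypothetical helper name, not something from confd or our tooling:

```shell
# version_ge A B: succeeds if version A is greater than or equal to version B.
# Assumes GNU coreutils (sort -V); the function name is hypothetical.
version_ge() {
    # The smaller of the two versions sorts first; A >= B iff that is B.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

if version_ge "0.9.0-2" "0.11.0"; then
    echo "replace template function should be available"
else
    echo "confd too old for the replace template function"
fi
```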
Mentioned in SAL (#wikimedia-operations) [2019-10-04T05:50:33Z] <_joe_> uploading confd 0.16.0 on stretch T147204
Mentioned in SAL (#wikimedia-operations) [2019-10-04T05:53:15Z] <_joe_> upgrading confd on puppetmaster1001 T147204
I had to roll back confd in reprepro and on puppetmaster1001 because I found a regression (or, a change in behaviour): the prefix key in the confd files isn't respected in confd 0.16.0 if a prefix is declared on the command line.
This makes sense - the previous behaviour was somewhat confusing to reason about, but this means we have to convert all of our confd files to a format compatible with both versions.
My line of thinking is as follows:
- Remove -prefix from the command line
- Add it back as default_prefix in confd::file, and prepend it to any prefix
- Leave everything else as-is
However, this makes it necessary to carefully check all of our confd templates. I will do so and report the results on this ticket.
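The prefix plan above amounts to simple string concatenation. A rough sketch of the intended result, written in shell for illustration only (the real change lives in the puppet confd::file define); `effective_prefix` and the `/pools` value are hypothetical:

```shell
# effective_prefix DEFAULT [FILE_PREFIX]
# Prints the prefix that would end up in the confd file: the default prefix,
# followed by the file's own prefix if it declares one.
# Hypothetical helper, for illustration only.
effective_prefix() {
    default_prefix="$1"
    file_prefix="${2:-}"
    # Drop one trailing slash from the default to avoid a doubled "//".
    printf '%s%s\n' "${default_prefix%/}" "$file_prefix"
}

effective_prefix /conftool/v1 /pools   # prints /conftool/v1/pools
effective_prefix /conftool/v1          # prints /conftool/v1
```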
Change 540868 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] confd: move all prefix declarations to the files
Templates to verify:
- Redis replication
- Authdns
- Varnish
- Puppetmasters (config-master, mostly unused in production)
- dsh groups
Confirmed that redis::multidc_instance works as expected with the change of prefix, and the output is the same between the two versions.
Same for authdns - with the prefix change we obtain the same results with the two versions of confd.
For posterity, what I am doing to run these tests is as follows:
- Import the etcd data into a local daemon from a production backup (this typically means downloading the backup directory, running etcd migrate and then starting etcd, given you're probably using etcd v3)
- Make a tarball of the files in production, copy it locally to /some/dir
- unpack the tarball there, and create a directory tree relative to that position for all the directories confd will write to
- modify the destination in all confd files and remove check_cmd and reload_cmd from them:
perl -i"" -pe 's#^dest = "#dest = "/some/dir#' etc/confd/conf.d/*.toml
perl -i"" -pe 's/^(check|reload)_cmd.*$//' etc/confd/conf.d/*.toml
- Run the old confd binary with -prefix=/conftool/v1
./confd_0.9.0 -backend etcd -node=http://127.0.0.1:2379 -onetime=true -confdir=/some/dir/etc/confd -prefix /conftool/v1
- Modify the confd templates to include the whole prefix in the template file:
for file in etc/confd/conf.d/*.toml; do if grep -q ^prefix "$file"; then perl -i"" -pe 's#^prefix = "#prefix = "/conftool/v1#' "$file"; else echo 'prefix = "/conftool/v1"' >> "$file"; fi; done
- Run 0.9.0 again, without the prefix, and verify it doesn't change anything:
./confd_0.9.0 -backend etcd -node=http://127.0.0.1:2379 -onetime=true -confdir=/some/dir/etc/confd
- Run 0.16.0 and verify it doesn't change anything:
./confd_0.16.0 -backend etcd -node=http://127.0.0.1:2379 -onetime=true -confdir=/some/dir/etc/confd
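One way to do the "verify it doesn't change anything" steps is to checksum the generated files after each run and diff the listings. This is only a sketch, assuming GNU find/sort/xargs/sha256sum; `snapshot` is a hypothetical helper, and it skips etc/confd because the .toml files there are edited on purpose between runs:

```shell
# snapshot DIR: print "checksum  path" for every regular file under DIR,
# sorted by path, ignoring the confd configuration tree itself.
# Hypothetical helper; assumes GNU coreutils and findutils.
snapshot() {
    (cd "$1" && find . -path ./etc/confd -prune -o -type f -print0 \
        | sort -z | xargs -0 -r sha256sum)
}

# Intended usage (not run here):
#   snapshot /some/dir > /tmp/before.txt
#   ... edit the .toml files, run the other confd binary ...
#   snapshot /some/dir > /tmp/after.txt
#   diff -u /tmp/before.txt /tmp/after.txt && echo "output unchanged"
```

Keep the snapshot files outside /some/dir, otherwise they show up in the next listing.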
Varnish caches are a special case: they already don't declare a prefix on the command line, so apparently they need no modification and work out of the box.
Change 540868 merged by Giuseppe Lavagetto:
[operations/puppet@production] confd: move all prefix declarations to the files
Mentioned in SAL (#wikimedia-operations) [2019-10-16T10:48:34Z] <_joe_> upgrading confd to 0.16.0 across the cluster. T147204. confd will be restarted on the next puppet run
All stretch+ servers in production have been updated to the newer version. Jessie hosts should go away soon.