Page MenuHomePhabricator

Puppet: test non stringified facts across the fleet
Closed, ResolvedPublic

Description

After having upgraded the fleet to facter version 2.4.6 in T166203, we can now test that disabling stringify_facts is actually a noop across the fleet.

The check can be performed with those steps:

  • disable puppet reliably (waiting for any in-flight run)
  • compile the catalog and output the facts to a directory
  • add the stringify_facts=false option to /etc/puppet/puppet.conf
  • compile the catalog again and output the fact to another directory
  • compare the result of the two runs
  • remove the stringify_facts option from puppet.conf
  • enable puppet
  • remove temporary files

Event Timeline

The command to run this across the fleet (skipping the hosts currently down) is:

sudo cumin -m async -d -b 4 'F:facterversion = "2.4.6" and not mw2140*' \
'disable-puppet "Testing stringify_facts - volans"' \
'puppet agent --onetime --no-daemonize --no-splay --ignorecache --no-usecacheonfailure --noop --vardir /root/__facter_upgrade_current__' \
'echo "stringify_facts = false" >> /etc/puppet/puppet.conf' \
'puppet agent --onetime --no-daemonize --no-splay --ignorecache --no-usecacheonfailure --noop --vardir /root/__facter_upgrade_new__' \
'sed -i "/^stringify_facts = false\$/d" /etc/puppet/puppet.conf' \
'enable-puppet "Testing stringify_facts - volans"' \
'diff -u <(jq . /root/__facter_upgrade_current__/client_data/catalog/$(hostname -f).json | grep -v "\"version\"") \
         <(jq . /root/__facter_upgrade_new__/client_data/catalog/$(hostname -f).json | grep -v "\"version\"") || true' \
'rm -rf /root/__facter_upgrade_current__ /root/__facter_upgrade_new__'

Mentioned in SAL (#wikimedia-operations) [2017-05-26T09:30:02Z] <volans> slowly testing if puppet stringify_facts=false is a noop across the fleet T166372

First diff found on scb1004:

(1) scb1004.eqiad.wmnet
----- OUTPUT -----
--- /dev/fd/63  2017-05-26 10:51:40.147441209 +0000
+++ /dev/fd/62  2017-05-26 10:51:40.147441209 +0000
@@ -1563,7 +1563,7 @@
             "stats-push": "statsd:graphite-in.eqiad.wmnet:8125,ores.scb1004.uwsgi",
             "memory-report": true
           },
-          "no_workers": "32",
+          "no_workers": 32,
           "has_spec": false,
           "repo": "ores/deploy",
           "firejail": true,

I'm keeping the compiled files there for now.

It seems expected to me, it is used through $::processorcount across different modules in puppet. And the reported diff is only in the parameters of the class.

All but two diffs are related to $::processorcount:

californium.wikimedia.org
-          "no_workers": "8",
+          "no_workers": 8,
...SNIP...
-              "processes": "8",
+              "processes": 8,

labsdb1007.eqiad.wmnet
-          "num_threads": "8",
+          "num_threads": 8,

ms-fe100[5-8].eqiad.wmnet, ms-fe200[5-8].codfw.wmnet
-          "num_workers": "16",
+          "num_workers": 16,

ms-fe300[1-2].esams.wmnet
-          "num_workers": "24",
+          "num_workers": 24,

scb200[1-6].codfw.wmnet
-          "no_workers": "16",
+          "no_workers": 16,

scb100[1-2].eqiad.wmnet
-          "no_workers": "24",
+          "no_workers": 24,


scb100[3-4].eqiad.wmnet
-          "no_workers": "32",
+          "no_workers": 32,

thumbor100[1-2].eqiad.wmnet
-          "instance_count": "40",
+          "instance_count": 40,

The only exception is a diff in the bash script generated from the erb template modules/base/templates/check_eth.erb, but is a space-only diff. I'm not sure why there is a one space difference between the stringified version and the non-stringified version. Anyway I'll send a patch do fix the useless spacing anyway:

labvirt1003.eqiad.wmnet
-for INTERFACE in br1102 eth0 eth1 eth2 eth3 lo                                                                  ; do
+for INTERFACE in br1102 eth0 eth1 eth2 eth3 lo                                                                 ; do

labvirt1011.eqiad.wmnet
-for INTERFACE in br1102 eth0 eth1 eth2 eth3 lo                                                                       ; do
+for INTERFACE in br1102 eth0 eth1 eth2 eth3 lo                                                                        ; do

Change 355896 had a related patch set uploaded (by Volans; owner: Volans):
[operations/puppet@production] Monitoring: remove spaces from list of interfaces

https://gerrit.wikimedia.org/r/355896

Change 355896 merged by Alexandros Kosiaris:
[operations/puppet@production] Monitoring: remove spaces from list of interfaces

https://gerrit.wikimedia.org/r/355896

After the above was merged now all the labvirt* instances have no diff, hence all the differences are just the string vs. integer of the $::processorcount as class parameter.

Change 356030 had a related patch set uploaded (by Faidon Liambotis; owner: Faidon Liambotis):
[operations/puppet@production] raid: switch from stringified fact to array

https://gerrit.wikimedia.org/r/356030

Change 356031 had a related patch set uploaded (by Faidon Liambotis; owner: Faidon Liambotis):
[operations/puppet@production] Remove str2bool from is_virtual facts

https://gerrit.wikimedia.org/r/356031

Change 356032 had a related patch set uploaded (by Faidon Liambotis; owner: Faidon Liambotis):
[operations/puppet@production] Remove to_i/Integer from now unstringified facts

https://gerrit.wikimedia.org/r/356032

Change 356062 had a related patch set uploaded (by Volans; owner: Volans):
[operations/puppet@production] Puppet: disable stringified facts in prod

https://gerrit.wikimedia.org/r/356062

@akosiaris @Joe @faidon
I've changed to stringify_facts = false my labs project and this are the different facts. Bare in mind that with the v3 of the PuppetDB API the facts are still reported "stringified", in the sense that they have a value key that is a string, that now is a JSON-encoded strings.
Here below are the diffs of the value property:

  • system_uptime:
# Stringified
{"seconds"=>15896439, "hours"=>4415, "days"=>183, "uptime"=>"183 days"}

# Non-stringified
{"days":183,"hours":4415,"seconds":15896897,"uptime":"183 days"}
  • processors:
# Stringified
{"models"=>["Intel Xeon E312xx (Sandy Bridge)"], "count"=>1, "physicalcount"=>1}

# Non-stringified
{"count":1,"models":["Intel Xeon E312xx (Sandy Bridge)"],"physicalcount":1}
  • os:
# Stringified
{"name"=>"Debian", "family"=>"Debian", "release"=>{"major"=>"8", "minor"=>"6", "full"=>"8.6"}, "lsb"=>{"distcodename"=>"jessie", "distid"=>"Debian", "distdescription"=>"Debian GNU/Linux 8.6 (jessie)", "distrelease"=>"8.6", "majdistrelease"=>"8", "minordistrelease"=>"6"}}

# Non-stringified
{"family":"Debian","lsb":{"distcodename":"jessie","distdescription":"Debian GNU/Linux 8.6 (jessie)","distid":"Debian","distrelease":"8.6","majdistrelease":"8","minordistrelease":"6"},"name":"Debian","release":{"full":"8.6","major":"8","minor":"6"}}
  • partitions:
# Stringified
{"vda1"=>{"size"=>"2015", "label"=>"primary"}, "vda2"=>{"uuid"=>"e054bbb6-138c-4dab-ace7-b8e8e8292ab5", "size"=>"1046528", "label"=>"swap", "filesystem"=>"swap"}, "vda3"=>{"uuid"=>"2be498f4-baa6-4958-96b5-511f698a3eba", "size"=>"39845854", "mount"=>"/", "label"=>"root", "filesystem"=>"ext4"}, "vda4"=>{"size"=>"1046528", "label"=>"primary", "filesystem"=>"LVM2_member"}}

# Non-stringified
{"vda1":{"label":"primary","size":"2015"},"vda2":{"filesystem":"swap","label":"swap","size":"1046528","uuid":"e054bbb6-138c-4dab-ace7-b8e8e8292ab5"},"vda3":{"filesystem":"ext4","label":"root","mount":"/","size":"39845854","uuid":"2be498f4-baa6-4958-96b5-511f698a3eba"},"vda4":{"filesystem":"LVM2_member","label":"primary","size":"1046528"}}

Then a few diffs that are labs-only:

  • I now have an ec2_metadata fact that I was not getting before
  • Getting the facts from puppetdb right after the first run after setting stringify_facts = false I got 17 additional facts ec2_* that then disappeared after the second and subsequent puppet runs.

Then a few diffs that are labs-only:

  • I now have an ec2_metadata fact that I was not getting before
  • Getting the facts from puppetdb right after the first run after setting stringify_facts = false I got 17 additional facts ec2_* that then disappeared after the second and subsequent puppet runs.

To answer myself, seems that all the ec2_* facts are back, it might be related to running puppet manually vs. via crontab.

Change 356062 merged by Volans:
[operations/puppet@production] Puppet: disable stringified facts in prod

https://gerrit.wikimedia.org/r/356062

Mentioned in SAL (#wikimedia-operations) [2017-05-31T12:04:04Z] <volans> merged stringify_facts=false for production hosts T166372

Change 356031 merged by Faidon Liambotis:
[operations/puppet@production] Remove str2bool from is_virtual facts

https://gerrit.wikimedia.org/r/356031

Change 356030 merged by Faidon Liambotis:
[operations/puppet@production] raid: switch from stringified fact to array

https://gerrit.wikimedia.org/r/356030

Structured facts were enabled across both production and Labs realms and seem to have been working fine :) We even converted a few facts of ours in structured facts (raid, lldp etc.), so we can call this a success.

Change 356032 merged by Alexandros Kosiaris:
[operations/puppet@production] Remove to_i/Integer from now unstringified facts

https://gerrit.wikimedia.org/r/356032