Fri, Aug 16
Thu, Aug 15
Wed, Aug 14
Fri, Aug 9
This alerted today and I ended up going right down the rabbit hole. Anyway, I think I have found some information.
Thu, Aug 8
The Lua hack seems to have worked. I have again updated the config to send some canary servers to puppetmaster1003. So far I have seen no errors in the puppetdb log. Further, via puppetdb I can see that the puppet reports and facts are all refreshing.
Wed, Aug 7
I have disabled puppetmaster1003 for now. Unfortunately, from reading PUP-8901 it seems the advice from Puppet Labs is to always upgrade all puppetmasters and the puppetdb in tandem.
It seems that the servers using the new puppet master cause the following stack trace when they try to reach the 'store report' phase
Tue, Aug 6
Mon, Aug 5
@EBernhardson I'm not sure where the data above came from, but I suspect a typo somewhere: 2620:0:860::ed1a::1 does not exist anywhere in the puppet repo. However, 2620:0:860:ed1a::1 (notice the missing ':') is valid and is routed to the lvs eqiad servers. That said, it does look like most lvs services are in the 2620:0:860:ed1a:: prefix, and the servers I have found in the 2620:0:861:1:: prefix seem to be real machines, so I suspect that the IPv6 address may be incorrect.
Thanks Brandon, I'll take a look at removing the SLAAC addresses from puppet this week. One of them, at least, was added by me and was what led me down this rabbit hole :)
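A quick way to sanity check the two spellings above is Python's standard `ipaddress` module. This is only a sketch: the `/64` prefix length used for the lvs range is an assumption for illustration, not something taken from the puppet repo.

```python
import ipaddress


def is_valid_ipv6(text: str) -> bool:
    """Return True if text parses as an IPv6 address."""
    try:
        ipaddress.IPv6Address(text)
    except ValueError:
        # An address may contain at most one '::', so the doubled form fails here.
        return False
    return True


candidates = ["2620:0:860::ed1a::1", "2620:0:860:ed1a::1"]
results = {addr: is_valid_ipv6(addr) for addr in candidates}
for addr, ok in results.items():
    print(f"{addr}: {'valid' if ok else 'invalid'}")

# Assumption: the lvs service range is treated as a /64 for this check.
lvs_prefix = ipaddress.IPv6Network("2620:0:860:ed1a::/64")
in_lvs_prefix = ipaddress.IPv6Address("2620:0:860:ed1a::1") in lvs_prefix
print(f"2620:0:860:ed1a::1 in {lvs_prefix}: {in_lvs_prefix}")
```

The form with two `::` groups is rejected by the parser, which is consistent with it being a typo rather than a real address.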
Fri, Aug 2
Our current setup seems to use SLAAC without any of the privacy extensions; as such, the lower 64 bits of the IPv6 address are derived from the MAC address of the interface. It seems it would be fairly simple to script the AAAA records, and then I'm not sure we need to do much configuration. Although the conversation then turns to what we do with all the hosts that currently have interface::add_ip6_mapped.
Answering my own question: as NICs can and do change in a server's lifetime, using SLAAC like this is undesirable, as we don't want an IP change if we change a NIC.
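To illustrate how the lower 64 bits follow from the MAC address, here is a minimal sketch of the modified EUI-64 derivation that SLAAC uses when privacy extensions are off. The function name `slaac_address`, the documentation prefix `2001:db8::/64` and the example MAC are all made up for illustration; none of them come from our config.

```python
import ipaddress


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Build the modified EUI-64 SLAAC address for a /64 prefix and a MAC."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("SLAAC interface identifiers need a /64 prefix")
    octets = bytes(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    # Flip the universal/local bit of the first octet and insert ff:fe
    # between the two halves of the MAC to get the 64-bit interface ID.
    eui64 = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]
    interface_id = int.from_bytes(eui64, "big")
    return ipaddress.IPv6Address(int(net.network_address) + interface_id)


# Hypothetical example values (documentation prefix, invented MAC).
print(slaac_address("2001:db8::/64", "00:11:22:33:44:55"))
```

Because the MAC is baked into the address, a different MAC on the same prefix yields a different address, so any AAAA record generated this way would have to be regenerated whenever the interface changes.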
Thu, Aug 1
I think this is a really good idea. Further, after a bit of investigation I think any arbitrary string can be used. Later versions of the puppet documentation even use git rev-parse as an example command. Also, r10k has a script in their repo which uses the sha1, git server and last git log message as the version.
Sorry for the interruption, this issue was caused by https://gerrit.wikimedia.org/r/c/operations/puppet/+/527064 when I removed the config entry at the common level. I have now added a default of 0 canary_hosts in labs.yaml and common/profile/puppetmaster/frontend.yaml. Please reopen if you still see issues.
Wed, Jul 31
Have a script that periodically generates a hiera file with that information for each host, to be merged by the puppetmasters with the public tree as we already do with the private repo
Tue, Jul 30
Thu, Jul 25
In case this needs rolling back, the issue can be fixed in 4.8 with the following patch: https://phabricator.wikimedia.org/P8772
Mon, Jul 22
yes, closing thanks
puppetmaster1003 has now been built and is running puppet 5.5
Jul 19 2019
Here is another example. puppetmaster1003.eqiad.wmnet is in both 'Hosts that have no differences' and 'Hosts that fail to compile when the change is applied'
Jul 18 2019
This error is triggered more slowly on some machines than on others:
- cp2026 (cache::upload): takes a bit longer to die ~ 5 minutes
- cp1085 (cache::text): dies instantly
- cp1075 (cache::text): dies instantly
- cp1076 (cache::upload): dies instantly
Unfortunately this had to be downgraded again. We saw the following error with the new version
I have now pushed a patch so that mtail uses -logs /dev/stdin instead of -logfds 0. Would you like me to upgrade everything to mtail_3.0.0~rc24.1-1+wmf1_amd64.deb again?
below are all the hosts with puppetdb package installed
Jul 17 2019
Investigating further, this is due to how populate_puppetdb adds entries to the database. `populate_puppetdb` loops through the site.pp manifest, picks one host from every role and compiles its catalog locally so that the result is sent to the puppetdb. This means that the puppetdb is populated and the other calls to query_resource have enough data for them to succeed.
During my research I noticed that puppetdb was failing with the following error
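The "one host from every role" selection can be sketched roughly as below. This is not the actual populate_puppetdb code: the `one_host_per_role` helper, the host-to-role mapping as input (the real script reads site.pp) and the "first hostname in sorted order" rule are all assumptions made for illustration.

```python
def one_host_per_role(node_roles: dict[str, str]) -> dict[str, str]:
    """Pick a single representative host for each role.

    node_roles maps hostname -> role. The first hostname in sorted
    order wins for each role (an arbitrary, illustrative choice).
    """
    chosen: dict[str, str] = {}
    for host in sorted(node_roles):
        # setdefault keeps the first host seen for a role and ignores the rest.
        chosen.setdefault(node_roles[host], host)
    return chosen


# Illustrative input only; a real run would derive this from site.pp.
nodes = {
    "cp1075": "cache::text",
    "cp1085": "cache::text",
    "cp1076": "cache::upload",
    "cp2026": "cache::upload",
}
representatives = one_host_per_role(nodes)
print(representatives)
```

Each chosen host would then have its catalog compiled locally, so a role with no representative would leave nothing in the database for later queries to find.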
Jul 16 2019
This was broken by https://gerrit.wikimedia.org/r/c/operations/puppet/+/522101, which added two new custom empty backends. The error message comes from the fact that hiera v3 requires 5 parameters for its lookup function and hiera v1 requires 4. I'm not sure what triggered it, but it is likely due in some way to the fact that the other wmcs backends use the v1 API.
Jul 15 2019
Jul 12 2019
Ahh ok, so in that case \\! and \! are the same. As ! is not an escape character, the '\' is treated as a literal. I created a small puppet script to test this (see below) and also added test cases to the spec file.
the lookup function allows one to call it in multiple ways e.g.
Being as it's a Friday I thought I would have a go at this; however, I didn't read the original message correctly, and before I refactor I want to double check that you want device\\! and not device\!. I would have thought the first form would fail, but I don't know how many levels of encoding/escaping are needed.
Jul 11 2019
It seems this flag was removed upstream just over a year ago. I have pinged fgiunchedi's original issue requesting this feature to ask if there is more context around why it was removed.
Jul 10 2019
@Volans and I discussed some changes via IRC which have now been applied. As there has been no further activity on this I'm going to close it, but feel free to reopen if something has been missed or overlooked.
Jul 9 2019
@sbassett thanks for the info
@chasemp I have gone ahead and created the secteam-users group (renamed from secteam to match our convention) so you can go ahead and start using that one. There were some additional comments around the secteam-admins group, and as this is a downgrade in your permissions it was deemed that this change was low priority and we had some time to ensure we get this right. Please let us know if we have got the priority wrong.
Jul 5 2019
Jul 3 2019
Jul 2 2019
at least one more https://gerrit.wikimedia.org/r/c/operations/puppet/+/519227
I looked into this a bit and I *think* the debug messages which mention cn=wmf,ou=groups,dc=wikimedia,dc=org can be ignored, as we see the following in the logs as well
If anyone can review https://gerrit.wikimedia.org/r/c/operations/puppet/+/519227 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/519218 I think we should be able to merge and close this task?
Jul 1 2019
@MoritzMuehlenhoff The updated package has broken some dependencies which is causing an error on phab1003
The update has now been rolled out.
One thing to consider for buster: So far we've used the facter version in buster, so I think we have two options:
please go ahead
Jun 28 2019
Jun 27 2019
This should be fixed now; reopen if more is needed or ping me on IRC.
Great, I think this is done now so closing. Please reopen if there is still an issue.
@Varnent I have now renamed the old list and it is now available at https://lists.wikimedia.org/mailman/listinfo/MoveCom. The old URL should redirect to this one and the old email address has been configured as an alias. Please let me know if there are any issues