However, this depends heavily on the PS I had to revert, as that takes care of configuring dependencies, hiera custom functions, the private repo and other things required to correctly compile the catalogue.
Also worth noting that I did merge a similar change to this
Thu, May 6
Wed, May 5
Thanks, updated. Let me know if anything further is required.
Tue, May 4
This has been completed
Pretty straightforward, but please review, thanks.
Mon, May 3
To be extra useful, doing a reload of nsdc on bastion-restricted.wmflabs.org would also help.
Fri, Apr 30
Thu, Apr 29
Wed, Apr 28
This should be fixed now, please reopen if you are still seeing issues
I merged a couple of changes today. Just a note that subprocess.Popen (and all the functions which in turn call it) does not support the text=True option until Python 3.7 (buster), so we need to use universal_newlines=True instead for any scripts which need to run on Debian < buster.
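A minimal sketch of the portable form, assuming a script that only needs the command's stdout as a string. The helper name run_text is hypothetical and not taken from any of the merged changes.

```python
import subprocess
import sys


def run_text(cmd):
    """Run cmd and return its stdout decoded as str."""
    # universal_newlines=True is the older spelling of text=True.
    # Both put the pipes in text mode, but only universal_newlines
    # is accepted by Python versions older than 3.7.
    return subprocess.check_output(cmd, universal_newlines=True)


# Example: run a child Python interpreter and capture its output as str.
output = run_text([sys.executable, "-c", "print('hello')"])
```

On Python 3.7 and later, text=True can be passed instead; universal_newlines is still accepted there, so the form above works on both sides of the buster boundary.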
Please use puppet-dev as the name, if possible.
Will update the description.
Tue, Apr 27
BTW, "Replication incident" was a good name when you took control: we didn't know what was going on at first, and when replication breaks it is either the replicas or the primary server :-). We do now, which is why I renamed it, so as not to confuse it with a "regular" replication error.
@fgiunchedi there is a requirement to forward a subset of icinga alerts to a different set of users, either by sending to an email address or something fancier like push notifications.
This has been fixed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/682922. systemd-timer-mail-wrapper has been updated so that command uses nargs=argparse.REMAINDER. This means that the $command parameter needs to come last.
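A small sketch of why the command has to come last with nargs=argparse.REMAINDER. This is not the wrapper's actual argument parser: the prog name and the --subject option are hypothetical and only there for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="wrapper-sketch")
    # Hypothetical option, standing in for whatever the real wrapper accepts.
    parser.add_argument("--subject", default="")
    # REMAINDER collects every remaining argument, including ones that
    # look like options, into the command list.
    parser.add_argument("command", nargs=argparse.REMAINDER)
    return parser


# Options first, command last: the option is parsed, the rest is the command.
good = build_parser().parse_args(["--subject", "nightly", "/bin/ls", "-l", "/tmp"])

# Option after the command: it is swallowed into the command list.
swallowed = build_parser().parse_args(["/bin/ls", "--subject", "nightly"])
```

In the second call --subject keeps its default, because everything after the first positional argument is treated as part of the command.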
Maybe the confusion was with our monitoring infrastructure(?), but that is the name of that mysql component.
The confusion was mine: I thought this fired as an icinga log. This can be ignored now.
Mon, Apr 26
Thanks @Dzahn, I have created a CR to update the supported types, as the bastion host passes max_startups: 35:30:60, which should be supported. I'll take this forward tomorrow, but feel free to merge the change yourself if it's blocking.
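For reference, sshd_config's MaxStartups accepts either a single integer or the three colon-separated values start:rate:full. The check below is a Python sketch of that rule only, not the Puppet type from the CR; the function name is hypothetical.

```python
import re

# Either a plain integer ("10") or start:rate:full ("35:30:60").
_MAX_STARTUPS = re.compile(r"\d+(:\d+:\d+)?")


def is_valid_max_startups(value):
    """Return True if value has one of the two accepted MaxStartups shapes."""
    return _MAX_STARTUPS.fullmatch(str(value)) is not None
```

This only checks the shape of the value; it does not check that rate is a sensible percentage or that start is not larger than full.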
Just to confirm: after removing use-vlan-id, re-imaging of sretest1002 worked fine.
I have applied a patch which seems to have fixed this. I'm going to resolve the ticket, but please reopen if you still see issues.
I have now increased the default expiry of certs, deployed the newest version of debmonitor-client and fixed systemd service logging. I'm going to optimistically assume this has fixed all issues and mark this resolved, but please reopen if more issues arise.
Failed to execute DebMonitor CLI: [SSL] PEM lib (_ssl.c:2947)
'PEM routines', 'get_name', 'no start line'), ('SSL routines', 'use_certificate_chain_file', 'PEM lib'
To confirm, I have just pushed out 0.2.9, which should fix the JSONDecodeError and 'Retry' and 'int' issues.
Fri, Apr 23
The result was 1000+ DNS requests per agent run.
Yes, I also hit the same issue in $JOB~1; I will try to find the relevant bugs and check on progress. As mentioned, the pop caches will help here. Further, I have also drafted a change to start using systemd-resolved, which could also help (although it was a Friday afternoon draft and needs much more testing and thought).
@akosiaris thanks for digging into this a bit further, and apologies for not leaving more than a drive-by comment:
Thu, Apr 22
I think this is just an old log entry from before I uploaded the new package (21/04/2021). debmonitor is installed into the docker image by docker-report, so there is no need to rebuild the images. I have just run it myself manually and it is mostly working, although I did get one error; I will check this out tomorrow.
Wed, Apr 21
@Aklapper I think SRE is only tagged on this ticket in case there are any puppet changes to be made. Reading the ticket, I'm not sure there are any, so I'm not sure it makes sense to have SRE tagged. As such, I'm not the one who would be working on this and therefore can't really give an answer on priority. However, I do notice that the request was made by yourself and a response was given at https://phabricator.wikimedia.org/T228591#6221461. Has that response answered your request, or are there still some specifics required?