Feb 29 2020
Feb 28 2020
@aaron Please have a look at https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/569541/ and let us know if it reflects what we are going for, or if there is any nitpicking to be done. Afterwards, we can merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/574200/ to test routes for the /*/wm-wan/ keys.
This is stalled until we have completely stopped using Redis and have put the gutter pool into production.
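For context on what "routes for the /*/wm-wan/ keys" means: mcrouter keys can carry a routing prefix of the form /region/cluster/, and a route pattern such as /*/wm-wan/ matches any region with the cluster wm-wan. The sketch below is a simplified Python model of that matching, assuming keys look like /eqiad/wm-wan/... (the example keys are hypothetical). It is not mcrouter's implementation.

```python
from typing import Optional, Tuple


def split_routing_prefix(key: str) -> Optional[Tuple[str, str, str]]:
    """Split '/region/cluster/rest' into its parts; None if the key has no routing prefix."""
    if not key.startswith("/"):
        return None
    parts = key.split("/", 3)  # ['', region, cluster, rest]
    if len(parts) < 4:
        return None
    return parts[1], parts[2], parts[3]


def matches_route(key: str, route: str) -> bool:
    """True if the key's routing prefix matches a pattern such as '/*/wm-wan/'.

    Simplified model: '*' matches any single region or cluster segment.
    """
    parsed = split_routing_prefix(key)
    if parsed is None:
        return False
    region, cluster, _ = parsed
    want_region, want_cluster = route.strip("/").split("/")
    return want_region in ("*", region) and want_cluster in ("*", cluster)
```

For example, a hypothetical key /eqiad/wm-wan/global:foo matches /*/wm-wan/, while /eqiad/other/foo and an unprefixed key do not.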
Great, we merged this patch! Do we have a plan for how we will communicate this to the deployers when we release scap, and for how to test that everything is working in production? Thank you!
Feb 27 2020
Feb 26 2020
(Reopening to discuss this a bit more)
Feb 25 2020
@aaron @Krinkle Please let us know if the configuration and the information on this task are enough to proceed with testing the gutter proxies for /*/wm-wan/ or if there is anything that needs further investigation/explanation.
If there are no objections, I would like to proceed with this.
Feb 24 2020
Let me know if any issues come up :)
Feb 21 2020
With a little more fiddling, I managed to run puppet on deployment-mediawiki-09.deployment-prep.eqiad.wmflabs! @herron does that unblock you?
I have uploaded a patch which I tried manually on beta. It seems to work, but sadly, puppet breaks a bit further down the road.
The package has been built and pushed to the Wikimedia repo; I'll roll it out to production on Monday.
@herron any ideas on how to proceed here? Is there someone who can help? Apparently, this patch could break beta.
Feb 19 2020
It appears that on beta the variable `$server_role = $::_role.split('/')[-1]` is not evaluated properly, while in production it looks just fine (https://puppet-compiler.wmflabs.org/compiler1001/20912/mwdebug1001.eqiad.wmnet/). I believe this has something to do with this server's role in beta?
To proceed with testing, we will puppetise the following configuration, roll it out to a couple of canary servers, and block traffic towards mc* as we did earlier. Our goal is to test memcached 1.5.x on Debian Buster, since that is what the gutter pool is running.
The weights of mw1349-mw1355 were switched to 30.
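To make the expected value concrete, here is a Python analogue of the Puppet expression `$::_role.split('/')[-1]`: it keeps the last '/'-separated segment of the role fact. The example role strings are assumptions for illustration; if the `_role` fact is unset or shaped differently on beta, the Puppet expression will not produce the value it does in production.

```python
def server_role(role_fact: str) -> str:
    """Python analogue of Puppet's `$::_role.split('/')[-1]`: the last path segment."""
    # A role with no '/' is returned unchanged; an empty string stays empty.
    return role_fact.split("/")[-1]
```

With a hypothetical fact value of "mediawiki/appserver", this yields "appserver".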
Feb 17 2020
Test whether failover works and when to fail over.
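The failover criterion used in these tests is a TKO ("technical knockout"): mcrouter marks a server as TKO after a number of timeouts (`--timeouts-until-tko`, set to 3 here). The following is a minimal Python model of that decision, assuming only consecutive timeouts count and that a later success clears the state (standing in for mcrouter's recovery probes, which are not modelled). `ServerState` and `pick_pool` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ServerState:
    """Hypothetical per-server health tracker modelled on mcrouter's TKO idea."""
    timeouts_until_tko: int = 3
    consecutive_timeouts: int = 0

    @property
    def is_tko(self) -> bool:
        return self.consecutive_timeouts >= self.timeouts_until_tko

    def record(self, timed_out: bool) -> None:
        # A successful reply resets the counter; only consecutive timeouts count.
        self.consecutive_timeouts = self.consecutive_timeouts + 1 if timed_out else 0


def pick_pool(main: ServerState) -> str:
    """Send the request to the gutter pool once the main server is marked TKO."""
    return "gutter" if main.is_tko else "main"
```

Usage: after two timeouts `pick_pool` still returns "main"; the third consecutive timeout flips it to "gutter", and a subsequent success flips it back.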
Thank you all!
Feb 14 2020
Oh great, thanks! From https://logstash-beta.wmflabs.org/, it looks like logs are flowing, so maybe we can try the change there after all.
Feb 13 2020
Thank you, Daniel!
@herron I fiddled a bit on beta. It appears that, for some reason, nothing has been streamed there since today; I am not sure whether I broke it myself while trying to make it work :/ FWIW, I have restored the original config on deployment-logstash03.
I have uploaded a patch that could possibly work. My main issue is that I can't find a sane and safe way to test whether those logstash filters will do what we need. @herron any ideas are welcome.
Feb 12 2020
@elukey thank you for unblocking this!!!
Feb 11 2020
@CGlenn it appears you are already in the LDAP wmf group as 'keepit-ssh', which is also your username (shell name) on Wikitech. Could you please check whether your access is in order?
will fix :)
Feb 10 2020
On mwdebug1001 we have deployed a config similar to T240684#5826966, where we fail over to the gutter pool servers (mc-gp*) in case of a TKO. We have again set --timeouts-until-tko=3.
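The shape of such a config can be sketched as follows: a Python function that builds an mcrouter-style JSON config with a main pool and a gutter pool, routed through a FailoverRoute. This is an assumption-laden illustration, not the config from T240684#5826966: the route type and pool syntax follow our reading of the mcrouter docs (FailoverRoute tries its children in order and moves to the next one on an error reply, which a TKO'd server produces), the server addresses are placeholders, and `gutter_failover_config` is a hypothetical helper. `--timeouts-until-tko` is an mcrouter command-line option and does not appear in this JSON.

```python
import json
from typing import Any, Dict, List


def gutter_failover_config(main: List[str], gutter: List[str]) -> Dict[str, Any]:
    """Build an mcrouter-style config: use the main pool, fall back to the gutter pool.

    Route names follow our reading of the mcrouter docs (assumption);
    verify against the mcrouter version actually deployed.
    """
    return {
        "pools": {
            "main": {"servers": main},
            "gutter": {"servers": gutter},
        },
        "route": {
            # Try the main pool first; on an error reply (e.g. the shard is TKO),
            # retry the same request against the gutter pool.
            "type": "FailoverRoute",
            "children": ["PoolRoute|main", "PoolRoute|gutter"],
        },
    }


if __name__ == "__main__":
    # Placeholder addresses from the documentation range (RFC 5737).
    cfg = gutter_failover_config(["192.0.2.10:11211"], ["192.0.2.20:11211"])
    print(json.dumps(cfg, indent=2))
```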
@kzimmerman oh my bad, sorry!