Wed, Dec 2
Unfortunately I had to revert the other change in the chain :(
However, note that that spec test depends on a general refactor of the profile spec tests, which is in the relation chain.
Tue, Dec 1
Seems like the errors in the console are somewhat expected; however, I wonder if we can make this better somewhere in the apache/mod_auth_cas configuration.
In fact observability is already tagged. @fgiunchedi, I wonder if this could be a more general issue?
Do you get this error on all expressions, on a specific expression, or sporadically? I have also tagged observability in case there is something other than CORS in play.
Looking at the code, I think it tries CORS, retries a few times, and then falls back to no-cors, which means it eventually gets itself into a working state. That matches what I observe: I never get logged out or see an error in the GUI, but I do see issues like the one reported above in the console.
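As an illustrative sketch of that retry-then-fallback behaviour (karma itself is JS; the request helper below is a stand-in, not karma's real API):

```ruby
# Stand-in error and request helper; pretend CORS requests always fail here.
class RequestError < StandardError; end

def request(mode:)
  raise RequestError, 'CORS blocked' if mode == :cors
  "ok (#{mode})"
end

def fetch_alerts(max_retries: 3)
  max_retries.times do
    return request(mode: :cors) # try CORS first...
  rescue RequestError
    next                        # ...retrying a few times on failure
  end
  request(mode: :no_cors)       # then fall back to no-cors
end

puts fetch_alerts # => "ok (no_cors)"
```

This matches the symptom above: the CORS attempts log console errors, but the fallback keeps the GUI working.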
This looks like the issue we are hitting: https://github.com/prymitive/karma/issues/1157
I have not been able to recreate this; is it still causing an issue?
Mon, Nov 30
The main issue here is that both puppet-lint and puppet syntax validation are too lightweight to pick up issues like this, as they are designed to work on individual files; since BooleanXXXXX* (e.g. `type BooleanXXXXX = Boolean`) could be defined elsewhere in the manifests, it's not obvious to these tools that it's an error. I would say the standard way to pick up issues like this is to write a spec test for the class, see the example here, which does pick up the issue (a rough sketch is below). However, note that that spec test depends on a general refactor of the profile spec tests, which is in the relation chain.
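As a rough illustration of the shape of such a test (the class name and path here are hypothetical), a minimal rspec-puppet spec that just compiles the catalogue will flag a reference to an undefined type alias:

```ruby
# spec/classes/profile_foo_spec.rb -- hypothetical class; minimal rspec-puppet sketch
require 'spec_helper'

describe 'profile::foo' do
  # Compiling the full catalogue resolves every type reference, so an
  # undefined alias such as BooleanXXXXX fails the test here.
  it { is_expected.to compile.with_all_deps }
end
```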
Fri, Nov 27
Thu, Nov 26
Another use case for netbox data in puppet is exposing the network devices so they can be used in configuration such as icinga parent mapping and turnilo data augmentation. @ayounsi perhaps there are others in this space as well.
Wed, Nov 25
I have applied a fix from upstream and created a new Debian package (1.2.0+git20160825.89.7fb22c8-3+deb10u2). I have installed this on pki2001 and can now see AKIs in the database; I will test further tomorrow.
No ideas; however, I did notice that I got a 405 earlier today before failing idp over to idp2001. idp1001 was on an older software version, so it's possible there was a config mismatch. That aside, I think the next thing to try would be to call the JS code manually via the console.
Tue, Nov 24
This is working as expected, thanks.
Fri, Nov 20
For this one, ~1 week would be nice; however, it won't really become a blocker for me until we approach the end of the quarter (~4 weeks).
Thu, Nov 19
The last open question is how much it is worth investing in icinga-related work, given the current plans for alertmanager.
I spoke with Filippo, and it's expected that scheduled downtimes will continue in icinga for some time; as such it is probably worth fixing. I have made a quick [incomplete] PS to add this; let me know what you think if you get a sec.
Could be related to https://phabricator.wikimedia.org/T267186.
In considering this task more, I think one complication is the SSL certificate for CN=puppet, which is copied to every server so that it can be used for SSL client authentication to the https://puppet:8140/ endpoint. Note this is not the root certificate, which has a CN of Puppet CA: palladium.eqiad.wmnet.
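To make the distinction concrete (the paths below are a guess at a typical puppet ssldir layout, so treat them as hypothetical), printing the subjects of the two certificates shows which is which:

```ruby
require 'openssl'

# Hypothetical paths based on a typical puppet ssldir layout.
client = OpenSSL::X509::Certificate.new(File.read('/var/lib/puppet/ssl/certs/puppet.pem'))
ca     = OpenSSL::X509::Certificate.new(File.read('/var/lib/puppet/ssl/certs/ca.pem'))

puts client.subject # expected: CN=puppet (the shared client cert)
puts ca.subject     # expected: CN=Puppet CA: palladium.eqiad.wmnet (the root)
```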
I have just pushed another change which I think should fix this; are you able to test again?
I think the best way forward here would be to run the script from one of the cumin hosts; then we could use spicerack to query the icinga downtime status of a host. That said, I just took a quick look at the spicerack API and couldn't see a function to get the current downtime status, only functions to add and remove downtime. Tagging @Volans in case I missed something, but also to ask if it's worth adding the functions from this script to spicerack. My gut feeling is no, as this script is a bit too specific in the way it queries the puppetdb backend, but perhaps not.
However, I wonder if we should instead base the idp_primary/idp_failover parameters on a DNS lookup, i.e. `dig CNAME idp.wikimedia.org @ns0.wikimedia.org`, so there is only one source of truth?
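As a rough sketch of what a custom fact or function could do here (illustrative only, using Ruby's stdlib resolver; the variable names are made up):

```ruby
require 'resolv'

# Resolve the authoritative nameserver first, since Resolv::DNS wants an address.
ns = Resolv.getaddress('ns0.wikimedia.org')

# Ask ns0 directly which host idp.wikimedia.org currently points at.
cname = Resolv::DNS.open(nameserver: [ns]) do |dns|
  dns.getresource('idp.wikimedia.org', Resolv::DNS::Resource::IN::CNAME).name.to_s
end

idp_primary = cname # e.g. feed this into the idp_primary parameter
puts idp_primary
```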
We should definitely avoid querying the auth server directly, otherwise puppet could fail the system over before all DNS caches have expired. Querying a cache also has a race condition, one we can probably game to ensure we win, although it's still not ideal.
This has now been deployed to production, and the pre-flight test seems to be working. Are you able to test whether the underlying issue is resolved?
Wed, Nov 18
Tue, Nov 17
I have done some initial testing, and I think we should just drop the ssl config from the puppetmaster backend servers and let them use the default. The backends don't ever do CA operations, so they don't need to worry about that, and the frontends have the CA dir rsynced.
Checking the following on the backends shows that the keys are all different, which points to the puppet master process generating these keys when it first needs them, as opposed to receiving them from the configured puppet master.
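As an illustration of the kind of check involved (the path is a guess at the puppetmaster ssldir layout, so treat it as hypothetical), printing a fingerprint of the key on each backend shows they differ:

```ruby
require 'openssl'

# Hypothetical path; adjust for the actual ssldir layout on the backends.
pem = File.read('/var/lib/puppet/server/ssl/private_keys/puppet.pem')
key = OpenSSL::PKey::RSA.new(pem)

# Print a digest of the public half; run this on each backend and compare.
puts OpenSSL::Digest::SHA256.hexdigest(key.public_key.to_der)
```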
Tagging https://gerrit.wikimedia.org/r/c/operations/puppet/+/386666 as, although it's slightly different, it seems to be around the same bit of code.
Mon, Nov 16
In relation to deployment-cache-upload06: puppet runs successfully, but the following command fails on each execution. @Vgutierrez may be able to quickly spot the issue.
This has been corrected; it also needed a few extra yaml settings.
To fix deployment-logstash03:
- rename role::logstash::apifeatureusage to profile::logstash::apifeatureusage in horizon
- rename role::logstash::collector to profile::logstash::collector in horizon
- add modules/secret/secrets/certificates/kafka_logstash-eqiad_broker/truststore.jks to the private repo
- add the following to the kafka_clusters global hiera value in the deployment-prep puppet project config
deployment-wdqs01 should be fixed with https://gerrit.wikimedia.org/r/641171; however, there is now a conflict with a local commit which needs deleting:
c503964991 [LOCAL HACK] Fill placeholder for etcd::autogen_pwd_seed from root@deployment-puppetmaster04:/var/lib/git/operations/puppet
deployment-kafka-* servers were all working fine when tested.
Adding the following to match production has fixed deployment-ores01
I have added the following to the puppet config in horizon; this equates to the config in the current exim.conf file.
The pre-flight check is now working on idp-test; I'll now enable it on production.
Tue, Nov 10
I'm curious whether this change made any difference, as I don't think it should have (but there could be something with horizon's backend which I'm not familiar with).