
jbond (John Bond)
User

User Details

User Since
Jan 7 2019, 1:06 PM (99 w, 4 d)
Availability
Available
IRC Nick
jbond42
LDAP User
Jbond
MediaWiki User
JBond (WMF)

Recent Activity

Wed, Dec 2

jbond updated the language for P13519 (An Untitled Masterwork) from autodetect to shell.
Wed, Dec 2, 6:42 PM
jbond created P13519 (An Untitled Masterwork).
Wed, Dec 2, 6:42 PM
jbond added a comment to T261693: Ensure Puppet checks types as part of the build.

Interesting, thank you! It's good to have that on the profile class.

Unfortunately, I had to revert the other change in the chain :(

Wed, Dec 2, 4:00 PM · Patch-For-Review, puppet-compiler, Puppet, Operations
jbond created P13517 (An Untitled Masterwork).
Wed, Dec 2, 2:34 PM
jbond added a comment to T261693: Ensure Puppet checks types as part of the build.

However, note that that spec test depends on a general refactor of the profile spec tests, which is in the relation chain.

Wed, Dec 2, 11:45 AM · Patch-For-Review, puppet-compiler, Puppet, Operations
jbond added a comment to T229397: Puppet: get row/rack info from Netbox.

I think we could probably include all devices under something like

netbox::network_devices:
  fqdn:
    * => $metadata
Wed, Dec 2, 11:15 AM · observability, User-crusnov, User-jbond, Patch-For-Review, Puppet, Operations
jbond added a comment to T268802: Manage frack switches with Netbox.

Another useful thing would be to duplicate the production "networking" puppet fact, to have LLDP, IPs, etc. exposed there as well.

Wed, Dec 2, 11:10 AM · Operations, netops, fundraising-tech-ops
jbond created P13514 (An Untitled Masterwork).
Wed, Dec 2, 9:58 AM
jbond added a comment to T268974: systemd.timer not executing on cumin2001 after command was modified.

The previous patch did not have an effect: backups continue to be executed on eqiad, but codfw ones (cumin2001) do not run at all; they are not even attempted, according to journalctl.

Is there a way to "kick"/reload a timer? (daemon-reload seems to have no effect.)

Wed, Dec 2, 8:56 AM · Patch-For-Review, Puppet, Operations

Tue, Dec 1

jbond added a comment to T267186: alerts.w.o / idp.w.o interaction and CORS.

Seems like the errors in the console are somewhat expected. However, I wonder if we can make this better somewhere in the apache/mod_auth_cas configuration.

Tue, Dec 1, 12:55 PM · CAS-SSO, Patch-For-Review, User-fgiunchedi, observability
jbond updated subscribers of T268233: thanos u/i gives errors if left idle for a few hours.

In fact, observability is already tagged. @fgiunchedi, I wonder if this could be a more general issue?

Tue, Dec 1, 11:49 AM · CAS-SSO, observability, Operations
jbond added a comment to T268233: thanos u/i gives errors if left idle for a few hours.

Do you get this error on all expressions, a specific expression, or sporadically? I have also tagged observability in case there is something other than CORS in play.

Tue, Dec 1, 11:46 AM · CAS-SSO, observability, Operations
jbond added a comment to T267186: alerts.w.o / idp.w.o interaction and CORS.

Looking at the code, I think it tries CORS, retries a few times and then falls back to no-cors, which means that it eventually gets itself into a working state. This seems similar to what I observe: I don't ever get logged out or see an error in the GUI, but I do see issues like the one reported above in the console.

Tue, Dec 1, 11:07 AM · CAS-SSO, Patch-For-Review, User-fgiunchedi, observability
jbond added a comment to T267186: alerts.w.o / idp.w.o interaction and CORS.

This looks like the issue we are hitting https://github.com/prymitive/karma/issues/1157

Tue, Dec 1, 10:56 AM · CAS-SSO, Patch-For-Review, User-fgiunchedi, observability
jbond added a comment to T268233: thanos u/i gives errors if left idle for a few hours.

I have not been able to recreate this, is this still causing an issue?

Tue, Dec 1, 10:33 AM · CAS-SSO, observability, Operations
jbond updated the task description for T268978: String vs Binary issues while running the puppet compiler.
Tue, Dec 1, 10:03 AM · Operations, puppet-compiler

Mon, Nov 30

jbond added a comment to T261693: Ensure Puppet checks types as part of the build.

The main issue here is that both puppet-lint and puppet syntax validate are too lightweight to pick up issues like this: they are designed to work on individual files, and because BooleanXXXXX* (e.g. type BooleanXXXXX = Boolean) could be defined elsewhere in the manifest, it's not obvious to these tools that this is an error. I would say the standard way to pick up issues like this is to write a spec test for the class (see the example here), which does pick up the issue. However, note that that spec test depends on a general refactor of the profile spec tests, which is in the relation chain.

Mon, Nov 30, 7:18 PM · Patch-For-Review, puppet-compiler, Puppet, Operations
jbond triaged T269000: thanos: 404 error trying to fetch js library as Low priority.
Mon, Nov 30, 2:17 PM · Operations, observability
jbond closed T268978: String vs Binary issues while running the puppet compiler as Resolved.

This looks like it was an artefact of the python2 -> python3 migration. I have pushed the PS that drops all the hacks added to support both python versions and this is running correctly now.

Mon, Nov 30, 11:09 AM · Operations, puppet-compiler

Fri, Nov 27

jbond committed rLPRI543eb9b2d053: pki: move root cert (authored by jbond).
pki: move root cert
Fri, Nov 27, 1:32 PM
jbond committed rLPRI50620f753e1b: add fake priv key (authored by jbond).
add fake priv key
Fri, Nov 27, 1:20 PM
jbond updated the task description for T268882: PKI/CFSSL Next steps.
Fri, Nov 27, 12:42 PM · Patch-For-Review, User-jbond, Operations
jbond updated the task description for T268882: PKI/CFSSL Next steps.
Fri, Nov 27, 10:51 AM · Patch-For-Review, User-jbond, Operations
jbond closed T268775: CFSSL: certdb not populating AKI making revokes impossible as Resolved.
Fri, Nov 27, 10:50 AM · User-jbond, Operations
jbond closed T268775: CFSSL: certdb not populating AKI making revokes impossible, a subtask of T268882: PKI/CFSSL Next steps, as Resolved.
Fri, Nov 27, 10:50 AM · Patch-For-Review, User-jbond, Operations
jbond added a subtask for T268882: PKI/CFSSL Next steps: T268775: CFSSL: certdb not populating AKI making revokes impossible.
Fri, Nov 27, 10:50 AM · Patch-For-Review, User-jbond, Operations
jbond added a parent task for T268775: CFSSL: certdb not populating AKI making revokes impossible: T268882: PKI/CFSSL Next steps.
Fri, Nov 27, 10:50 AM · User-jbond, Operations
jbond triaged T268882: PKI/CFSSL Next steps as Medium priority.
Fri, Nov 27, 10:49 AM · Patch-For-Review, User-jbond, Operations

Thu, Nov 26

jbond added a comment to T229397: Puppet: get row/rack info from Netbox.

Another use case for netbox data in puppet is exposing the network devices so they can be used in configuration such as icinga parent mapping and turnilo data augmentation. @ayounsi perhaps there are others in this space as well.

Thu, Nov 26, 10:14 AM · observability, User-crusnov, User-jbond, Patch-For-Review, Puppet, Operations

Wed, Nov 25

jbond renamed T268775: CFSSL: certdb not populating AKI making revokes impossible from CFSSL: certdb notpopulating AKI making revokes not possible via the api to CFSSL: certdb not populating AKI making revokes impossible.
Wed, Nov 25, 6:48 PM · User-jbond, Operations
jbond added a comment to T268775: CFSSL: certdb not populating AKI making revokes impossible.

I have applied a fix from upstream and created a new Debian package (1.2.0+git20160825.89.7fb22c8-3+deb10u2). I have installed this on pki2001 and now see AKIs in the database; I will test further tomorrow.

Wed, Nov 25, 6:47 PM · User-jbond, Operations
jbond triaged T268775: CFSSL: certdb not populating AKI making revokes impossible as Medium priority.
Wed, Nov 25, 6:07 PM · User-jbond, Operations
jbond created P13415 cloud puppetmaster diff.
Wed, Nov 25, 12:58 PM
jbond added a comment to T267186: alerts.w.o / idp.w.o interaction and CORS.

No ideas; however, I did notice that I got a 405 earlier today before failing over idp to idp2001. idp1001 was on an older software version, so it's possible there could have been a config mismatch. That aside, I think the next thing to try would be to call the JS code manually via the console.

Wed, Nov 25, 11:16 AM · CAS-SSO, Patch-For-Review, User-fgiunchedi, observability
jbond added a comment to T268233: thanos u/i gives errors if left idle for a few hours.

I'm not that familiar with Chrome's error logging, but net::ERR_FAILED looks like it may be a more generic network error than something specific to CORS, which should have given a clear HTTP/1.1 403 Forbidden. I think the next thing to try is to trigger the pre-flight check manually from the JavaScript console, as it seems to work fine with curl.
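
For reference, a pre-flight check is an OPTIONS request carrying an Origin and an Access-Control-Request-Method header. The following is a minimal Python sketch of that request shape, not the code used in this task; the function names and the example.org URLs are hypothetical, and nothing is sent over the network.

```python
from urllib.request import Request


def build_preflight(url: str, origin: str, method: str = "GET") -> Request:
    """Build (but do not send) a CORS pre-flight request for `url`."""
    headers = {
        "Origin": origin,                          # the page making the call
        "Access-Control-Request-Method": method,   # the method it wants to use
    }
    return Request(url, headers=headers, method="OPTIONS")


def allows_origin(response_headers: dict, origin: str) -> bool:
    """Check whether a pre-flight response would let `origin` through.

    Simplification: a wildcard is accepted here, although browsers reject
    "*" for requests that carry credentials.
    """
    lowered = {k.lower(): v for k, v in response_headers.items()}
    allowed = lowered.get("access-control-allow-origin")
    return allowed in ("*", origin)


# Hypothetical hosts, for illustration only.
req = build_preflight("https://idp.example.org/login", "https://thanos.example.org")
print(req.get_method(), req.get_header("Origin"))
```

A response without an Access-Control-Allow-Origin header makes `allows_origin` return False, which corresponds to the browser blocking the request.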

Wed, Nov 25, 11:14 AM · CAS-SSO, observability, Operations

Tue, Nov 24

jbond closed T268329: Request new database for pki.discovery.wmnet as Resolved.

This is working as expected thanks

Tue, Nov 24, 5:03 PM · User-jbond, Puppet, DBA, Operations
jbond committed rLPRI130bd8304570: pki: correct key name (authored by jbond).
pki: correct key name
Tue, Nov 24, 2:56 PM
jbond committed rLPRI421dbe5e3477: pki: add dummy db_pass (authored by jbond).
pki: add dummy db_pass
Tue, Nov 24, 2:38 PM
jbond added a comment to T268327: Request new database for idp.wikimedia.org.

@Marostegui I had to add a firewall rule but all looks good now, can be closed from my end. thanks

Tue, Nov 24, 12:54 PM · User-jbond, DBA, CAS-SSO, Operations
jbond added a comment to T268327: Request new database for idp.wikimedia.org.

@jbond we already have a cas user that has access to the cas_staging database. Do you want to re-use that user/password or use a different one for the new cas database?

Tue, Nov 24, 10:19 AM · User-jbond, DBA, CAS-SSO, Operations

Fri, Nov 20

jbond added a comment to T268104: Backup failures on seaborgium, serpens (LDAP servers).

I realized I closed this ticket because the part I reported (backup failures) is fixed for me. Feel free to reopen and edit the task to add the RFO/root cause (what I usually do is edit the ticket title with a "(was: <original title>)"). Up to you.

Fri, Nov 20, 3:12 PM · Data-Persistence-Backup, Operations, LDAP
jbond added a comment to T268327: Request new database for idp.wikimedia.org.

@jbond What is your preferred delivery date for this?

Fri, Nov 20, 12:35 PM · User-jbond, DBA, CAS-SSO, Operations
jbond added a comment to T268329: Request new database for pki.discovery.wmnet.

@jbond What is your preferred delivery date for this?

For this one, ~1 week would be nice; however, it won't really become a blocker for me until we approach the end of the quarter (~4 weeks).

Fri, Nov 20, 12:34 PM · User-jbond, Puppet, DBA, Operations
jbond triaged T268329: Request new database for pki.discovery.wmnet as Medium priority.
Fri, Nov 20, 12:12 PM · User-jbond, Puppet, DBA, Operations
jbond triaged T268327: Request new database for idp.wikimedia.org as Medium priority.
Fri, Nov 20, 12:00 PM · User-jbond, DBA, CAS-SSO, Operations
jbond created T268327: Request new database for idp.wikimedia.org.
Fri, Nov 20, 11:59 AM · User-jbond, DBA, CAS-SSO, Operations
jbond moved T256113: CAS Store U2f tokens in a database from Unsorted 💣 to Active 🚁 on the User-jbond board.
Fri, Nov 20, 11:49 AM · Patch-For-Review, CAS-SSO, User-jbond, Operations
jbond moved T256972: Refactor mariadb puppet code from Unsorted 💣 to Watching 👀 on the User-jbond board.
Fri, Nov 20, 11:34 AM · Patch-For-Review, DBA, Operations, User-jbond, User-Kormat
jbond moved T257033: Standardize/centralize mapping from section to mariadb port/socket and prom-mysql-exporter port from Unsorted 💣 to Watching 👀 on the User-jbond board.
Fri, Nov 20, 11:34 AM · Patch-For-Review, DBA, Operations, User-jbond, User-Kormat
jbond closed T259013: puppet: drop legacy validate_ functions as Resolved.
Fri, Nov 20, 11:34 AM · Patch-For-Review, Puppet, Operations, User-jbond
jbond moved T259117: OKR: Install and configure new CFSSL PKI server from Unsorted 💣 to Active 🚁 on the User-jbond board.
Fri, Nov 20, 11:33 AM · Patch-For-Review, User-jbond, Operations
jbond moved T264888: Review default ferm INPUT policy from Unsorted 💣 to Back Burner 🏛️ on the User-jbond board.
Fri, Nov 20, 11:33 AM · Patch-For-Review, Security, Operations, netops, User-jbond
jbond moved T265138: OKR: Work required to prepare for puppet 6 from Unsorted 💣 to Active 🚁 on the User-jbond board.
Fri, Nov 20, 11:33 AM · Patch-For-Review, User-jbond, puppet-compiler, Operations, Puppet
jbond moved T265143: in puppet 6 some core types have been moved to external modules. check and confirm our exposure from Unsorted 💣 to Friday tasks on the User-jbond board.
Fri, Nov 20, 11:25 AM · User-jbond, puppet-compiler, Operations, Puppet
jbond moved T265153: Proposal: create a framework to build containerized incident management protects from Unsorted 💣 to Friday tasks on the User-jbond board.
Fri, Nov 20, 11:25 AM · User-jbond, Operations
jbond moved T265633: Allow running PCC with different states of the private repo for prod/change catalog from Unsorted 💣 to Friday tasks on the User-jbond board.
Fri, Nov 20, 11:25 AM · User-jbond, puppet-compiler
jbond moved T267395: Puppet clean up Parent task from Unsorted 💣 to Friday tasks on the User-jbond board.
Fri, Nov 20, 11:25 AM · Operations, User-jbond, Puppet
jbond moved T268217: IDP failover improvments from Unsorted 💣 to Back Burner 🏛️ on the User-jbond board.
Fri, Nov 20, 11:25 AM · User-jbond, Operations, CAS-SSO

Thu, Nov 19

jbond added a comment to T268233: thanos u/i gives errors if left idle for a few hours.

Just reproduced this in chrome, and got this message:

Access to XMLHttpRequest at 'https://idp.wikimedia.org/login?service=https%3a%2f%2fthanos.wikimedia.org%2fapi%2fv1%2fquery%3fquery%3dsum_over_time(mysql_exporter_last_scrape_error%255B5m%255D)%2520%253E%25201%26dedup%3dtrue%26partial_response%3dtrue%26time%3d1605799406.496%26_%3d1605795670420' (redirected from 'https://thanos.wikimedia.org/api/v1/query?query=sum_over_time(mysql_exporter_last_scrape_error%5B5m%5D)%20%3E%201&dedup=true&partial_response=true&time=1605799406.496&_=1605795670420') from origin 'https://thanos.wikimedia.org' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource.
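
The redirect target in that message is easier to read once decoded: the original Thanos API URL is carried, percent-encoded, in the service parameter of the IdP login URL. The following is a small sketch using Python's standard urllib.parse; the helper names are hypothetical and the URL is a shortened copy of the one above.

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_service(login_url: str) -> str:
    """Return the decoded `service` parameter of an IdP login URL."""
    return parse_qs(urlsplit(login_url).query)["service"][0]


def inner_query(service_url: str) -> str:
    """Return the decoded `query` parameter of the wrapped API URL."""
    return parse_qs(urlsplit(service_url).query)["query"][0]


# Shortened copy of the URL from the console message.
login_url = (
    "https://idp.wikimedia.org/login?service=https%3a%2f%2fthanos.wikimedia.org"
    "%2fapi%2fv1%2fquery%3fquery%3dsum_over_time(mysql_exporter_last_scrape_error"
    "%255B5m%255D)%2520%253E%25201%26dedup%3dtrue"
)

service = unwrap_service(login_url)
print(service)               # the thanos.wikimedia.org URL, decoded once
print(inner_query(service))  # sum_over_time(mysql_exporter_last_scrape_error[5m]) > 1
```

The browser follows the redirect from the thanos.wikimedia.org origin to idp.wikimedia.org, so the pre-flight is evaluated against the IdP's response headers.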
Thu, Nov 19, 3:52 PM · CAS-SSO, observability, Operations
jbond added a comment to T268211: Filter (if possible) downtimed hosts from check_puppet_run_changes.py's report.

The last open question is how much is worth to invest time in icinga-related stuff based on the current plans for alertmanager.

I spoke with Filippo and it's expected that scheduled downtimes will continue in icinga for some time; as such, it probably is worth fixing. I have made a quick [incomplete] PS to add this, let me know what you think if you get a sec.

Thu, Nov 19, 2:46 PM · Operations
jbond added a comment to T268233: thanos u/i gives errors if left idle for a few hours.

could be related to https://phabricator.wikimedia.org/T267186

Thu, Nov 19, 2:39 PM · CAS-SSO, observability, Operations
jbond updated subscribers of T268040: Review puppetmaster SSL configueration.
Thu, Nov 19, 1:21 PM · Patch-For-Review, Operations, Puppet
jbond added a comment to T268040: Review puppetmaster SSL configueration.

In considering this task more I think one complication is an SSL certificate for CN=puppet which is copied to every server so that it can be used to provide SSL Client authentication to the https://puppet:8140/ endpoint. Note this is not the Root certificate which has a CN of Puppet CA: palladium.eqiad.wmnet.

Thu, Nov 19, 1:21 PM · Patch-For-Review, Operations, Puppet
jbond renamed T268040: Review puppetmaster SSL configueration from Investigate the existence of files under the server ssl dir foobars puppet to Review puppetmaster SSL configueration.
Thu, Nov 19, 12:54 PM · Patch-For-Review, Operations, Puppet
jbond added a comment to T267186: alerts.w.o / idp.w.o interaction and CORS.

Different error message this time in the browser's console, but something has definitely changed!

I have just pushed another change which I think should fix this. Are you able to test again?

Thu, Nov 19, 12:46 PM · CAS-SSO, Patch-For-Review, User-fgiunchedi, observability
jbond updated subscribers of T268211: Filter (if possible) downtimed hosts from check_puppet_run_changes.py's report.

I think the best way forward here would be to run the script from one of the cumin hosts. Then we could use spicerack to query the icinga downtime status of a host. However, having said that, I just took a quick look at the spicerack API and couldn't see a function to get the current downtime status, only functions to add and remove downtime. Tagging @Volans in case I missed something, but also to ask if it's worth adding the functions from this script to spicerack. My gut feeling is no, this script is a bit too specific in the way it queries the puppetdb backend, but perhaps not.

Thu, Nov 19, 10:24 AM · Operations
jbond added a comment to T268217: IDP failover improvments.

However, I wonder if we should instead base the idp_primary/idp_failover parameters on a DNS lookup for dig CNAME idp.wikimedia.org @ns0.wikimedia.org so there is only one source of truth?

We should definitely avoid querying the auth server directly, otherwise puppet could fail over the system before all DNS caches have expired. Querying a cache also has a race condition, one we can probably game to ensure we win, although it's still not ideal.
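
To illustrate the "one source of truth" idea, here is a minimal Python sketch of deriving the two roles from a CNAME target. It is only an assumption about how such a mapping could look: the function names and the example.org host names are hypothetical, and the DNS lookup itself (and the caching race described above) is deliberately left out.

```python
from typing import Sequence, Tuple


def normalise(name: str) -> str:
    """Lower-case a DNS name and drop the trailing root dot."""
    return name.rstrip(".").lower()


def pick_roles(cname_target: str, candidates: Sequence[str]) -> Tuple[str, str]:
    """Return (primary, failover) given the CNAME target and both hosts.

    The host the CNAME points at is treated as primary and the other
    candidate as failover.
    """
    target = normalise(cname_target)
    hosts = [normalise(h) for h in candidates]
    if len(set(hosts)) != 2:
        raise ValueError("expected exactly two distinct candidate hosts")
    if target not in hosts:
        raise ValueError(f"CNAME target {target!r} is not a known candidate")
    failover = next(h for h in hosts if h != target)
    return target, failover


# Hypothetical host names, for illustration only.
hosts = ["idp1001.example.org", "idp2001.example.org"]
print(pick_roles("idp2001.example.org.", hosts))
```

Raising on an unknown target is a design choice in this sketch: it avoids silently picking a primary when the DNS answer does not match either configured host.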

Thu, Nov 19, 10:01 AM · User-jbond, Operations, CAS-SSO
jbond updated the task description for T268217: IDP failover improvments.
Thu, Nov 19, 9:45 AM · User-jbond, Operations, CAS-SSO
jbond triaged T268217: IDP failover improvments as Medium priority.
Thu, Nov 19, 9:44 AM · User-jbond, Operations, CAS-SSO
jbond added a comment to T267186: alerts.w.o / idp.w.o interaction and CORS.

This has now been deployed to production and the pre-flight test seems to be working. Are you able to test if the underlying issue is resolved?

Thu, Nov 19, 9:35 AM · CAS-SSO, Patch-For-Review, User-fgiunchedi, observability

Wed, Nov 18

jbond closed T267396: Replace os_version with debian::codename as Resolved.
Wed, Nov 18, 5:40 PM · Patch-For-Review, Operations, User-jbond, Puppet
jbond closed T267396: Replace os_version with debian::codename, a subtask of T267395: Puppet clean up Parent task, as Resolved.
Wed, Nov 18, 5:40 PM · Operations, User-jbond, Puppet
jbond added a comment to T256454: Missing dependency on bacula-fd Puppet setup.

Answering the question I made on the previous patch, my suspicion is that it should avoid future issues like T268104 after deploy, but I think there are more non-explicit dependencies on install.

Wed, Nov 18, 4:10 PM · Data-Persistence-Backup, Operations, Puppet
jbond added a comment to T268104: Backup failures on seaborgium, serpens (LDAP servers).

I also noted the following warning on seaborgium, serpens AND sretest1002. Related?

WARNING: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
Wed, Nov 18, 10:17 AM · Data-Persistence-Backup, Operations, LDAP
jbond created P13328 (An Untitled Masterwork).
Wed, Nov 18, 10:14 AM

Tue, Nov 17

jbond added a comment to T268040: Review puppetmaster SSL configueration.

I have done some initial testing and I think we should just drop the SSL config from the puppetmaster backend servers and let them use the default. The backends don't ever do CA operations, so they don't need to worry about that, and the frontends have the CA dir rsynced.

Tue, Nov 17, 5:57 PM · Patch-For-Review, Operations, Puppet
jbond added a comment to T268040: Review puppetmaster SSL configueration.

Checking the following on the backends shows that the keys are all different, which points to the puppet master process generating these keys when it first needs them, as opposed to receiving them from the configured puppet master.

Tue, Nov 17, 5:21 PM · Patch-For-Review, Operations, Puppet
jbond added a comment to T268040: Review puppetmaster SSL configueration.

Tagging https://gerrit.wikimedia.org/r/c/operations/puppet/+/386666 as, although it's slightly different, it seems to be around the same bit of code.

Tue, Nov 17, 5:11 PM · Patch-For-Review, Operations, Puppet
jbond triaged T268040: Review puppetmaster SSL configueration as Medium priority.
Tue, Nov 17, 4:48 PM · Patch-For-Review, Operations, Puppet
jbond created P13303 (An Untitled Masterwork).
Tue, Nov 17, 4:31 PM
jbond created P13302 (An Untitled Masterwork).
Tue, Nov 17, 3:50 PM
jbond added a comment to T236373: puppet master command will be removed in puppet 6.

https://github.com/github/octocatalog-diff/pull/226

Tue, Nov 17, 9:32 AM · User-jbond, Operations, puppet-compiler, Puppet

Mon, Nov 16

jbond committed rLPRIce15d104fae7: Add truststore file (authored by jbond).
Add truststore file
Mon, Nov 16, 3:49 PM
jbond updated subscribers of T267006: Puppet failures on many hosts.

In relation to deployment-cache-upload06, puppet runs successfully but the following command fails on each execution. @Vgutierrez may be able to quickly spot the issue.

Mon, Nov 16, 3:09 PM · Beta-Cluster-Infrastructure
jbond updated the task description for T267006: Puppet failures on many hosts.
Mon, Nov 16, 3:03 PM · Beta-Cluster-Infrastructure
jbond added a comment to T267006: Puppet failures on many hosts.

deployment-wdqs01 should be fixed with https://gerrit.wikimedia.org/r/641171; however, there is now a conflict with a local commit which needs deleting:
c503964991 [LOCAL HACK] Fill placeholder for etcd::autogen_pwd_seed from root@deployment-puppetmaster04:/var/lib/git/operations/puppet

This has been corrected; it also needed a few extra YAML settings.

Mon, Nov 16, 3:03 PM · Beta-Cluster-Infrastructure
jbond updated the task description for T267006: Puppet failures on many hosts.
Mon, Nov 16, 2:50 PM · Beta-Cluster-Infrastructure
jbond added a comment to T267006: Puppet failures on many hosts.

to fix deployment-logstash03:

  • rename role::logstash::apifeatureusage to profile::logstash::apifeatureusage in horizon
  • rename role::logstash::collector to profile::logstash::collector in horizon
  • add modules/secret/secrets/certificates/kafka_logstash-eqiad_broker/truststore.jks to private repo
  • add the following to the kafka_cluters global hiera value in the deployment-prep puppet project config
Mon, Nov 16, 2:50 PM · Beta-Cluster-Infrastructure
jbond updated the task description for T267006: Puppet failures on many hosts.
Mon, Nov 16, 2:33 PM · Beta-Cluster-Infrastructure
jbond added a comment to T267006: Puppet failures on many hosts.

deployment-wdqs01 should be fixed with https://gerrit.wikimedia.org/r/641171; however, there is now a conflict with a local commit which needs deleting:
c503964991 [LOCAL HACK] Fill placeholder for etcd::autogen_pwd_seed from root@deployment-puppetmaster04:/var/lib/git/operations/puppet

Mon, Nov 16, 2:32 PM · Beta-Cluster-Infrastructure
jbond added a comment to T267006: Puppet failures on many hosts.

deployment-mdb01 fixed

Mon, Nov 16, 2:02 PM · Beta-Cluster-Infrastructure
jbond updated the task description for T267006: Puppet failures on many hosts.
Mon, Nov 16, 1:54 PM · Beta-Cluster-Infrastructure
jbond added a comment to T267006: Puppet failures on many hosts.

deployment-kafka-* servers were all working fine when tested

Mon, Nov 16, 1:54 PM · Beta-Cluster-Infrastructure
jbond updated the task description for T267006: Puppet failures on many hosts.
Mon, Nov 16, 1:53 PM · Beta-Cluster-Infrastructure
jbond added a comment to T267006: Puppet failures on many hosts.

Adding the following to match production has fixed deployment-ores01

Mon, Nov 16, 1:46 PM · Beta-Cluster-Infrastructure
jbond closed T267831: profile::mail::mx fails Puppet on deployment-mx02 instance due to lack of otrs settings as Resolved.

I have added the following to the puppet config in horizon, this equates to the config which is in the current exim.conf file

Mon, Nov 16, 1:37 PM · OTRS, Beta-Cluster-Infrastructure
jbond closed T267831: profile::mail::mx fails Puppet on deployment-mx02 instance due to lack of otrs settings, a subtask of T267006: Puppet failures on many hosts, as Resolved.
Mon, Nov 16, 1:37 PM · Beta-Cluster-Infrastructure
jbond triaged T267186: alerts.w.o / idp.w.o interaction and CORS as Medium priority.
Mon, Nov 16, 12:52 PM · CAS-SSO, Patch-For-Review, User-fgiunchedi, observability
jbond added a comment to T267186: alerts.w.o / idp.w.o interaction and CORS.

The pre-flight check is now working on idp-test; I'll now enable it on production.

Mon, Nov 16, 12:38 PM · CAS-SSO, Patch-For-Review, User-fgiunchedi, observability

Tue, Nov 10

jbond added a comment to T267439: MediaWiki beta varnish is down..

Noticed that puppet stopped running on beta's deployment-cache-text06 due to c871b021a44108bed44cd044f5177b075ecb322c

After attempting a fix (live hack on deployment-puppetmaster04 changing:

modules/confd/manifests/init.pp
-    Stdlib::Fqdn     $srv_dns       = $facts['domain'],
+    String           $srv_dns       = $::domain,

I'm curious if this change made any difference, as I don't think it should (but there could be something with Horizon's backend which I'm not familiar with).

Tue, Nov 10, 6:01 PM · User-Ryasmeen, Release-Engineering-Team, Beta-Cluster-Infrastructure