From Andrew: the production salt master (neodymium) has been rebuilt on Jessie.
We therefore want to rebuild deployment-salt.
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Krenair | T136411 Rebuild beta cluster saltmaster on Jessie
Resolved | | Krenair | T136077 deployment-cache-upload04/text04 Could not find data item cache::cluster in hiera no default supplied at /etc/puppet/modules/role/manifests/cache/base.pp:17
Duplicate | | None | T136080 Hiera hierarchy hieradata/role/* is not applied on labs (eg deployment-prep)
Declined | | None | T120165 Implement role based hiera lookups for labs
Instances should be moving over to the new saltmaster as puppet runs across the cluster.
These ones are stuck, mostly due to puppet failures:
krenair@deployment-salt:~$ sudo salt '*' cmd.run echo
deployment-puppetmaster.deployment-prep.eqiad.wmflabs:
deployment-aqs01.deployment-prep.eqiad.wmflabs:
deployment-cache-upload04.deployment-prep.eqiad.wmflabs:
deployment-cache-text04.deployment-prep.eqiad.wmflabs:
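For hunting down stragglers like these, salt's manage runner can report unresponsive minions directly instead of eyeballing cmd.run output. A sketch, assuming it is run on the salt master; the wrapper function is purely illustrative:

```shell
# List minions whose keys are accepted on this master but which are
# not answering. Assumes a working salt master; the wrapper function
# is for illustration only.
list_down_minions() {
    sudo salt-run manage.down
}
# On the master one would run:
#   list_down_minions
```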
deployment-puppetmaster seems to be responding to both salt masters?!?
I killed an old salt-minion process on -puppetmaster and that appears to have fixed the weirdness there. The others have these puppet errors:
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find data item cache::cluster in any Hiera data file and no default supplied at /etc/puppet/modules/role/manifests/cache/base.pp:17 on node deployment-cache-text04.deployment-prep.eqiad.wmflabs
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find data item cache::cluster in any Hiera data file and no default supplied at /etc/puppet/modules/role/manifests/cache/base.pp:17 on node deployment-cache-upload04.deployment-prep.eqiad.wmflabs
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find data item aqs_hosts in any Hiera data file and no default supplied at /etc/puppet/manifests/role/aqs.pp:59 on node deployment-aqs01.deployment-prep.eqiad.wmflabs
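These failures mean hiera cannot resolve `cache::cluster` for the cache hosts (and `aqs_hosts` for aqs01). A hedged sketch of the kind of hieradata that would satisfy the lookups; the value and the file's placement in the hierarchy are assumptions for illustration, not the actual fix (T136080/T120165 above are about where such keys should live on labs):

```yaml
# Illustrative only: the keys the failing lookups are asking for.
# Real values and file location depend on the labs hiera hierarchy.
cache::cluster: text        # assumed value
aqs_hosts:
  - deployment-aqs01.deployment-prep.eqiad.wmflabs
```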
And deployment-tin has unhappy puppet as well, likely trebuchet/salt related:
Error: /Stage[main]/Deployment::Deployment_server/Exec[eventual_consistency_deployment_server_init]/returns: change from notrun to 0 failed: salt-call deploy.deployment_server_init returned 255 instead of one of [0]
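One way to dig into a 255 from that exec is to re-run the module call by hand with debug logging. A sketch, assuming salt-call and the Trebuchet deploy module are present on deployment-tin; the wrapper function is purely illustrative:

```shell
# Reproduce the failing exec manually with debug output. Assumes
# salt-call and the deploy module exist on the host; the wrapper
# function is for illustration only.
debug_deployment_init() {
    sudo salt-call -l debug deploy.deployment_server_init
    echo "salt-call exited with $?"
}
# On deployment-tin one would run:
#   debug_deployment_init
```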
I was futzing on deployment-tin today and noticed this issue. From the looks of it, deployment-salt02 is just missing a few roles:
thcipriani@deployment-puppetmaster:~$ ldapsearch -LLL -x -D 'cn=proxyagent,ou=profile,dc=wikimedia,dc=org' -w $(grep -Po "(?<=bindpw).*" /etc/ldap.conf) -b 'ou=hosts,dc=wikimedia,dc=org' -z 1 "associatedDomain=deployment-salt02.eqiad.wmflabs"
dn: dc=deployment-salt02.deployment-prep.eqiad.wmflabs,ou=hosts,dc=wikimedia,dc=org
aRecord: 10.68.17.58
objectClass: domainRelatedObject
objectClass: dNSDomain
objectClass: puppetClient
objectClass: domain
objectClass: dcObject
objectClass: top
associatedDomain: deployment-salt02.deployment-prep.eqiad.wmflabs
associatedDomain: deployment-salt02.eqiad.wmflabs
l: eqiad
dc: deployment-salt02.deployment-prep.eqiad.wmflabs
puppetVar: instanceproject=deployment-prep
puppetVar: instancename=deployment-salt02
vs old deployment-salt
thcipriani@deployment-puppetmaster:~$ ldapsearch -LLL -x -D 'cn=proxyagent,ou=profile,dc=wikimedia,dc=org' -w $(grep -Po "(?<=bindpw).*" /etc/ldap.conf) -b 'ou=hosts,dc=wikimedia,dc=org' -z 1 "associatedDomain=deployment-salt.eqiad.wmflabs"
dn: dc=deployment-salt.deployment-prep.eqiad.wmflabs,ou=hosts,dc=wikimedia,dc=org
objectClass: domainrelatedobject
objectClass: dnsdomain
objectClass: domain
objectClass: puppetclient
objectClass: dcobject
objectClass: top
l: eqiad
associatedDomain: i-0000015c.eqiad.wmflabs
associatedDomain: deployment-salt.eqiad.wmflabs
associatedDomain: i-0000015c.deployment-prep.eqiad.wmflabs
associatedDomain: deployment-salt.deployment-prep.eqiad.wmflabs
dc: deployment-salt.deployment-prep.eqiad.wmflabs
aRecord: 10.68.16.99
puppetClass: beta::saltmaster::tools
puppetClass: role::deployment::salt_masters
puppetClass: role::labs::lvm::srv
puppetClass: role::salt::masters::labs::project_master
puppetVar: deployment_server_override=deployment-bastion.eqiad.wmflabs
puppetVar: instancename=deployment-salt
puppetVar: instanceproject=deployment-prep
puppetVar: salt_master_finger_override=dd:d8:68:70:8c:65:a3:af:46:5c:3f:4f:d4:be:6c:71
puppetVar: salt_master_override=deployment-salt.eqiad.wmflabs
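To make the gap obvious, the puppetClass lists from the two entries can be compared directly. This is purely an illustration over the values quoted above:

```shell
# Illustration only: puppetClass values copied verbatim from the two
# ldapsearch results above, compared to show what the new host lacks.
old_classes='beta::saltmaster::tools
role::deployment::salt_masters
role::labs::lvm::srv
role::salt::masters::labs::project_master'
new_classes=''   # deployment-salt02 carries no puppetClass attributes

# classes present on the old master but absent from the new one
comm -23 <(printf '%s\n' "$old_classes" | sort) \
         <(printf '%s\n' "$new_classes" | sort)
```

The comparison prints all four classes, since the new host has none of them.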
As a result, the deploy.py salt-module isn't on the new machine:
thcipriani@deployment-salt02:/srv/salt/_modules$ ls -l
total 0
This is probably what's causing the puppet error on deployment-tin.
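Once the missing role is applied (or the module files otherwise land under /srv/salt/_modules), the minions still need the custom modules pushed out. A sketch, assuming a working salt master; the wrapper function is purely illustrative:

```shell
# Distribute custom execution modules (such as deploy.py) from the
# master's _modules directory to every minion. Assumes a working salt
# master; the wrapper function is for illustration only.
sync_custom_modules() {
    sudo salt '*' saltutil.sync_modules
}
# On deployment-salt02 one would run:
#   sync_custom_modules
```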
I was adding classes to the new host through hiera instead of LDAP, but missed role::deployment::salt_masters. I don't know about role::labs::lvm::srv...
Looking at deployment-salt I suspect the extra space provided by role::labs::lvm::srv is unnecessary. Shall we shut down the old host and close this?
Sounds good to me :)
Thanks for the continued work on deployment-prep. Very much appreciated by all of the Release-Engineering-Team.
I've shut down deployment-salt. It still exists but I'll delete it at a later date. At that point salt will stop working entirely for those hosts which have broken puppet (see "blocked by" task).
Mentioned in SAL [2016-06-01T10:14:16Z] <hashar> beta: salt-key -d deployment-salt.deployment-prep.eqiad.wmflabs T136411
Mentioned in SAL [2016-06-01T10:29:42Z] <hashar> Upgraded Linux kernel on deployment-salt02 T136411
I compared the list of instances connected to the new server against nova list on silver, everything looks correct so I've deleted deployment-salt.