Since around 2023-11-16, a number of units have been in the failed state on thanos-fe1001 (swift_dispersion_stats.service, swift_dispersion_stats_lowlatency.service, swift_ring_manager.service). What they have in common is that they all call swift-dispersion-report under the hood. In practice, none of the thanos front-ends can currently run that command.
It fails because of a TLS error when its internal client tries to establish a connection:
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)>
Running strace showed that the failure follows an attempt to connect to 10.2.2.54:443, i.e. thanos-swift.svc.eqiad.wmnet.
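To reproduce the failure outside swift-dispersion-report, a minimal Python sketch of the same verified handshake might look like the following. It assumes the client verifies against the default system trust store, as Python's `ssl.create_default_context()` does; `check_tls` is a hypothetical helper for illustration, not part of Swift.

```python
import socket
import ssl


def check_tls(host, port=443, server_hostname=None, cafile=None, timeout=5.0):
    """Hypothetical helper: attempt a verified TLS handshake and report the outcome.

    Returns (ok, detail); ok is True only if the handshake and certificate
    verification both succeed.
    """
    # Assumption: the real client uses a default context, i.e. the system
    # trust store (or only `cafile`, if one is given) plus hostname checking.
    ctx = ssl.create_default_context(cafile=cafile)
    sni = server_hostname or host
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=sni) as tls:
                issuer = tls.getpeercert().get("issuer")
                return True, "verified; issuer=%r" % (issuer,)
    except ssl.SSLCertVerificationError as e:
        # OpenSSL's reason, e.g. 20 "unable to get local issuer certificate".
        return False, "verify failed (%s): %s" % (e.verify_code, e.verify_message)
    except OSError as e:
        # ssl.SSLError, refused connections and timeouts are all OSError subclasses.
        return False, "connection/handshake error: %s" % e
```

Run on a front-end as `check_tls("10.2.2.54", server_hostname="thanos-swift.svc.eqiad.wmnet")`; if the trust store is the problem, it should return `False` with a verification error rather than a connection error.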
I can reproduce this on the CLI:
mvernon@thanos-fe1001:~$ openssl s_client -connect 10.2.2.54:443 -showcerts </dev/null 2>/dev/null
CONNECTED(00000003)
---
Certificate chain
 0 s:CN = thanos-fe-combined.discovery.wmnet
   i:CN = Puppet CA: palladium.eqiad.wmnet
-----BEGIN CERTIFICATE-----
MIIExDCCAqygAwIBAgICIwwwDQYJKoZIhvcNAQELBQAwKzEpMCcGA1UEAwwgUHVw
cGV0IENBOiBwYWxsYWRpdW0uZXFpYWQud21uZXQwHhcNMjIwMTMxMTI0MDE2WhcN
MjcwMTMxMTI0MDE2WjAtMSswKQYDVQQDDCJ0aGFub3MtZmUtY29tYmluZWQuZGlz
Y292ZXJ5LndtbmV0MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEc995xll27AS/
0Pp3yr3LBNDcpeGjNVnKj6d0C4ECKUF5B/G2SspiHmmSht2+hvlqjXN3gDTlBPUT
D7H5oDViA6OCAbkwggG1MDcGCWCGSAGG+EIBDQQqDChQdXBwZXQgUnVieS9PcGVu
U1NMIEludGVybmFsIENlcnRpZmljYXRlMIH5BgNVHREEgfEwge6CHHRoYW5vcy1x
dWVyeS5zdmMuZXFpYWQud21uZXSCHHRoYW5vcy1zd2lmdC5kaXNjb3Zlcnkud21u
ZXSCFHRoYW5vcy53aWtpbWVkaWEub3Jnghx0aGFub3Mtc3dpZnQuc3ZjLmNvZGZ3
LndtbmV0ghx0aGFub3MtcXVlcnkuZGlzY292ZXJ5LndtbmV0ghx0aGFub3MtcXVl
cnkuc3ZjLmNvZGZ3LndtbmV0ghx0aGFub3Mtc3dpZnQuc3ZjLmVxaWFkLndtbmV0
giJ0aGFub3MtZmUtY29tYmluZWQuZGlzY292ZXJ5LndtbmV0MAwGA1UdEwEB/wQC
MAAwHQYDVR0OBBYEFL6uRHRZCeIdIwLxbTLm30QtEnG+MB8GA1UdIwQYMBaAFFnk
hjB+Aq8NAKZ07Zr2DheubK66MA4GA1UdDwEB/wQEAwIFoDAgBgNVHSUBAf8EFjAU
BggrBgEFBQcDAQYIKwYBBQUHAwIwDQYJKoZIhvcNAQELBQADggIBADuEHZUy3fhw
J2kYuJY3Rz59EpErd2ePna9fjwfCO2uc2yUDM+yYvYRfMCU6efyWNwHn6PIeszjd
Ax1kRTERTLtepieRj8l3kB3QOFU2wU1H0XldElUZ0UnoRCDEAb3dT9jUHh85LuFi
wZEDo9EUd52Vza9kuPNV3tl/syGV3Dr6NLqQQ3buqsjJSp+p9VHyorjkzWkshMWj
xdT4fZ0EZJ8m50SjKCQT2mzQU8i0gwNEGI0PyfW6od06gvKfnmHfJoXSoWqwXpLj
PlA1FxH816dUiB2jZ3lq0paJL3gtm6IWO1K+8rH2QR4rFl24/PaDntXZ4tOborkC
Fa6fIn/+R/PosPMcglLSN3TVehfgwg8fqb7KgtHIl8Y5KE6MgXYjmlgr2RfhH6wG
Q+nPFtJCNIOwaBqz+htwSV8J6ejzgoDOCEXgwf5nQL1RZFjs/eb3vvzOf/7BK5f+
PfD9SQeOByfnsPu2qWzs+5pdujMxhdSWwwZTAEVdeiSMxPmcaLh+hCw/mUR9Fuvj
McbKx+pbLnuw9AMaAOS1gnFoG6IYTkOqRpRzXx++ywqSDWpiDh/QNjzWjk3a150h
bI+D8jrBQgyX1c9VxJ9fYBL8DRalkd1U74dk6B+Xubi89vshQ1rn5ucVVk4utqIP
Dr8lf5rcHbSBwUnMEHAkV2VOaT7jia3D
-----END CERTIFICATE-----
---
Server certificate
subject=CN = thanos-fe-combined.discovery.wmnet
issuer=CN = Puppet CA: palladium.eqiad.wmnet
---
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: ECDSA
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 1528 bytes and written 363 bytes
Verification error: unable to verify the first certificate
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 256 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 21 (unable to verify the first certificate)
---
If I run the same command on cumin1001 (a Puppet 5 host), it works:
mvernon@cumin1001:~$ openssl s_client -connect 10.2.2.54:443 -showcerts </dev/null 2>/dev/null
CONNECTED(00000003)
---
Certificate chain
 0 s:CN = thanos-fe-combined.discovery.wmnet
   i:CN = Puppet CA: palladium.eqiad.wmnet
-----BEGIN CERTIFICATE-----
MIIExDCCAqygAwIBAgICIwwwDQYJKoZIhvcNAQELBQAwKzEpMCcGA1UEAwwgUHVw
cGV0IENBOiBwYWxsYWRpdW0uZXFpYWQud21uZXQwHhcNMjIwMTMxMTI0MDE2WhcN
MjcwMTMxMTI0MDE2WjAtMSswKQYDVQQDDCJ0aGFub3MtZmUtY29tYmluZWQuZGlz
Y292ZXJ5LndtbmV0MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEc995xll27AS/
0Pp3yr3LBNDcpeGjNVnKj6d0C4ECKUF5B/G2SspiHmmSht2+hvlqjXN3gDTlBPUT
D7H5oDViA6OCAbkwggG1MDcGCWCGSAGG+EIBDQQqDChQdXBwZXQgUnVieS9PcGVu
U1NMIEludGVybmFsIENlcnRpZmljYXRlMIH5BgNVHREEgfEwge6CHHRoYW5vcy1x
dWVyeS5zdmMuZXFpYWQud21uZXSCHHRoYW5vcy1zd2lmdC5kaXNjb3Zlcnkud21u
ZXSCFHRoYW5vcy53aWtpbWVkaWEub3Jnghx0aGFub3Mtc3dpZnQuc3ZjLmNvZGZ3
LndtbmV0ghx0aGFub3MtcXVlcnkuZGlzY292ZXJ5LndtbmV0ghx0aGFub3MtcXVl
cnkuc3ZjLmNvZGZ3LndtbmV0ghx0aGFub3Mtc3dpZnQuc3ZjLmVxaWFkLndtbmV0
giJ0aGFub3MtZmUtY29tYmluZWQuZGlzY292ZXJ5LndtbmV0MAwGA1UdEwEB/wQC
MAAwHQYDVR0OBBYEFL6uRHRZCeIdIwLxbTLm30QtEnG+MB8GA1UdIwQYMBaAFFnk
hjB+Aq8NAKZ07Zr2DheubK66MA4GA1UdDwEB/wQEAwIFoDAgBgNVHSUBAf8EFjAU
BggrBgEFBQcDAQYIKwYBBQUHAwIwDQYJKoZIhvcNAQELBQADggIBADuEHZUy3fhw
J2kYuJY3Rz59EpErd2ePna9fjwfCO2uc2yUDM+yYvYRfMCU6efyWNwHn6PIeszjd
Ax1kRTERTLtepieRj8l3kB3QOFU2wU1H0XldElUZ0UnoRCDEAb3dT9jUHh85LuFi
wZEDo9EUd52Vza9kuPNV3tl/syGV3Dr6NLqQQ3buqsjJSp+p9VHyorjkzWkshMWj
xdT4fZ0EZJ8m50SjKCQT2mzQU8i0gwNEGI0PyfW6od06gvKfnmHfJoXSoWqwXpLj
PlA1FxH816dUiB2jZ3lq0paJL3gtm6IWO1K+8rH2QR4rFl24/PaDntXZ4tOborkC
Fa6fIn/+R/PosPMcglLSN3TVehfgwg8fqb7KgtHIl8Y5KE6MgXYjmlgr2RfhH6wG
Q+nPFtJCNIOwaBqz+htwSV8J6ejzgoDOCEXgwf5nQL1RZFjs/eb3vvzOf/7BK5f+
PfD9SQeOByfnsPu2qWzs+5pdujMxhdSWwwZTAEVdeiSMxPmcaLh+hCw/mUR9Fuvj
McbKx+pbLnuw9AMaAOS1gnFoG6IYTkOqRpRzXx++ywqSDWpiDh/QNjzWjk3a150h
bI+D8jrBQgyX1c9VxJ9fYBL8DRalkd1U74dk6B+Xubi89vshQ1rn5ucVVk4utqIP
Dr8lf5rcHbSBwUnMEHAkV2VOaT7jia3D
-----END CERTIFICATE-----
---
Server certificate
subject=CN = thanos-fe-combined.discovery.wmnet
issuer=CN = Puppet CA: palladium.eqiad.wmnet
---
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: ECDSA
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 1529 bytes and written 363 bytes
Verification: OK
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 256 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 0 (ok)
---
If I run the same command on cumin2002 (a Puppet 7 host), I see the same error as on the thanos front-ends; that, combined with the timing, leads me to conclude this is a Puppet 7 issue...
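Since the server certificate is issued by "Puppet CA: palladium.eqiad.wmnet", one way to test the Puppet 7 theory is to check whether that CA is present in each host's default trust store. The sketch below assumes the missing trust anchor is the cause; `trusted_ca_common_names` and `trusts_ca` are hypothetical helpers. One caveat: `SSLContext.get_ca_certs()` does not list certificates that live only in a capath directory until they have been used, so a `False` here is suggestive rather than conclusive.

```python
import ssl


def trusted_ca_common_names(cafile=None):
    """Hypothetical helper: commonName of every CA loaded into a default context."""
    # With cafile=None this loads the system default trust store.
    ctx = ssl.create_default_context(cafile=cafile)
    names = []
    for cert in ctx.get_ca_certs():
        # 'subject' is a tuple of RDNs, each a tuple of (key, value) pairs.
        for rdn in cert.get("subject", ()):
            for key, value in rdn:
                if key == "commonName":
                    names.append(value)
    return names


def trusts_ca(common_name, cafile=None):
    """Hypothetical helper. Assumption: the palladium Puppet CA is what is missing."""
    return common_name in trusted_ca_common_names(cafile)
```

Comparing `trusts_ca("Puppet CA: palladium.eqiad.wmnet")` on cumin1001 and cumin2002 would show whether the two trust stores differ in this respect.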