tools-acme-chief-01 is attempting to validate DNS challenge against cloud authdns IPv6 addresses
Open, Medium, Public

Description

While running puppet on a tools acme-chief client I noticed strange behaviour, and the logs on tools-acme-chief-01 show this:

Feb 23 02:18:21 tools-acme-chief-01 acme-chief-backend[23899]: Handling pushed CSR event for toolforge / rsa-2048
Feb 23 02:18:25 tools-acme-chief-01 acme-chief-backend[23899]: DNS server 2620:0:861:2:208:80:154:135 (ACMEChallengeValidation.UNKNOWN) failed to validate challenge Challenge type: ACMEChallengeType.DNS01. _acme-challenge.toolforge.org TXT ILmKGKtTIpwadEcimFmGXsb-Eb9JJAx9yV-7nY4Kllo
Feb 23 02:18:25 tools-acme-chief-01 acme-chief-backend[23899]: DNS server 2620:0:861:1:208:80:154:11 (ACMEChallengeValidation.UNKNOWN) failed to validate challenge Challenge type: ACMEChallengeType.DNS01. _acme-challenge.toolforge.org TXT ILmKGKtTIpwadEcimFmGXsb-Eb9JJAx9yV-7nY4Kllo
Feb 23 02:18:25 tools-acme-chief-01 acme-chief-backend[23899]: Unable to validate challenge Challenge type: ACMEChallengeType.DNS01. _acme-challenge.toolforge.org TXT ILmKGKtTIpwadEcimFmGXsb-Eb9JJAx9yV-7nY4Kllo

Any kind of IPv6 traffic from within labs would obviously fail, so why is it trying?
/etc/acme-chief/config.yaml contains:

challenges:
  dns-01:
    issuing_ca: letsencrypt.org
    ns_records:
    - cloud-ns0.wikimedia.org.
    - cloud-ns1.wikimedia.org.
    resolver_port: 53
    sync_dns_servers:
    - cloud-ns0.wikimedia.org
    - cloud-ns1.wikimedia.org
    validation_dns_servers:
    - cloud-ns0.wikimedia.org
    - cloud-ns1.wikimedia.org
    zone_update_cmd: "/usr/local/bin/acme-chief-designate-sync.py"

And with our puppetisation there's no way of putting IPv4 addresses in there.

>>> from acme_chief.dns import Resolver
>>> Resolver.resolve_dns_servers(['cloud-ns0.wikimedia.org', 'cloud-ns1.wikimedia.org'])
['208.80.154.135', '2620:0:861:2:208:80:154:135', '208.80.154.11', '2620:0:861:1:208:80:154:11']

Resolver.resolve_dns_servers calls socket.getaddrinfo but does not specify family and ignores the family in the results.
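
A minimal sketch of what family-aware resolution could look like, using only the standard `socket` module. This is not acme-chief's code: the `family` parameter and the helper `addresses_for_family` are hypothetical names introduced here for illustration.

```python
import socket


def addresses_for_family(addrinfo, family=socket.AF_UNSPEC):
    """Return unique addresses from getaddrinfo()-style tuples.

    Hypothetical helper: keeps only entries whose family matches `family`;
    AF_UNSPEC keeps everything (the behaviour described in this task).
    """
    addresses = []
    for entry_family, _type, _proto, _canonname, sockaddr in addrinfo:
        if family not in (socket.AF_UNSPEC, entry_family):
            continue
        address = sockaddr[0]
        if address not in addresses:
            addresses.append(address)
    return addresses


def resolve_dns_servers(servers, family=socket.AF_UNSPEC, port=53):
    """Resolve each server name, optionally restricted to one address family.

    Assumption: the family is passed both to getaddrinfo() and to the
    result filter, so unwanted AAAA results never reach the validator.
    """
    resolved = []
    for server in servers:
        info = socket.getaddrinfo(server, port, family=family,
                                  proto=socket.IPPROTO_UDP)
        for address in addresses_for_family(info, family):
            if address not in resolved:
                resolved.append(address)
    return resolved
```

With something along these lines, calling `resolve_dns_servers([...], family=socket.AF_INET)` for the two cloud-ns hosts would be expected to return only the two IPv4 addresses from the session above.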

It is important to get this fixed or worked around by 19th March, as that is the expiry date of the live toolforge cert this instance manages.

Event Timeline

Krenair created this task. Feb 23 2020, 2:30 AM
Restricted Application added a subscriber: Aklapper. Feb 23 2020, 2:30 AM
Krenair renamed this task from tools-acme-chief-01 is attempting to validate DNS challenge against authdns IPv6 addresses to tools-acme-chief-01 is attempting to validate DNS challenge against cloud authdns IPv6 addresses. Feb 23 2020, 2:30 AM

I'd be interested to know why this has not been a problem before, by the way. Those cloud-ns hosts have had AAAA records since creation, AFAIK.

Change 574221 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/software/acme-chief@master] Allow configuration of AddressFamily used for DNS validation

https://gerrit.wikimedia.org/r/574221

For the traffic cloud instances, I bypassed this issue with this ugly hack:

authdns_servers:
  208.80.154.11: 208.80.154.11
  208.80.154.135: 208.80.154.135
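
One plausible reason the hack works, shown as a small sketch: when an IPv4 literal is handed to `socket.getaddrinfo` in place of a hostname, only an IPv4 result comes back, so no IPv6 address can end up in the validation list. The helper name `families_for` is hypothetical, and how the hiera value reaches acme-chief's config is not shown here.

```python
import socket


def families_for(host, port=53):
    """Return the sorted names of address families getaddrinfo() yields for host."""
    info = socket.getaddrinfo(host, port, proto=socket.IPPROTO_UDP)
    return sorted({socket.AddressFamily(family).name for family, *_ in info})


# An IPv4 literal needs no DNS lookup and resolves to itself only.
print(families_for('208.80.154.11'))  # ['AF_INET']
```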

I assumed this would not work, since https://gerrit.wikimedia.org/r/c/operations/puppet/+/554292/5/modules/profile/manifests/acme_chief.pp requires Stdlib::Fqdn. Interesting. A new cert has now been issued successfully.

Andrew triaged this task as Medium priority. Mar 10 2020, 4:07 PM
Andrew added a subscriber: Andrew. Jun 2 2020, 4:45 PM

@Krenair, can you summarize the results here? It looks resolved but it's not clear if or how :)

worked around with an ugly hiera hack :)