As the title says. A lot of the easy parts of this are already done; what remains is correctly configuring an haproxy instance to accept TLSv1.3 connections and forward them as raw TCP, with PROXYv2 metadata, to the gdnsd port that's already waiting on such traffic, and setting up an LE cert with SANs matching our official nameserver hostnames. Technically this will only be the "opportunistic" profile at that point, but the fact that the certs will validate in the usual browser sense against our fixed set of NS-record hostnames goes a long way as well. We can tackle better profile support at a later date.
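For readers unfamiliar with the PROXYv2 metadata mentioned above, here is a minimal Python sketch of the binary header a proxy prepends to a forwarded TCP-over-IPv4 connection so that the backend can see the real client address. It is illustrative only: the function names are hypothetical, the addresses are documentation-range placeholders, and it covers just the plain IPv4 PROXY command with no TLVs, not whatever haproxy and gdnsd actually exchange in this deployment.

```python
import socket
import struct

# Fixed 12-byte signature that opens every PROXY protocol v2 header.
PP2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"


def build_proxy_v2_header(src_ip, src_port, dst_ip, dst_port):
    """Build a PROXY v2 header for a TCP-over-IPv4 connection (no TLVs)."""
    ver_cmd = 0x21    # high nibble: version 2, low nibble: command PROXY
    fam_proto = 0x11  # high nibble: AF_INET, low nibble: STREAM (TCP)
    addrs = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!HH", src_port, dst_port)
    )
    # The 2-byte length field counts only the bytes after the 16-byte prefix.
    return PP2_SIGNATURE + struct.pack("!BBH", ver_cmd, fam_proto, len(addrs)) + addrs


def parse_proxy_v2_header(data):
    """Parse a header produced above; returns (src_ip, src_port, dst_ip, dst_port)."""
    if data[:12] != PP2_SIGNATURE:
        raise ValueError("missing PROXY v2 signature")
    ver_cmd, fam_proto, length = struct.unpack("!BBH", data[12:16])
    if ver_cmd != 0x21 or fam_proto != 0x11:
        raise ValueError("only the v2 PROXY command over TCP/IPv4 is handled here")
    body = data[16:16 + length]
    src_port, dst_port = struct.unpack("!HH", body[8:12])
    return (
        socket.inet_ntoa(body[0:4]),
        src_port,
        socket.inet_ntoa(body[4:8]),
        dst_port,
    )


# Placeholder addresses: a client at 192.0.2.10 reaching a listener on port 853.
header = build_proxy_v2_header("192.0.2.10", 40000, "198.51.100.53", 853)
```

The backend reads this header before any DNS bytes, which is how the "real" client IP can survive the extra proxy hop.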
Event Timeline
Change 556738 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] dotls: define acme cert
Change 556739 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] [WIP] dotls: main implementation
Change 556738 merged by BBlack:
[operations/puppet@production] dotls: define acme cert
Change 556809 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] dotls: test on dns4002
Change 556739 merged by BBlack:
[operations/puppet@production] dotls: main implementation
Change 556809 merged by BBlack:
[operations/puppet@production] dotls: test on dns4002
Change 556814 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] dotls: fix listen specs
Change 556814 merged by BBlack:
[operations/puppet@production] dotls: fix listen specs
Change 556821 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] dotls: simpler and clearer listen config
Change 556821 merged by BBlack:
[operations/puppet@production] dotls: simpler and clearer listen config
Change 556827 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] dotls: add ferm and NRPE monitoring via kdig
Change 556827 merged by BBlack:
[operations/puppet@production] dotls: add ferm and NRPE monitoring via kdig
Change 556831 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] dotls: haproxy gdnsd dep and smooth reloads
Change 556831 merged by BBlack:
[operations/puppet@production] dotls: haproxy gdnsd dep and smooth reloads
Change 556833 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] dotls: glue haproxy to gdnsd in systemd
Change 556833 merged by BBlack:
[operations/puppet@production] dotls: glue haproxy to gdnsd in systemd
This is now mostly working, with a hiera flag controlling test deployment (currently only on dns4002, which doesn't have any public authserver IPs routed into it at this time).
Reminders on the next bits to work on (besides just pushing it to the rest of the fleet):
1. Global monitoring (icinga hitting the official public IPs, which means this is just a singular-POV monitor rather than all-machines, but as with the existing authdns global checks, better than nothing).
2. TLS perf tuning, especially a secure, shared ticket key rotation system (may as well make a generic one, as we'll likely want a similar one for ats-tls, especially with TLSv1.3 just around the corner there as well).
Refactoring the dependencies a little here: the sub-point of (2) above about shared ticket key rotation won't really matter until we're anycasting, so I've made a separate task (+subtask) in T240863 to go look at that stuff later, blocking the anycast work.
Given that testing has gone great so far, what's really left here is some relatively minor config tweaks and the global monitoring.
Change 558234 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] dotls: ssl tweaks
Change 558235 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] dotls: enable on all servers
Change 558235 merged by BBlack:
[operations/puppet@production] dotls: enable on all servers
Actually, we can't realistically do global monitoring from icinga either: icinga isn't on Buster, so it doesn't have the library/tool support needed to check a TLSv1.3-only service. We'll have to settle for the per-server NRPE checks for now.
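To make concrete what a check against a TLSv1.3-only DoT service involves, here is a rough Python sketch. It is not the kdig-based NRPE check used here; the function names are hypothetical, and it assumes Python 3.7+ linked against an OpenSSL with TLSv1.3 support, which is exactly the kind of dependency an older host may lack.

```python
import socket
import ssl
import struct


def build_query(name, qtype=1, query_id=0):
    """Encode a minimal DNS query (no EDNS) for `name`; qtype 1 is A."""
    # Header: id, flags (all zero, no recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class IN


def frame(message):
    """Prefix a DNS message with its 2-byte length, as DNS over TCP/TLS requires."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full reply arrived")
        buf += chunk
    return buf


def dot_probe(server, name, port=853, timeout=5.0):
    """Send one query over DoT; returns (negotiated TLS version, DNS rcode)."""
    ctx = ssl.create_default_context()            # verifies cert chain and hostname
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # refuse anything older
    with socket.create_connection((server, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=server) as tls:
            tls.sendall(frame(build_query(name)))
            (length,) = struct.unpack("!H", recv_exact(tls, 2))
            reply = recv_exact(tls, length)
            return tls.version(), reply[3] & 0x0F  # rcode is the low nibble of byte 3
```

Calling `dot_probe("ns0.wikimedia.org", "wikipedia.org")` would need network access to port 853; a monitoring wrapper would then compare the returned rcode against 0 (NOERROR).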
External queries now working (note they all return a codfw IP without edns-client-subnet in play, because codfw is closest to my laptop and PROXYv2 is working for sending the "real" client IP from haproxy to gdnsd).
bblack@haliax:~$ kdig +nsid +tls-ca @ns0.wikimedia.org wikipedia.org A
;; TLS session (TLS1.3)-(ECDHE-SECP256R1)-(ECDSA-SECP256R1-SHA256)-(CHACHA20-POLY1305)
;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 60466
;; Flags: qr aa rd; QUERY: 1; ANSWER: 1; AUTHORITY: 0; ADDITIONAL: 1
;; EDNS PSEUDOSECTION:
;; Version: 0; flags: ; UDP size: 1024 B; ext-rcode: NOERROR
;; Option (11): 0172
;; NSID: 61757468646E7331303031 "authdns1001"
;; PADDING: 385 B
;; QUESTION SECTION:
;; wikipedia.org. IN A
;; ANSWER SECTION:
wikipedia.org. 600 IN A 208.80.153.224
;; Received 468 B
;; Time 2019-12-16 23:34:58 UTC
;; From 208.80.154.238@853(TCP) in 151.4 ms

bblack@haliax:~$ kdig +nsid +tls-ca @ns1.wikimedia.org wikipedia.org A
;; TLS session (TLS1.3)-(ECDHE-SECP256R1)-(ECDSA-SECP256R1-SHA256)-(CHACHA20-POLY1305)
;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 53240
;; Flags: qr aa rd; QUERY: 1; ANSWER: 1; AUTHORITY: 0; ADDITIONAL: 1
;; EDNS PSEUDOSECTION:
;; Version: 0; flags: ; UDP size: 1024 B; ext-rcode: NOERROR
;; Option (11): 0172
;; NSID: 61757468646E7332303031 "authdns2001"
;; PADDING: 385 B
;; QUESTION SECTION:
;; wikipedia.org. IN A
;; ANSWER SECTION:
wikipedia.org. 600 IN A 208.80.153.224
;; Received 468 B
;; Time 2019-12-16 23:35:01 UTC
;; From 208.80.153.231@853(TCP) in 97.7 ms

bblack@haliax:~$ kdig +nsid +tls-ca @ns2.wikimedia.org wikipedia.org A
;; TLS session (TLS1.3)-(ECDHE-SECP256R1)-(ECDSA-SECP256R1-SHA256)-(CHACHA20-POLY1305)
;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 17804
;; Flags: qr aa rd; QUERY: 1; ANSWER: 1; AUTHORITY: 0; ADDITIONAL: 1
;; EDNS PSEUDOSECTION:
;; Version: 0; flags: ; UDP size: 1024 B; ext-rcode: NOERROR
;; Option (11): 0172
;; NSID: 646E7333303031 "dns3001"
;; PADDING: 389 B
;; QUESTION SECTION:
;; wikipedia.org. IN A
;; ANSWER SECTION:
wikipedia.org. 600 IN A 208.80.153.224
;; Received 468 B
;; Time 2019-12-16 23:35:17 UTC
;; From 91.198.174.239@853(TCP) in 379.8 ms
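As a small aside on reading the output above: the NSID value is just the responding server's identifier as hex-encoded bytes, which kdig also prints decoded. A short Python sketch (the helper name is hypothetical) shows the mapping for the three payloads seen here:

```python
def decode_nsid(hex_payload):
    """Decode a hex-encoded NSID option payload into readable ASCII text."""
    return bytes.fromhex(hex_payload).decode("ascii")


# Payloads copied from the three kdig responses.
for payload in (
    "61757468646E7331303031",
    "61757468646E7332303031",
    "646E7333303031",
):
    print(decode_nsid(payload))
```

This is a quick way to confirm which host answered when a tool reports only the raw option bytes.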
Change 558522 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] dotls: use haproxy exporter profile
Change 558522 merged by BBlack:
[operations/puppet@production] dotls: use haproxy exporter profile