promethium.wikitextexp.eqiad.wmflabs (10.68.16.2, labs baremetal host) has strange DNS A record result, and missing PTR
Closed, Invalid (Public)

Description

krenair@bastion-01:~$ host promethium.wikitextexp.eqiad.wmflabs
promethium.wikitextexp.eqiad.wmflabs has address 10.68.16.2
Host promethium.wikitextexp.eqiad.wmflabs not found: 3(NXDOMAIN)
Host promethium.wikitextexp.eqiad.wmflabs not found: 3(NXDOMAIN)
krenair@bastion-01:~$ host promethium.eqiad.wmflabs
promethium.eqiad.wmflabs has address 10.68.16.2
Host promethium.eqiad.wmflabs not found: 3(NXDOMAIN)
Host promethium.eqiad.wmflabs not found: 3(NXDOMAIN)
krenair@bastion-01:~$ host 10.68.16.2
Host 2.16.68.10.in-addr.arpa. not found: 3(NXDOMAIN)

I'm not quite sure what the host command is doing there, as dig just returns what you'd expect for the A records. Running host -v promethium.wikitextexp.eqiad.wmflabs shows it clearly doing the same lookup three times, yet returning two distinct results from either three or five responses. The PTR record is missing either way.
(This might be known or unimportant, but I'm making a task to ensure we have a record of it.)

Event Timeline

Change 299501 had a related patch set uploaded (by Alex Monk):
labs dnsrecursor metaldns: Resolve PTR records too

https://gerrit.wikimedia.org/r/299501

Change 299501 merged by Andrew Bogott:
labs dnsrecursor metaldns: Resolve PTR records too

https://gerrit.wikimedia.org/r/299501

That commit sorted out the PTR problem, but we still have the strange A record result:

krenair@bastion-01:~$ host promethium.wikitextexp.eqiad.wmflabs
promethium.wikitextexp.eqiad.wmflabs has address 10.68.16.2
Host promethium.wikitextexp.eqiad.wmflabs not found: 3(NXDOMAIN)
Host promethium.wikitextexp.eqiad.wmflabs not found: 3(NXDOMAIN)
krenair@bastion-01:~$ host 10.68.16.2
2.16.68.10.in-addr.arpa domain name pointer promethium.wikitextexp.eqiad.wmflabs.
krenair@bastion-01:~$

@BBlack figured out the extra requests are AAAA queries (no thanks to /usr/bin/host - seriously shouldn't that sort of info be shown in verbose mode?), which metaldns will currently return NXDOMAIN for :/
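
For the record, the difference between the two kinds of negative answer can be sketched roughly as below. This is a Python model for illustration only, not the metaldns code (which is a Lua hook for the PowerDNS recursor); the RECORDS table and the resolve function are hypothetical names. The point is that a name which exists but has no record of the requested type should get NOERROR with an empty answer ("NODATA"), and NXDOMAIN should be reserved for names that don't exist at all.

```python
# Illustrative model only; names and structure are assumptions, not metaldns internals.
NOERROR = 0
NXDOMAIN = 3

# Hypothetical record table: name -> {record type -> value}
RECORDS = {
    "promethium.wikitextexp.eqiad.wmflabs.": {"A": "10.68.16.2"},
}


def resolve(name, qtype):
    """Return (rcode, answers) for a query against RECORDS."""
    types = RECORDS.get(name)
    if types is None:
        # The name itself is unknown, so NXDOMAIN is the right answer.
        return NXDOMAIN, []
    if qtype not in types:
        # The name exists but has no record of this type:
        # NOERROR with no answers, rather than NXDOMAIN.
        return NOERROR, []
    return NOERROR, [(name, qtype, types[qtype])]
```

With this model, an AAAA query for the known name comes back as NOERROR with no answers, while any query for an unknown name comes back as NXDOMAIN.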

Change 299903 had a related patch set uploaded (by Alex Monk):
labs dnsrecursor metaldns: Don't return NXDOMAIN when we don't have a record of the right type but do recognise the domain

https://gerrit.wikimedia.org/r/299903

Change 299903 merged by Rush:
labs dnsrecursor metaldns: Don't return NXDOMAIN when we don't have a record of the right type but do recognise the domain

https://gerrit.wikimedia.org/r/299903

Much better, though not perfect:

; <<>> DiG 9.9.5-8-Debian <<>> promethium.wikitextexp.eqiad.wmflabs AAAA
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64793
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;promethium.wikitextexp.eqiad.wmflabs. IN AAAA

;; ANSWER SECTION:
promethium.wikitextexp.eqiad.wmflabs. 60 IN SOA	labs-ns0.wikimedia.org. root.wmflabs.org. 1 3600 600 86400 3600

;; Query time: 1 msec
;; SERVER: 208.80.155.118#53(208.80.155.118)
;; WHEN: Tue Aug 09 16:05:44 UTC 2016
;; MSG SIZE  rcvd: 125

The SOA record should be in the authority section, not the answer section. The difference causes this:

krenair@bastion-01:~$ host promethium.wikitextexp.eqiad.wmflabs
promethium.wikitextexp.eqiad.wmflabs has address 10.68.16.2
promethium.wikitextexp.eqiad.wmflabs has SOA record labs-ns0.wikimedia.org. root.wmflabs.org. 1 3600 600 86400 3600
promethium.wikitextexp.eqiad.wmflabs has SOA record labs-ns0.wikimedia.org. root.wmflabs.org. 1 3600 600 86400 3600
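
A rough Python model of why the section matters, in case it helps anyone reading later. The response dicts are simplified stand-ins rather than any real DNS library's types, and host_style_lines is a hypothetical helper that only loosely imitates how host prints whatever lands in the answer section. The SOA data is copied from the dig output above.

```python
# Simplified model; not a real DNS library and not the metaldns code.
QNAME = "promethium.wikitextexp.eqiad.wmflabs"
SOA_RDATA = "labs-ns0.wikimedia.org. root.wmflabs.org. 1 3600 600 86400 3600"
SOA = (QNAME, "SOA", SOA_RDATA)

# What we currently send: the SOA sits in the answer section.
current_response = {"rcode": "NOERROR", "answer": [SOA], "authority": []}

# What a NODATA response should look like: empty answer, SOA in authority.
expected_response = {"rcode": "NOERROR", "answer": [], "authority": [SOA]}


def host_style_lines(response):
    """Format each answer-section record roughly the way host prints it."""
    return [
        f"{name} has {rtype} record {rdata}"
        for name, rtype, rdata in response["answer"]
    ]
```

In this model host_style_lines(current_response) yields one "has SOA record" line, matching the stray output above, while host_style_lines(expected_response) yields nothing.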

Change 303833 had a related patch set uploaded (by Alex Monk):
labs dnsrecursor metaldns: Change hook to ensure SOA records get passed properly but with NOERROR instead of NXDOMAIN

https://gerrit.wikimedia.org/r/303833

Change 303833 merged by BBlack:
labs dnsrecursor metaldns: Change hook to ensure SOA records get passed properly but with NOERROR instead of NXDOMAIN

https://gerrit.wikimedia.org/r/303833

Thanks for your help and patience everybody, especially Brandon. I believe this is now working as expected:

krenair@bastion-01:~$ host promethium.wikitextexp.eqiad.wmflabs
promethium.wikitextexp.eqiad.wmflabs has address 10.68.16.2
krenair@bastion-01:~$ host promethium.eqiad.wmflabs
promethium.eqiad.wmflabs has address 10.68.16.2
krenair@bastion-01:~$ host 10.68.16.2
2.16.68.10.in-addr.arpa domain name pointer promethium.wikitextexp.eqiad.wmflabs.

Sigh. It had to be reverted because my patch ran into a pretty nasty gotcha. As I just wrote on Gerrit (https://gerrit.wikimedia.org/r/#/c/304049/1):

Although we set up multiple files with hook functions inside them, PowerDNS only supports one. So we have a recursorhooks.lua script to make it run both. That was fine until I declared "function postresolve" in metaldns, which was already used in labs-ip-alias.lua. Since metaldns is listed after labs-ip-alias in recursorhooks, it took priority and killed our labs-ip-alias code.
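
The gotcha can be reproduced in miniature. The sketch below is a Python analogy, not the actual Lua: the two source strings stand in for labs-ip-alias.lua and the metaldns script, and load_shared and load_chained are hypothetical helpers. Loading both scripts into one shared namespace lets the later definition of postresolve silently replace the earlier one. Loading each into its own namespace and chaining the hooks is one possible way around that, and is not necessarily what any patch on this task does.

```python
# Python analogy for the hook collision; the real scripts are Lua.
LABS_IP_ALIAS = '''
def postresolve(query):
    return "labs-ip-alias handled " + query
'''

METALDNS = '''
def postresolve(query):
    return "metaldns handled " + query
'''


def load_shared(sources):
    """Load every script into one namespace; later definitions win."""
    namespace = {}
    for source in sources:
        exec(source, namespace)
    return namespace["postresolve"]


def load_chained(sources):
    """Load each script into its own namespace and call every hook in order."""
    hooks = []
    for source in sources:
        namespace = {}
        exec(source, namespace)
        if "postresolve" in namespace:
            hooks.append(namespace["postresolve"])

    def postresolve(query):
        return [hook(query) for hook in hooks]

    return postresolve
```

Calling load_shared([LABS_IP_ALIAS, METALDNS]) returns only the metaldns hook, which mirrors what happened here, whereas load_chained runs both hooks in listing order.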

Change 304146 had a related patch set uploaded (by Alex Monk):
[WIP] dnsrecursor: Rewrite code setting up lua hooks

https://gerrit.wikimedia.org/r/304146

Change 304146 abandoned by Alex Monk:
[WIP] dnsrecursor: Rewrite code setting up lua hooks

Reason:
never got finished, would need a massive rebase, etc.

https://gerrit.wikimedia.org/r/304146

This host eventually went away and the problematic code was removed: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/469532/