
dns200[12] lack IPv6 records
Closed, Resolved · Public

Description

For some reason the Prometheus agent on lvs2005 is not reporting the [2620:0:860:ed1a::3:fe]:53 service, even though it is configured in pybal and present in IPVS:

vgutierrez@lvs2005:~$ fgrep 2620:0:860:ed1a::3:fe /etc/pybal/pybal.conf
ip = 2620:0:860:ed1a::3:fe
ip = 2620:0:860:ed1a::3:fe
vgutierrez@lvs2005:~$ sudo ipvsadm -Ln | fgrep 2620:0:860:ed1a::3:fe
TCP  [2620:0:860:ed1a::3:fe]:53 wrr
UDP  [2620:0:860:ed1a::3:fe]:53 wrr ops
vgutierrez@lvs2005:~$ curl http://lvs2005.codfw.wmnet:9100/metrics |fgrep 2620:0:860:ed1a::3:fe
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  151k  100  151k    0     0  12.7M      0 --:--:-- --:--:-- --:--:-- 13.4M
vgutierrez@lvs2005:~$

Event Timeline

Restricted Application added a project: Operations. · Jan 17 2019, 6:48 PM
Restricted Application added a subscriber: Aklapper.
CDanis triaged this task as Medium priority. · Jan 17 2019, 6:54 PM
CDanis added a project: observability.

It appears that Prometheus does not list any IPVS service that has no backends, and right now (IPVS-wise) dns_rec6 doesn't have any backend server configured on lvs200[25].
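To make the "service without backends" condition concrete, here is a minimal Python sketch that scans text in the format printed by `ipvsadm -Ln` and lists virtual services with no real-server (`->`) lines under them. The function name and the sample are illustrative: the two IPv6 services come from the output in the description, while the 10.2.1.1 service and its backend are made up for contrast.

```python
import re

# Sample in the format printed by `ipvsadm -Ln`. The 10.2.1.1 service and its
# backend are hypothetical; the two IPv6 services are the ones from this task.
SAMPLE = """\
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.2.1.1:53 wrr
  -> 10.192.0.10:53               Route   10     0          0
TCP  [2620:0:860:ed1a::3:fe]:53 wrr
UDP  [2620:0:860:ed1a::3:fe]:53 wrr ops
"""

# A virtual service line starts with its protocol in the first column.
SERVICE_RE = re.compile(r"^(TCP|UDP|SCTP|FWM)\s+(\S+)")


def services_without_backends(ipvsadm_output):
    """Return (protocol, address) pairs that have no '->' backend lines."""
    empty = []
    current = None        # service currently being scanned
    has_backend = False   # whether a '->' line was seen for it
    for line in ipvsadm_output.splitlines():
        match = SERVICE_RE.match(line)
        if match:
            if current is not None and not has_backend:
                empty.append(current)
            current = (match.group(1), match.group(2))
            has_backend = False
        elif line.lstrip().startswith("->") and current is not None:
            # The '->' column header appears before any service, so it is
            # skipped by the `current is not None` guard.
            has_backend = True
    if current is not None and not has_backend:
        empty.append(current)
    return empty


if __name__ == "__main__":
    for proto, address in services_without_backends(SAMPLE):
        print(proto, address)
```

Run against the sample, this reports only the two `[2620:0:860:ed1a::3:fe]:53` services, which matches the observation that they exist in IPVS but have no backends.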

But apparently, both dns200[12] are currently pooled:

vgutierrez@puppetmaster1001:~$ sudo confctl --tags dc=codfw,cluster=dns,service=pdns_recursor --action get all
{"dns2002.wikimedia.org": {"weight": 10, "pooled": "yes"}, "dns2001.wikimedia.org": {"weight": 10, "pooled": "yes"}}
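As a small aid for reading that output, the sketch below parses a JSON object shaped like the one above and returns the hosts marked as pooled. It assumes the output is a single JSON object keyed by hostname, exactly as pasted here; `pooled_hosts` is a hypothetical helper name, not part of conftool.

```python
import json

# The confctl output pasted above, verbatim.
CONFCTL_OUTPUT = (
    '{"dns2002.wikimedia.org": {"weight": 10, "pooled": "yes"}, '
    '"dns2001.wikimedia.org": {"weight": 10, "pooled": "yes"}}'
)


def pooled_hosts(confctl_json):
    """Return the sorted hostnames whose "pooled" attribute is "yes"."""
    state = json.loads(confctl_json)
    return sorted(
        host for host, attrs in state.items() if attrs.get("pooled") == "yes"
    )


if __name__ == "__main__":
    print(pooled_hosts(CONFCTL_OUTPUT))
```

Both hosts come back as pooled, so the missing backends cannot be explained by the pooled state alone.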

Change 485169 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/dns@master] add missing IPv6 records for dns200[12].wikimedia.org

https://gerrit.wikimedia.org/r/485169

Change 485169 merged by Vgutierrez:
[operations/dns@master] add missing IPv6 records for dns200[12].wikimedia.org

https://gerrit.wikimedia.org/r/485169

Mentioned in SAL (#wikimedia-operations) [2019-01-18T10:23:38Z] <vgutierrez> restarting pybal in lvs2005 - T214072

Mentioned in SAL (#wikimedia-operations) [2019-01-18T10:29:48Z] <vgutierrez> restarting pybal in lvs2002 - T214072

Vgutierrez closed this task as Resolved. · Jan 18 2019, 10:31 AM
Vgutierrez claimed this task.
Vgutierrez added a subscriber: Joe.

As @Joe correctly pointed out in our IRC discussions, the main issue here is that dns200[12] lacked IPv6 records.
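A check like the following could flag this situation earlier. It is only a sketch: it asks the system resolver for IPv6 addresses of each host and reports the ones that fail, presumably the same gap that left the IPv6 service without backends. `hosts_missing_ipv6` is a hypothetical helper, the resolver is injectable so the logic can be exercised without network access, and a resolution failure can also be a transient DNS error rather than a missing record.

```python
import socket


def hosts_missing_ipv6(hosts, resolve=socket.getaddrinfo):
    """Return the hosts for which no IPv6 address could be resolved.

    `resolve` defaults to socket.getaddrinfo and can be replaced in tests.
    """
    missing = []
    for host in hosts:
        try:
            # Restrict the lookup to IPv6; port 53 matches the DNS service.
            resolve(host, 53, family=socket.AF_INET6)
        except socket.gaierror:
            # No IPv6 address, or the lookup itself failed.
            missing.append(host)
    return missing


if __name__ == "__main__":
    # Hostnames taken from the confctl output earlier in this task.
    print(hosts_missing_ipv6(["dns2001.wikimedia.org", "dns2002.wikimedia.org"]))
```

Note that `socket.getaddrinfo` goes through the local resolver configuration, so entries in `/etc/hosts` can mask a missing DNS record.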

Vgutierrez renamed this task from "prometheus metrics apparently are missing some ipvs entries" to "dns200[12] lack IPv6 records". · Jan 18 2019, 10:31 AM