After @fselles depooled kubernetes1001.eqiad.wmnet (T213859) the following alert appeared on lvs1006 and lvs1016:
CRITICAL: Hosts in IPVS but unknown to PyBal: set(['kubernetes1001.eqiad.wmnet'])
Using ipvsadm, we can confirm that kubernetes1001 is still present in IPVS on both LVS servers:
vgutierrez@lvs1016:/var/log$ host kubernetes1001.eqiad.wmnet
kubernetes1001.eqiad.wmnet has address 10.64.0.121

vgutierrez@cumin1001:~$ sudo cumin lvs1006.wikimedia.org,lvs1016.eqiad.wmnet "ipvsadm -Ln | fgrep 10.64.0.121"
2 hosts will be targeted:
lvs1016.eqiad.wmnet,lvs1006.wikimedia.org
Confirm to continue [y/n]? y
===== NODE GROUP =====
(2) lvs1016.eqiad.wmnet,lvs1006.wikimedia.org
----- OUTPUT of 'ipvsadm -Ln | fgrep 10.64.0.121' -----
  -> 10.64.0.121:1968 Route 10 0 0
Looking further, we find that ipvsadm still lists kubernetes1001 under the following service:
TCP 10.2.2.29:1968 wrr
  -> 10.64.0.121:1968 Route 10 0 0
  -> 10.64.16.75:1968 Route 10 0 0
  -> 10.64.32.23:1968 Route 10 0 0
  -> 10.64.48.52:1968 Route 10 0 0
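For anyone repeating this lookup, here is a rough sketch of how the `ipvsadm -Ln` listing could be grouped by virtual service to find which service still holds a given real server. The sample text is the output quoted above; `parse_ipvsadm` and `services_for_host` are hypothetical helper names, not part of PyBal or ipvsadm, and the parser assumes the usual layout of a `TCP`/`UDP` service line followed by `->` real server lines.

```python
SAMPLE = """\
TCP  10.2.2.29:1968 wrr
  -> 10.64.0.121:1968 Route 10 0 0
  -> 10.64.16.75:1968 Route 10 0 0
  -> 10.64.32.23:1968 Route 10 0 0
  -> 10.64.48.52:1968 Route 10 0 0
"""


def parse_ipvsadm(text):
    """Map each virtual service (e.g. 'TCP 10.2.2.29:1968') to its real servers."""
    services = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        if fields[0] in ("TCP", "UDP", "SCTP"):
            # Start of a new virtual service block.
            current = f"{fields[0]} {fields[1]}"
            services.setdefault(current, [])
        elif fields[0] == "->" and current is not None:
            # Real server line belonging to the current service.
            # Header lines appear before any service, so they are skipped.
            services[current].append(fields[1])
    return services


def services_for_host(services, ip):
    """Return the virtual services that still list the given real server IP."""
    return sorted(
        svc
        for svc, reals in services.items()
        if any(real.rsplit(":", 1)[0] == ip for real in reals)
    )


if __name__ == "__main__":
    print(services_for_host(parse_ipvsadm(SAMPLE), "10.64.0.121"))
```

Run against the sample, this reports `TCP 10.2.2.29:1968` as the only service still referencing 10.64.0.121.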
Currently there is no service configured in pybal for 10.2.2.29:1968:
vgutierrez@cumin1001:~$ sudo cumin -x lvs1006.wikimedia.org,lvs1016.eqiad.wmnet "fgrep 10.2.2.29 /etc/pybal/pybal.conf"
IGNORE EXIT CODES mode enabled, all commands executed will be considered successful
2 hosts will be targeted:
lvs1016.eqiad.wmnet,lvs1006.wikimedia.org
Confirm to continue [y/n]? y
===== NO OUTPUT =====
PASS:

vgutierrez@cumin1001:~$ sudo cumin -x lvs1006.wikimedia.org,lvs1016.eqiad.wmnet "fgrep 1968 /etc/pybal/pybal.conf"
IGNORE EXIT CODES mode enabled, all commands executed will be considered successful
2 hosts will be targeted:
lvs1016.eqiad.wmnet,lvs1006.wikimedia.org
Confirm to continue [y/n]? y
===== NO OUTPUT =====
PASS:
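The same check could be generalized to catch any IPVS service that no longer has a PyBal definition, not just this one. The sketch below is only an illustration: it assumes pybal.conf is INI-style with one section per service carrying `protocol`, `ip` and `port` keys, and the `[example_service]` section with 10.2.2.1:80 is made up for the example. `configured_services` and `orphaned_services` are hypothetical helper names.

```python
import configparser

# Hypothetical pybal.conf excerpt; the section name and its IP/port are
# illustrative only and do not come from the real configuration.
PYBAL_CONF = """\
[global]
bgp = yes

[example_service]
protocol = tcp
ip = 10.2.2.1
port = 80
"""


def configured_services(conf_text):
    """Return the set of 'PROTO ip:port' services defined in the config text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    found = set()
    for name in parser.sections():
        section = parser[name]
        # Sections without ip/port (such as [global]) are not services.
        if "ip" in section and "port" in section:
            proto = section.get("protocol", "tcp").upper()
            found.add(f"{proto} {section['ip']}:{section['port']}")
    return found


def orphaned_services(ipvs_services, conf_text):
    """Services present in IPVS that have no matching config section."""
    return sorted(set(ipvs_services) - configured_services(conf_text))


if __name__ == "__main__":
    ipvs_services = ["TCP 10.2.2.1:80", "TCP 10.2.2.29:1968"]
    print(orphaned_services(ipvs_services, PYBAL_CONF))
```

With the hypothetical config above, only `TCP 10.2.2.29:1968` is reported, which matches what the two fgrep runs show by hand.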
Searching our puppet repo, we find that 10.2.2.29 was the service IP used for zoterov2: https://github.com/wikimedia/puppet/commit/d33c334e112d58a55c8564828606efc0db56f6f4