Remove legacy ELK LVS entries
Closed, Resolved, Public

Description

The last of the elk5 hosts are ready for decommissioning, so it's time to remove the old LVS instances for the legacy Logstash inputs and the old Kibana dashboard.

From https://wikitech.wikimedia.org/wiki/LVS#Remove_a_load_balanced_service: "The procedure for removal of a service should more or less follow the inverse order of what gets done adding it. It is important to perform the following actions in order. Specifically:"
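A minimal sketch of that inverse-order rule, modelling only the LVS `state` values. `lvs_setup` and `service_setup` are the states this task moves through; treating `production` as the fully-live starting state is an assumption, and `is_valid_removal` is a hypothetical helper, not part of any Wikimedia tooling.

```python
# Hypothetical model of the LVS service states seen in this task.
# "lvs_setup" and "service_setup" appear in the timeline; treating
# "production" as the fully-live starting state is an assumption.
ADD_ORDER = ["service_setup", "lvs_setup", "production"]

# Removal walks the add order backwards, then drops the service entirely.
REMOVE_ORDER = list(reversed(ADD_ORDER)) + ["removed"]


def is_valid_removal(states: list[str]) -> bool:
    """Return True if `states` starts live and steps back one state at a time."""
    if not states or states[0] != REMOVE_ORDER[0]:
        return False
    if any(state not in REMOVE_ORDER for state in states):
        return False
    for current, nxt in zip(states, states[1:]):
        # Each transition must move exactly one step further along REMOVE_ORDER.
        if REMOVE_ORDER.index(nxt) != REMOVE_ORDER.index(current) + 1:
            return False
    return True


if __name__ == "__main__":
    # The sequence this task followed: lvs_setup, then service_setup, then removal.
    print(is_valid_removal(["production", "lvs_setup", "service_setup", "removed"]))
    # Skipping lvs_setup breaks the inverse ordering.
    print(is_valid_removal(["production", "service_setup", "removed"]))
```

The check is deliberately strict (one step per change), mirroring the separate "switch to lvs_setup" and "set lvs state: service_setup" changes in the timeline.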

Event Timeline

Change 755480 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] remove elk5 related LVS services

https://gerrit.wikimedia.org/r/755480

Change 755789 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] switch legacy elk LVS entries to state: lvs_setup

https://gerrit.wikimedia.org/r/755789

Change 755790 had a related patch set uploaded (by Herron; author: Herron):

[operations/dns@master] remove kibana.discovery.wmnet record

https://gerrit.wikimedia.org/r/755790

Mentioned in SAL (#wikimedia-operations) [2022-01-21T15:07:41Z] <herron> removing kibana.discovery.wmnet record and switching legacy elk LVS instances to state: lvs_setup T299700

Change 755790 merged by Herron:

[operations/dns@master] remove kibana.discovery.wmnet record

https://gerrit.wikimedia.org/r/755790

Change 755789 merged by Herron:

[operations/puppet@production] switch legacy elk LVS entries to state: lvs_setup

https://gerrit.wikimedia.org/r/755789

Services have been moved to lvs_setup, but some PyBal Icinga alerts are still open, e.g.:

lvs1015 PyBal IPVS diff check CRITICAL	2022-01-21 17:11:09	1d 21h 24m 8s	3/3	CRITICAL: Services known to PyBal but not to IPVS: set(['10.2.2.36:11514', '10.2.2.36:8324', '10.2.2.33:443', '10.2.2.33:80', '10.2.2.36:12201'])
lvs1015 PyBal backends health check PYBAL CRITICAL - CRITICAL - kibana-ssl_443: Servers logstash1009.eqiad.wmnet, logstash1008.eqiad.wmnet are marked down but pooled: logstash-json-tcp_11514: Servers logstash1009.eqiad.wmnet, logstash1008.eqiad.wmnet are marked down but pooled: kibana_80: Servers logstash1009.eqiad.wmnet, logstash1008.eqiad.wmnet are marked down but pooled
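To make the two alerts above easier to read, here is a small Python sketch that pulls them apart: one helper groups the VIP:port pairs that PyBal still knows about but IPVS no longer has, and the other lists the backends reported as "marked down but pooled" per PyBal service. The regexes are assumptions fitted to the alert text quoted above, not to any documented output format of these Icinga checks.

```python
import re
from collections import defaultdict

# Alert texts copied from the lvs1015 Icinga checks quoted above.
IPVS_DIFF = ("CRITICAL: Services known to PyBal but not to IPVS: "
             "set(['10.2.2.36:11514', '10.2.2.36:8324', '10.2.2.33:443', "
             "'10.2.2.33:80', '10.2.2.36:12201'])")
BACKENDS = ("PYBAL CRITICAL - CRITICAL - kibana-ssl_443: Servers logstash1009.eqiad.wmnet, "
            "logstash1008.eqiad.wmnet are marked down but pooled: logstash-json-tcp_11514: "
            "Servers logstash1009.eqiad.wmnet, logstash1008.eqiad.wmnet are marked down but "
            "pooled: kibana_80: Servers logstash1009.eqiad.wmnet, logstash1008.eqiad.wmnet "
            "are marked down but pooled")


def stale_vips(alert: str) -> dict[str, list[int]]:
    """Group the 'IP:port' entries of a PyBal IPVS diff alert by service IP."""
    grouped = defaultdict(list)
    for ip, port in re.findall(r"'(\d{1,3}(?:\.\d{1,3}){3}):(\d+)'", alert):
        grouped[ip].append(int(port))
    # Sort ports, since the alert prints a Python 2 set in arbitrary order.
    return {ip: sorted(ports) for ip, ports in grouped.items()}


def down_but_pooled(alert: str) -> dict[str, list[str]]:
    """Map each PyBal service in a backends health alert to its down-but-pooled servers."""
    pattern = r"([\w.-]+): Servers (.+?) are marked down but pooled"
    return {svc: servers.split(", ") for svc, servers in re.findall(pattern, alert)}


if __name__ == "__main__":
    for ip, ports in stale_vips(IPVS_DIFF).items():
        print(ip, ports)
    for svc, servers in down_but_pooled(BACKENDS).items():
        print(svc, "->", ", ".join(servers))
```

Grouping by IP shows that the IPVS diff covers just two service IPs, and the health check names the same two logstash hosts for every affected service.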

I'm assuming these will go away after a PyBal restart; I'll connect with Traffic before proceeding with that.

Change 756036 had a related patch set uploaded (by Herron; author: Herron):

[operations/dns@master] remove kibana-disc from discovery-metafo-resources

https://gerrit.wikimedia.org/r/756036

Change 756036 merged by Herron:

[operations/dns@master] remove kibana-disc from discovery-metafo-resources

https://gerrit.wikimedia.org/r/756036

Change 756038 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] remove realserver_ips from legacy elk roles & set lvs state: service_setup

https://gerrit.wikimedia.org/r/756038

Change 756038 merged by Herron:

[operations/puppet@production] remove realserver_ips from legacy elk roles & set lvs state: service_setup

https://gerrit.wikimedia.org/r/756038

Mentioned in SAL (#wikimedia-operations) [2022-01-21T18:46:42Z] <herron> restarting pybal on lvs1015,lvs1020,lvs2009,lvs2010 to remove legacy elk5 services T299700

Change 755480 merged by Herron:

[operations/puppet@production] remove elk5 related LVS services

https://gerrit.wikimedia.org/r/755480

Change 756045 had a related patch set uploaded (by Herron; author: Herron):

[operations/dns@master] cleanup kibana.svc records

https://gerrit.wikimedia.org/r/756045

Change 756046 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] remove logstash and kibana entries from conftool-data discovery services

https://gerrit.wikimedia.org/r/756046

Change 756046 merged by Herron:

[operations/puppet@production] remove logstash and kibana entries from conftool-data discovery services

https://gerrit.wikimedia.org/r/756046

Change 756045 merged by Herron:

[operations/dns@master] cleanup logstash and kibana svc records

https://gerrit.wikimedia.org/r/756045

herron added a subscriber: BBlack.

These have been removed with much help from @BBlack, thank you!

Volans subscribed.

FYI, the service IPs are still allocated in Netbox:
https://netbox.wikimedia.org/ipam/ip-addresses/?q=kibana.svc
https://netbox.wikimedia.org/ipam/ip-addresses/?q=logstash.svc

I guess they should be removed. Once done, make sure to run the sre.dns.netbox cookbook too.
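Before deleting the leftover addresses, one way to list them from a script rather than the web UI is NetBox's REST API, which accepts the same `q` search parameter under `/api/ipam/ip-addresses/`. The sketch below assumes an API token with read access and injects the HTTP fetcher so the logic can be exercised without network access; `ip_search_url`, `make_fetch` and `leftover_ips` are hypothetical helpers, not part of any Wikimedia tooling. It only reads; deleting the records and running the sre.dns.netbox cookbook stay manual steps.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

NETBOX_URL = "https://netbox.wikimedia.org"


def ip_search_url(term: str, base: str = NETBOX_URL) -> str:
    """REST API equivalent of the /ipam/ip-addresses/?q=... UI searches above."""
    return f"{base}/api/ipam/ip-addresses/?{urllib.parse.urlencode({'q': term})}"


def make_fetch(token: str) -> Callable[[str], dict]:
    """Return a fetcher that GETs a NetBox API URL using token auth (assumed setup)."""
    def fetch(url: str) -> dict:
        req = urllib.request.Request(url, headers={
            "Authorization": f"Token {token}",
            "Accept": "application/json",
        })
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch


def leftover_ips(fetch: Callable[[str], dict], terms: list[str]) -> Iterator[tuple[str, str]]:
    """Yield (address, dns_name) for every IP address still matching a search term."""
    for term in terms:
        url = ip_search_url(term)
        while url:  # NetBox paginates list results; follow the "next" links
            page = fetch(url)
            for ip in page["results"]:
                yield ip["address"], ip.get("dns_name", "")
            url = page.get("next")
```

Usage would be along the lines of `list(leftover_ips(make_fetch(token), ["kibana.svc", "logstash.svc"]))`; an empty result suggests the cleanup in Netbox is complete.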

Mentioned in SAL (#wikimedia-operations) [2023-12-07T17:08:41Z] <herron@cumin1001> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cleanup logstash/kibana records T299700 - herron@cumin1001"

Mentioned in SAL (#wikimedia-operations) [2023-12-07T17:09:35Z] <herron@cumin1001> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cleanup logstash/kibana records T299700 - herron@cumin1001"

Good catch, thanks! This has been done.