
prometheus-wmf-elasticsearch-exporter interferes with prometheus-wmf-elasticsearch-exporter-9* unit on elastic nodes
Closed, Resolved · Public

Description

We should have only two prometheus-wmf-elasticsearch-exporter-9* units on elastic nodes. The prometheus-wmf-elasticsearch-exporter unit also starts and uses the same port as prometheus-wmf-elasticsearch-exporter-9200, which prevents the latter from starting. This probably requires some corrections in puppet. We should align these units to make sure that only the correct ones are started.

Event Timeline

Restricted Application added a subscriber: Aklapper.

After investigating, I noticed that prometheus-wmf-elasticsearch-exporter was created before the multi-instance setup and is no longer needed. It is present on some nodes and absent on others; the nodes without it were probably set up after multi-instance was introduced. Here are two ways to get rid of it completely:

  1. Use puppet with ensure => absent on this resource (prometheus-wmf-elasticsearch-exporter)
  2. Use cumin to delete this unit on the nodes where it is present
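
A minimal sketch of how option 2 could start: find the unsuffixed legacy unit in `systemctl list-units` output before deleting anything. The helper name `find_legacy_units` is hypothetical, and the pattern accepts the unit name both with and without the `wmf-` infix, since both spellings appear in this task.

```shell
# Hypothetical helper (not part of puppet or cumin): read `systemctl list-units`
# output on stdin and print exporter units that have no port suffix.
find_legacy_units() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^prometheus-(wmf-)?elasticsearch-exporter\.service$/)
        print $i
  }'
}

# Possible use on a single node (assumption: --plain and --no-legend keep the
# unit name free of status markers and headers):
#   systemctl list-units -a --plain --no-legend | find_legacy_units
```

Running something like this through cumin first would show which nodes still carry the legacy unit, so that the delete command only needs to target those.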

@Gehel what do you think?

Redundant units have been cleaned up via cumin:

sudo cumin 'elastic[2025-2026,2028,2031,2034,2047,2052].codfw.wmnet' 'rm /etc/systemd/system/multi-user.target.wants/prometheus-elasticsearch-exporter-9600.service ; systemctl daemon-reload'

(and similar commands for other nodes)

It looks like only the units we need are left:

gehel@cumin2001:~$ sudo cumin 'A:elastic' 'systemctl list-units -a | grep prometheus-elastic'
65 hosts will be targeted:
elastic[2025-2054].codfw.wmnet,elastic[1017-1020,1022-1052].eqiad.wmnet
Confirm to continue [y/n]? y
===== NODE GROUP =====                                                                                                                                                                                             
(1) elastic2028.codfw.wmnet                                                                                                                                                                                        
----- OUTPUT of 'systemctl list-u...ometheus-elastic' -----                                                                                                                                                        
  prometheus-elasticsearch-exporter-9200.service                                            loaded    active     running         Prometheus exporter for Elasticsearch                                             
  prometheus-elasticsearch-exporter-9400.service                                            loaded    active     running         Prometheus exporter for Elasticsearch                                             
===== NODE GROUP =====                                                                                                                                                                                             
(32) elastic[2027,2029-2030,2032-2033,2035-2036,2039-2040,2043-2044,2048-2049,2053-2054].codfw.wmnet,elastic[1024-1027,1035,1039,1042-1052].eqiad.wmnet                                                            
----- OUTPUT of 'systemctl list-u...ometheus-elastic' -----                                                                                                                                                        
  prometheus-elasticsearch-exporter-9200.service                                            loaded    active   running   Prometheus exporter for Elasticsearch                                                     
  prometheus-elasticsearch-exporter-9600.service                                            loaded    active   running   Prometheus exporter for Elasticsearch                                                     
===== NODE GROUP =====                                                                                                                                                                                             
(32) elastic[2025-2026,2031,2034,2037-2038,2041-2042,2045-2047,2050-2052].codfw.wmnet,elastic[1017-1020,1022-1023,1028-1034,1036-1038,1040-1041].eqiad.wmnet                                                       
----- OUTPUT of 'systemctl list-u...ometheus-elastic' -----                                                                                                                                                        
  prometheus-elasticsearch-exporter-9200.service                                            loaded    active   running   Prometheus exporter for Elasticsearch                                                     
  prometheus-elasticsearch-exporter-9400.service                                            loaded    active   running   Prometheus exporter for Elasticsearch                                                     
================                                                                                                                                                                                                   
PASS:  |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100% (65/65) [00:01<00:00, 60.52hosts/s]     
FAIL:  |                                                                                                                                                                     |   0% (0/65) [00:01<?, ?hosts/s]     
100.0% (65/65) success ratio (>= 100.0% threshold) for command: 'systemctl list-u...ometheus-elastic'.
100.0% (65/65) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
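
The grouped output above is checked by eye. A per-node check could be sketched as follows, assuming the expected state is exactly two port-suffixed exporter units and no unsuffixed one. `check_exporter_units` is a hypothetical name, and the `9[0-9]+` port pattern is an assumption based on the 9200, 9400 and 9600 units seen in this task.

```shell
# Hypothetical helper: read `systemctl list-units` output on stdin and succeed
# only if it lists exactly $1 (default 2) port-suffixed exporter units and no
# unsuffixed legacy unit.
check_exporter_units() {
  expected_units=${1:-2}
  awk -v expected="$expected_units" '
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^prometheus-(wmf-)?elasticsearch-exporter-9[0-9]+\.service$/)
          suffixed++
        else if ($i ~ /^prometheus-(wmf-)?elasticsearch-exporter\.service$/)
          legacy++
      }
    }
    # awk exits 0 (success) only when both conditions hold.
    END { exit !((suffixed + 0) == (expected + 0) && (legacy + 0) == 0) }
  '
}

# Possible use on a single node:
#   systemctl list-units -a | grep prometheus-elastic | check_exporter_units
```

Because the helper only reports through its exit status, a cumin run of it would surface any non-conforming node as a failure rather than as a separate node group.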