We should only have two prometheus-wmf-elasticsearch-exporter-9* units on elastic nodes. The prometheus-wmf-elasticsearch-exporter unit starts first and binds the same port as prometheus-wmf-elasticsearch-exporter-9200, which prevents the latter from starting. This probably needs some Puppet corrections. We should align these units to make sure the correct ones are started.
Description
Event Timeline
After investigating, I noticed that prometheus-wmf-elasticsearch-exporter was created before the multi-instance setup. This unit is no longer needed. It is present on some nodes and absent on others; it is probably absent on nodes that were set up after multi-instance was introduced. Here are some ways to get rid of it completely:
- Use Puppet with ensure => absent on this resource (prometheus-wmf-elasticsearch-exporter)
- Use cumin to delete this unit on the nodes where it is present
@Gehel what do you think?
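As a rough illustration of the second option, here is a minimal per-host sketch that could be run through cumin. The helper name `remove_legacy_unit` is hypothetical, and the sketch assumes that `systemctl disable --now` is enough to stop the unit and drop its `multi-user.target.wants` symlink; if Puppet still manages the unit, it may simply come back on the next run.

```shell
#!/bin/sh
# Hypothetical helper (not from the task): stop and disable a legacy unit
# only if systemd knows about it on this host.
remove_legacy_unit() {
    unit="$1"
    # list-unit-files prints a line for the unit only when a unit file exists.
    if systemctl list-unit-files --no-legend "$unit" 2>/dev/null | grep -qF -- "$unit"; then
        # Stop the unit and remove its enablement symlinks in one step.
        systemctl disable --now "$unit"
        systemctl daemon-reload
    else
        echo "$unit not present, nothing to do"
    fi
}

# Example invocation (unit name taken from the task description):
#   remove_legacy_unit prometheus-wmf-elasticsearch-exporter.service
```

The existence check makes the helper safe to run on every elastic node, including those that were set up after the multi-instance change and never had the unit.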
Redundant units have been cleaned up via cumin:
sudo cumin 'elastic[2025-2026,2028,2031,2034,2047,2052].codfw.wmnet' 'rm /etc/systemd/system/multi-user.target.wants/prometheus-elasticsearch-exporter-9600.service ; systemctl daemon-reload'
(and similar commands for other nodes)
It looks like we only have the units we need left:
gehel@cumin2001:~$ sudo cumin 'A:elastic' 'systemctl list-units -a | grep prometheus-elastic'
65 hosts will be targeted:
elastic[2025-2054].codfw.wmnet,elastic[1017-1020,1022-1052].eqiad.wmnet
Confirm to continue [y/n]? y
===== NODE GROUP =====
(1) elastic2028.codfw.wmnet
----- OUTPUT of 'systemctl list-u...ometheus-elastic' -----
prometheus-elasticsearch-exporter-9200.service loaded active running Prometheus exporter for Elasticsearch
prometheus-elasticsearch-exporter-9400.service loaded active running Prometheus exporter for Elasticsearch
===== NODE GROUP =====
(32) elastic[2027,2029-2030,2032-2033,2035-2036,2039-2040,2043-2044,2048-2049,2053-2054].codfw.wmnet,elastic[1024-1027,1035,1039,1042-1052].eqiad.wmnet
----- OUTPUT of 'systemctl list-u...ometheus-elastic' -----
prometheus-elasticsearch-exporter-9200.service loaded active running Prometheus exporter for Elasticsearch
prometheus-elasticsearch-exporter-9600.service loaded active running Prometheus exporter for Elasticsearch
===== NODE GROUP =====
(32) elastic[2025-2026,2031,2034,2037-2038,2041-2042,2045-2047,2050-2052].codfw.wmnet,elastic[1017-1020,1022-1023,1028-1034,1036-1038,1040-1041].eqiad.wmnet
----- OUTPUT of 'systemctl list-u...ometheus-elastic' -----
prometheus-elasticsearch-exporter-9200.service loaded active running Prometheus exporter for Elasticsearch
prometheus-elasticsearch-exporter-9400.service loaded active running Prometheus exporter for Elasticsearch
================
PASS: |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100% (65/65) [00:01<00:00, 60.52hosts/s]
FAIL: | | 0% (0/65) [00:01<?, ?hosts/s]
100.0% (65/65) success ratio (>= 100.0% threshold) for command: 'systemctl list-u...ometheus-elastic'.
100.0% (65/65) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
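The check done by eye on the output above could also be scripted per host. The following is a minimal sketch, not something used in this task: the function name `check_exporter_units` is hypothetical, and it assumes that a healthy node has exactly two exporter units with a `-<port>` suffix and no suffix-less legacy unit. The pattern accepts both the `prometheus-wmf-` and `prometheus-` prefixes, since both spellings appear in this task.

```shell
#!/bin/sh
# Hypothetical check (not from the task): reads `systemctl list-units -a`
# style output on stdin and verifies the expected exporter units.
check_exporter_units() {
    input=$(cat)
    # Units with a "-<port>" suffix, e.g. ...-exporter-9200.service
    per_instance=$(printf '%s\n' "$input" |
        grep -cE 'prometheus-(wmf-)?elasticsearch-exporter-[0-9]+\.service' || true)
    # The legacy single-instance unit has no port suffix.
    legacy=$(printf '%s\n' "$input" |
        grep -cE 'prometheus-(wmf-)?elasticsearch-exporter\.service' || true)
    if [ "$per_instance" -eq 2 ] && [ "$legacy" -eq 0 ]; then
        echo "ok"
    else
        echo "unexpected: $per_instance per-instance, $legacy legacy"
        return 1
    fi
}

# Example invocation on a single host:
#   systemctl list-units -a | check_exporter_units
```

Because the function returns non-zero on a mismatch, a cumin run of it would report the offending hosts as failures instead of requiring a manual comparison of the node groups.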