Cassandra metrics collector not running on maps1009
Closed, Resolved (Public)

Description

There are failed systemd units on maps1009:

gehel@maps1009:~$ sudo systemctl list-units --state=failed
  UNIT                          LOAD   ACTIVE SUB    DESCRIPTION              
● cassandra-metrics-collector.service loaded failed failed cassandra metrics co
● wmf_auto_restart_cassandra-metrics-collector.service loaded failed failed Aut

The JVM cannot find the main class, which suggests a broken deployment:

gehel@maps1009:~$ sudo systemctl status cassandra-metrics-collector.service 
● cassandra-metrics-collector.service - cassandra metrics collector
   Loaded: loaded (/lib/systemd/system/cassandra-metrics-collector.service; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2021-02-22 08:06:33 UTC; 3min 18s ago
  Process: 42666 ExecStart=/usr/bin/java org.wikimedia.cassandra.metrics.service.Service --graphite-host graphite-in.eqiad.wmnet --graphite-port 2003 (code=exited, status=1/FAILURE)
 Main PID: 42666 (code=exited, status=1/FAILURE)

Feb 22 08:06:33 maps1009 systemd[1]: Started cassandra metrics collector.
Feb 22 08:06:33 maps1009 java[42666]: Error: Could not find or load main class org.wikimedia.cassandra.metrics.service.Service
Feb 22 08:06:33 maps1009 systemd[1]: cassandra-metrics-collector.service: Main process exited, code=exited, status=1/FAILURE
Feb 22 08:06:33 maps1009 systemd[1]: cassandra-metrics-collector.service: Failed with result 'exit-code'.
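
"Could not find or load main class" means the JVM did not find the class on its classpath. One way to narrow this down is to check which classpath entries, if any, contain the class. The sketch below is only an illustration: the helper names `class_entry_path` and `find_class_in_classpath` are hypothetical, and the classpath passed to them is an assumption that has to be taken from the actual unit configuration on the host.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the deployed tooling.

# Convert a fully qualified class name to the relative path it has
# inside a jar or a class directory, e.g. a.b.C -> a/b/C.class
class_entry_path() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print every entry of a colon-separated classpath that contains the
# class. Returns 0 if at least one entry matched, 1 otherwise.
# Note: wildcard entries such as /dir/* are expanded by the shell,
# which only approximates how Java treats them.
find_class_in_classpath() {
    class_name=$1
    classpath=$2
    entry_path=$(class_entry_path "$class_name")
    status=1
    old_ifs=$IFS
    IFS=:
    for entry in $classpath; do
        IFS=$old_ifs
        if [ -d "$entry" ]; then
            # Directory entry: the class file must exist below it.
            if [ -f "$entry/$entry_path" ]; then
                printf '%s\n' "$entry"
                status=0
            fi
        elif [ -f "$entry" ]; then
            # Jar entry: list the archive and look for the class path.
            if unzip -l "$entry" 2>/dev/null | grep -qF "$entry_path"; then
                printf '%s\n' "$entry"
                status=0
            fi
        fi
        IFS=:
    done
    IFS=$old_ifs
    return $status
}
```

The `ExecStart` line above passes no `-cp` option, so the classpath presumably comes from the unit's environment; `systemctl show -p Environment cassandra-metrics-collector.service` should display it if that is the case. With that value in `$CLASSPATH`, a call would look like `find_class_in_classpath org.wikimedia.cassandra.metrics.service.Service "$CLASSPATH"`. No output and a non-zero exit status would be consistent with a missing or mismatched collector jar.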

Event Timeline

Change 696399 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] cassandra: drop support for 2.1 in metrics. Fix version of metrics collector for cassandra 3

https://gerrit.wikimedia.org/r/696399

Change 696399 merged by Hnowlan:

[operations/puppet@production] cassandra: drop support for 2.1 in metrics. Fix collector version

https://gerrit.wikimedia.org/r/696399