There are failed systemd units on maps1009:
    gehel@maps1009:~$ sudo systemctl list-units --state=failed
      UNIT                                                  LOAD   ACTIVE SUB    DESCRIPTION
    ● cassandra-metrics-collector.service                   loaded failed failed cassandra metrics co
    ● wmf_auto_restart_cassandra-metrics-collector.service loaded failed failed Aut
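To spot failed units on a host without reading the table by eye, a small helper can parse the same listing. This is a minimal sketch, not part of any existing tooling: the function names are hypothetical, and it assumes the `--plain --no-legend` output puts the unit name in the first whitespace-separated column (it also tolerates the leading `●` bullet seen in the default listing).

```python
import subprocess
from typing import List


def parse_failed_units(output: str) -> List[str]:
    """Return unit names from `systemctl list-units` output (hypothetical helper).

    Each non-empty line is assumed to start with the unit name, followed by
    the LOAD, ACTIVE, SUB and DESCRIPTION columns.
    """
    units = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Tolerate a leading status bullet, as in the default (non-plain) listing.
        if fields[0] == "●" and len(fields) > 1:
            units.append(fields[1])
        else:
            units.append(fields[0])
    return units


def failed_units() -> List[str]:
    """Ask systemd on the local host for units in the failed state."""
    result = subprocess.run(
        ["systemctl", "list-units", "--state=failed", "--plain", "--no-legend"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_failed_units(result.stdout)


if __name__ == "__main__":
    # Sample rows copied from the listing above (descriptions truncated as shown).
    sample = (
        "● cassandra-metrics-collector.service loaded failed failed cassandra metrics co\n"
        "● wmf_auto_restart_cassandra-metrics-collector.service loaded failed failed Aut\n"
    )
    print(parse_failed_units(sample))
```

On maps1009 this would be expected to report both `cassandra-metrics-collector.service` and `wmf_auto_restart_cassandra-metrics-collector.service`.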
Java can't find the service's main class, which looks like a broken deployment:
    gehel@maps1009:~$ sudo systemctl status cassandra-metrics-collector.service
    ● cassandra-metrics-collector.service - cassandra metrics collector
       Loaded: loaded (/lib/systemd/system/cassandra-metrics-collector.service; static; vendor preset: enabled)
       Active: failed (Result: exit-code) since Mon 2021-02-22 08:06:33 UTC; 3min 18s ago
      Process: 42666 ExecStart=/usr/bin/java org.wikimedia.cassandra.metrics.service.Service --graphite-host graphite-in.eqiad.wmnet --graphite-port 2003 (code=exited, status=1/FAILURE)
     Main PID: 42666 (code=exited, status=1/FAILURE)

    Feb 22 08:06:33 maps1009 systemd[1]: Started cassandra metrics collector.
    Feb 22 08:06:33 maps1009 java[42666]: Error: Could not find or load main class org.wikimedia.cassandra.metrics.service.Service
    Feb 22 08:06:33 maps1009 systemd[1]: cassandra-metrics-collector.service: Main process exited, code=exited, status=1/FAILURE
    Feb 22 08:06:33 maps1009 systemd[1]: cassandra-metrics-collector.service: Failed with result 'exit-code'.