The RESTBase Cassandra cluster has some nodes where Prometheus metrics (those that come from the JMX exporter), are missing.
Description
Related Objects
Event Timeline
Mentioned in SAL (#wikimedia-operations) [2018-04-18T14:37:00Z] <urandom> restarting Cassandra, restbase1011-a -- T192456
Mentioned in SAL (#wikimedia-operations) [2018-04-18T14:55:19Z] <urandom> restarting Cassandra, restbase1011-a to test v 0.8 of Prometheus JMX exporter -- T192456
For the machines affected, executing curl against the exporter URL just hangs indefinitely. I attempted to restart 1011-a to no avail. I then live-hacked cassandra-env.sh to roll back the exporter jar to the 0.8 version we used before, and it is now working. More investigation is needed.
Mentioned in SAL (#wikimedia-operations) [2018-04-19T20:48:12Z] <urandom> restarting cassandra to (temporarily) rollback prometheus jmx exporter -- T189822, T192456
Mentioned in SAL (#wikimedia-operations) [2018-04-19T20:48:24Z] <urandom> restarting cassandra to (temporarily) rollback prometheus jmx exporter, restbase1010-a -- T189822, T192456
Mentioned in SAL (#wikimedia-operations) [2018-04-19T21:11:56Z] <urandom> restarting cassandra to (temporarily) rollback prometheus jmx exporter, restbase1010-c -- T189822, T192456