This work should mirror what the services team did to migrate their metrics to Prometheus.
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | None | T175344 Move away from jmxtrans in favor of prometheus jmx_exporter
Resolved | | None | T186567 Deprecate cassandra-metrics-collector?
Resolved | | elukey | T184795 Add the prometheus jmx agent to AQS Cassandra
Resolved | | elukey | T189529 Test/upload new cassandra 2.2.6 package (wmf3)
Event Timeline
Change 413405 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::aqs: enable Cassandra JMX exporter
As suggested by Eric, forced git fat pull to update the prometheus jmx exporter jar:
elukey@neodymium:~$ sudo cumin 'aqs*' 'ls -l /srv/deployment/prometheus/jmx_exporter/lib/jmx_prometheus_javaagent-0.8-20170117.190412-1.jar'
6 hosts will be targeted:
aqs[1004-1009].eqiad.wmnet
Confirm to continue [y/n]? y
===== NODE GROUP =====
(2) aqs[1007,1009].eqiad.wmnet
----- OUTPUT of 'ls -l /srv/deplo...117.190412-1.jar' -----
-rw-r--r-- 1 deploy-service deploy-service 1241757 Apr 19 2017 /srv/deployment/prometheus/jmx_exporter/lib/jmx_prometheus_javaagent-0.8-20170117.190412-1.jar
===== NODE GROUP =====
(4) aqs[1004-1006,1008].eqiad.wmnet
----- OUTPUT of 'ls -l /srv/deplo...117.190412-1.jar' -----
-rw-r--r-- 1 deploy-service deploy-service 1241757 Feb 23 08:03 /srv/deployment/prometheus/jmx_exporter/lib/jmx_prometheus_javaagent-0.8-20170117.190412-1.jar
I am wondering whether this will still be needed after https://gerrit.wikimedia.org/r/#/c/402069/?
Change 413405 merged by Elukey:
[operations/puppet@production] role::aqs: enable Cassandra JMX exporter
Mentioned in SAL (#wikimedia-operations) [2018-02-27T16:53:14Z] <elukey> restart cassandra-a on aqs1004 to test the prometheus jmx agent before complete rollout - T184795
Sad news: we had to roll back due to an issue with the Cassandra 2.2.x startup script:
https://issues.apache.org/jira/browse/CASSANDRA-7254
https://github.com/apache/cassandra/blob/cassandra-2.2.6/bin/cassandra#L261
The line above also starts the new jmx javaagent, because the agent is passed in via JVM_OPTS. The agent then binds itself to port 7800 and waits for data, so the cassandra startup gets stuck as well :)
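To illustrate the problem, here is a minimal sketch of one way a startup script could avoid loading the agent in an auxiliary JVM invocation: filter the `-javaagent:` options out of the options string before using it for that check. The helper name `strip_javaagent`, the `CHECK_OPTS` variable, and the example paths are all hypothetical, and this is not necessarily what the fixed package does.

```shell
#!/bin/bash
# Hypothetical helper: print the given JVM options with every
# -javaagent:... option removed. Assumes options contain no
# whitespace or glob characters, since the string is word-split.
strip_javaagent() {
    local opt out=""
    for opt in $1; do
        case "$opt" in
            -javaagent:*) ;;                   # skip agent options
            *) out="${out:+$out }$opt" ;;      # keep everything else
        esac
    done
    printf '%s\n' "$out"
}

# Example values, assumed for illustration only.
JVM_OPTS="-Xmx1G -javaagent:/path/to/jmx_prometheus_javaagent.jar=7800:/path/to/config.yaml -Dfoo=bar"
CHECK_OPTS="$(strip_javaagent "$JVM_OPTS")"
echo "$CHECK_OPTS"
```

With the filtered options, a short-lived JVM started only for a pre-flight check would not bind port 7800, while the main Cassandra process could still use the full JVM_OPTS.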
@MoritzMuehlenhoff: would it be worth, in your opinion, creating a cassandra 2.2 component rather than relying on thirdparty? As far as I can see, cassandra 2.2.6 is in jessie-wikimedia/thirdparty.
thirdparty/foo is for packages we sync from external repositories (as such the packages in jessie-wikimedia are misplaced, we have some cruft there) while the cassandra packages are built by Eric, so creating a component/cassandra22 makes sense.
Change 421241 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] cassandra: upgrade version 2.2 package settings
Change 421878 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::aqs: enable jmx agent
Change 421241 merged by Elukey:
[operations/puppet@production] cassandra: upgrade version 2.2 package settings for aqs
Change 421878 merged by Elukey:
[operations/puppet@production] role::aqs: enable jmx agent
Change 422103 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::prometheus::analytics: poll cassandra aqs metrics
Change 422103 merged by Elukey:
[operations/puppet@production] role::prometheus::analytics: poll cassandra aqs metrics
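Once the agent is enabled, a quick sanity check is to count the samples it exposes. The sketch below assumes the agent serves the Prometheus text format under `/metrics` on port 7800 (the port mentioned above); the helper name `count_samples` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count sample lines in Prometheus text-format
# output read from stdin, ignoring blank lines and '#' comment lines
# (HELP/TYPE metadata).
count_samples() {
    grep -c '^[^#]'
}

# Example usage on an AQS host (assumed endpoint, not run here):
#   curl -s http://localhost:7800/metrics | count_samples
```

A count of zero, or a request that never returns, would suggest the agent is not exporting anything yet.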