
Unify, if possible, AQS and Restbase's cassandra dashboards
Closed, Resolved | Public | 5 Estimated Story Points

Description

AQS now runs the Prometheus JMX exporter for Cassandra metrics, so it would be really good to share Restbase's Cassandra dashboard instead of building a separate one (the same would be good for the maps cluster).

Currently there is a problem with metric names, which changed between Cassandra 2.x and 3.x. Filippo suggested renaming them on the Prometheus masters as a workaround:

https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cmetric_relabel_configs%3E

Event Timeline

elukey triaged this task as Medium priority. Apr 25 2018, 12:29 PM
elukey created this task.

I created https://grafana-admin.wikimedia.org/dashboard/db/cassandra-aqs to manually port all the metric names and see the discrepancies. This is what I have found so far:

  • cassandra_columnfamily_* metrics have been renamed to cassandra_table_* in 3.x (the same applies to the label named "columnfamily", which is now "table").
  • Percentile metrics like cassandra_table_readlatency_75p do not seem to be present in 2.x.
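As an illustration of the rename suggested in the description, here is a minimal sketch of a metric_relabel_configs fragment that maps the 2.x names and label onto their 3.x equivalents at scrape time. It assumes the goal is to make 2.x hosts look like 3.x hosts to the dashboards; it is not the configuration that was eventually merged, and it would sit inside whichever scrape job collects the Cassandra targets.

```yaml
# Hypothetical fragment of a Prometheus scrape job; the surrounding
# job definition (job_name, targets, etc.) is intentionally omitted.
metric_relabel_configs:
  # Rename cassandra_columnfamily_* metrics to cassandra_table_*.
  - source_labels: [__name__]
    regex: 'cassandra_columnfamily_(.*)'
    target_label: __name__
    replacement: 'cassandra_table_${1}'
  # Copy the value of the "columnfamily" label into a "table" label.
  # Samples without a "columnfamily" label do not match and are left alone.
  - source_labels: [columnfamily]
    regex: '(.+)'
    target_label: table
    replacement: '${1}'
  # Drop the old "columnfamily" label once it has been copied.
  - regex: 'columnfamily'
    action: labeldrop
```

The rules are applied in order after each scrape, so the label copy has to come before the labeldrop rule.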

Change 428931 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] cassandra: add percentile metrics to 2.2's prometheus jmx config

https://gerrit.wikimedia.org/r/428931

Change 428931 merged by Elukey:
[operations/puppet@production] cassandra: add percentile metrics to 2.x's prometheus jmx config

https://gerrit.wikimedia.org/r/428931
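For readers unfamiliar with the jmx exporter, a rule exposing the percentile attributes might look roughly like the sketch below. This is not the content of change 428931: the MBean pattern, the capture groups, and the resulting metric and label names are assumptions based on how Cassandra 2.x usually registers its per-columnfamily metrics.

```yaml
# Hypothetical jmx_exporter rule, not the merged puppet change.
lowercaseOutputName: true
rules:
  # Assumed MBean layout:
  #   org.apache.cassandra.metrics:type=ColumnFamily,keyspace=K,scope=CF,name=ReadLatency
  # with attributes such as 75thPercentile, 95thPercentile, 99thPercentile.
  - pattern: 'org.apache.cassandra.metrics<type=ColumnFamily, keyspace=(\S*), scope=(\S*), name=(\S*)><>(\d+)thPercentile'
    # With lowercaseOutputName this would yield names like
    # cassandra_columnfamily_readlatency_75p.
    name: cassandra_columnfamily_$3_$4p
    labels:
      keyspace: "$1"
      columnfamily: "$2"
```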

I cloned all the Cassandra dashboards created by the Services team and adapted them to AQS. The only change I had to make was to replace 'table' with 'columnfamily'.

https://grafana.wikimedia.org/dashboard/db/aqs-cassandra-tables -> Partition size still does not seem to work because those metrics are missing (only two, not a big deal).

fdans lowered the priority of this task from Medium to Low. Apr 30 2018, 4:33 PM
fdans moved this task from Incoming to Operational Excellence on the Analytics board.

Change 430399 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::prometheus::analytics: rename cassandra metrics/labels

https://gerrit.wikimedia.org/r/430399

Change 430399 merged by Elukey:
[operations/puppet@production] role::prometheus::analytics: rename cassandra metrics/labels

https://gerrit.wikimedia.org/r/430399

Change 430563 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::prometheus::analytics: fix cassandra relabel config

https://gerrit.wikimedia.org/r/430563

Change 430563 merged by Elukey:
[operations/puppet@production] role::prometheus::analytics: fix cassandra relabel config

https://gerrit.wikimedia.org/r/430563

elukey set the point value for this task to 5. May 3 2018, 9:18 AM
elukey moved this task from In Progress to Done on the Analytics-Kanban board.
Vvjjkkii renamed this task from Unify, if possible, AQS and Restbase's cassandra dashboards to q9daaaaaaa. Jul 1 2018, 1:13 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed elukey as the assignee of this task.
Vvjjkkii raised the priority of this task from Low to High.
Vvjjkkii updated the task description.
Vvjjkkii removed the point value for this task.
Vvjjkkii removed subscribers: gerritbot, Aklapper.
Mainframe98 renamed this task from q9daaaaaaa to Unify, if possible, AQS and Restbase's cassandra dashboards. Jul 1 2018, 8:24 AM
Mainframe98 closed this task as Resolved.
Mainframe98 assigned this task to elukey.
Mainframe98 lowered the priority of this task from High to Low.
Mainframe98 updated the task description.
Mainframe98 set the point value for this task to 5.
Mainframe98 added subscribers: gerritbot, Aklapper.

Change 508809 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/puppet@production] prometheus: enable metrics relabel

https://gerrit.wikimedia.org/r/508809

Change 508809 merged by Filippo Giunchedi:
[operations/puppet@production] prometheus: enable metrics relabel

https://gerrit.wikimedia.org/r/508809