
MySQL metrics monitoring
Open, Medium, Public

Description

This task is about MySQL monitoring, specifically metrics gathering (not alerting).

We've deployed prometheus-mysqld-exporter to export MySQL metrics to Prometheus (see T126757: test prometheus mysqld-exporter). Most of the functionality provided by Tendril is already available, but some things are still missing:

  • multi-source replication metrics
  • support for multi-instance nodes
  • latency monitoring for a sample query or queries
  • tabular data (e.g. a summary of hosts and their basic properties) is difficult to present in Grafana and may need a separate technology (e.g. a web dashboard, orchestrator, ...)
  • fetch replication metrics from pt-heartbeat (tracked in T141968: Display lag on grafana (prometheus) from pt-heartbeat instead (or in addition) of Seconds_Behind_Master)
  • upstream performance_schema monitoring is inflexible, and it may require a separate Prometheus instance for privacy reasons
  • table properties such as size and number of rows are disabled, and they may require a separate Prometheus instance for privacy reasons
  • Prometheus configuration for MySQL host/shard/role/etc. is manual and should be automated via PuppetDB. Note that "role" (master/slave) should be exported by the machine itself rather than set in the Prometheus configuration, which makes switching master/slave easier. Shard, on the other hand, doesn't change for the lifetime of the machine (?) and can be set in the Prometheus configuration.
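For the pt-heartbeat item, the lag calculation itself is simple: pt-heartbeat periodically writes the current time on the primary into a heartbeat table, and a replica's lag is how far that replicated timestamp trails the replica's own clock. The sketch below assumes pt-heartbeat runs with `--utc` and stores `ts` as an ISO-8601 string with microseconds; the function name is hypothetical, and the query in the comment is only illustrative of how the value might be fetched.

```python
from datetime import datetime, timezone

# Assumed format of pt-heartbeat's `ts` column, e.g. "2019-07-11T11:29:19.000990".
# Illustrative fetch (not necessarily what an exporter would run):
#   SELECT ts FROM heartbeat.heartbeat WHERE server_id = <primary server_id>
HEARTBEAT_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def heartbeat_lag_seconds(ts: str, now: datetime) -> float:
    """Return replication lag in seconds for a replicated heartbeat `ts`.

    Assumes `ts` was written in UTC (pt-heartbeat --utc) and that `now`
    is a timezone-aware UTC datetime taken on the replica.
    """
    written = datetime.strptime(ts, HEARTBEAT_TS_FORMAT).replace(tzinfo=timezone.utc)
    lag = (now - written).total_seconds()
    # Small clock skew between hosts can make the difference negative;
    # clamp to zero instead of reporting negative lag.
    return max(lag, 0.0)
```

Unlike Seconds_Behind_Master, this measures end-to-end delay from a write on the primary, so it also reflects lag through intermediate replicas.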


Event Timeline

Dzahn triaged this task as Medium priority. Sep 22 2016, 2:32 AM

Change 338988 had a related patch set uploaded (by Jcrespo):
[operations/puppet/mariadb] Remove old CA (ssl='on') and add a new option "socket"

https://gerrit.wikimedia.org/r/338988

Change 338988 merged by jenkins-bot:
[operations/puppet/mariadb] Remove old CA (ssl='on') and add a new option "socket"

https://gerrit.wikimedia.org/r/338988

Change 341557 had a related patch set uploaded (by jynus):
[operations/puppet] mariadb: Separate sanitarium role && monitor it on prometheus

https://gerrit.wikimedia.org/r/341557

Change 341557 merged by Jcrespo:
[operations/puppet] mariadb: Separate sanitarium role && monitor it on prometheus

https://gerrit.wikimedia.org/r/341557

jcrespo moved this task from Triage to Meta/Epic on the DBA board.
jcrespo renamed this task from MySQL monitoring with prometheus to MySQL metrics monitoring. Oct 9 2017, 3:38 PM
jcrespo updated the task description.

Change 391558 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software/tendril@master] Link to grafana rather than to ganglia on tendril

https://gerrit.wikimedia.org/r/391558

Change 391558 merged by Jcrespo:
[operations/software/tendril@master] Link to grafana rather than to ganglia on tendril

https://gerrit.wikimedia.org/r/391558

jcrespo changed the task status from Open to Stalled. Nov 30 2018, 3:38 PM
jcrespo changed the task status from Stalled to Open. Jun 26 2019, 10:11 AM
jcrespo claimed this task.

Change 519203 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] prometheus-mysqld-exporter: Automate targets based on zarcillo db

https://gerrit.wikimedia.org/r/519203

Change 521839 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[labs/private@master] prometheus: Add fake prometheus labs password

https://gerrit.wikimedia.org/r/521839

Change 521839 merged by Jcrespo:
[labs/private@master] prometheus: Add fake prometheus labs password

https://gerrit.wikimedia.org/r/521839

Change 519203 merged by Jcrespo:
[operations/puppet@production] prometheus-mysqld-exporter: Automate targets based on zarcillo db

https://gerrit.wikimedia.org/r/519203
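As a rough illustration of what generating targets from an inventory database involves, here is a sketch that groups hosts into one Prometheus `file_sd_configs` target list per group and datacenter, following the `mysql-<group>_<dc>.yaml` naming visible in the target listing later in this timeline. The row layout, the `build_targets` name and the ports are hypothetical, not zarcillo's actual schema. In line with the task description, only the shard is attached as a label; role is left for the host itself to export.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical inventory row: (hostname, exporter_port, group, shard, datacenter).
Row = Tuple[str, int, str, str, str]


def build_targets(rows: Iterable[Row]) -> Dict[str, List[dict]]:
    """Group hosts into file_sd target lists, one file per group and datacenter.

    Each file maps to a list of {"labels": ..., "targets": ...} entries,
    one entry per shard, with targets sorted for stable output.
    """
    files: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for host, port, group, shard, dc in rows:
        filename = f"mysql-{group}_{dc}.yaml"
        files[filename][shard].append(f"{host}:{port}")

    result: Dict[str, List[dict]] = {}
    for filename, shards in files.items():
        result[filename] = [
            # Shard is static for a host, so it is safe to set here;
            # master/replica role is deliberately NOT a label.
            {"labels": {"shard": shard}, "targets": sorted(targets)}
            for shard, targets in sorted(shards.items())
        ]
    return result
```

Each list can then be serialized to its file, e.g. with PyYAML's `yaml.safe_dump`. Sorting both shards and targets keeps the generated output deterministic, so an unchanged inventory produces byte-identical files.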

Change 521845 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mysql-prometheus-exporter: Fix typo on puppet requirement

https://gerrit.wikimedia.org/r/521845

Change 521845 merged by Jcrespo:
[operations/puppet@production] mysql-prometheus-exporter: Fix typo on puppet requirement

https://gerrit.wikimedia.org/r/521845

Change 521847 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mysqld-prometheus-exporter: Require python3 pymysql and yaml pkgs

https://gerrit.wikimedia.org/r/521847

Change 521847 merged by Jcrespo:
[operations/puppet@production] mysqld-prometheus-exporter: Require python3 pymysql and yaml pkgs

https://gerrit.wikimedia.org/r/521847

Change 522032 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[labs/private@master] prometheus-mysqld-exporter: move variable to profile

https://gerrit.wikimedia.org/r/522032

Change 522032 merged by Jcrespo:
[labs/private@master] prometheus-mysqld-exporter: move variable to profile

https://gerrit.wikimedia.org/r/522032

Change 522040 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[labs/private@master] prometheus: move prometheus secrets back to the original role

https://gerrit.wikimedia.org/r/522040

Change 522040 merged by Jcrespo:
[labs/private@master] prometheus: move prometheus secrets back to the original role

https://gerrit.wikimedia.org/r/522040

Change 521852 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] Revert "Revert "prometheus-mysqld-exporter: Automate targets based on zarcillo db""

https://gerrit.wikimedia.org/r/521852

Change 521852 merged by Jcrespo:
[operations/puppet@production] Revert "Revert "prometheus-mysqld-exporter: Automate targets based on zarcillo db""

https://gerrit.wikimedia.org/r/521852

Change 522061 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] Revert "Revert "Revert "Revert "prometheus-mysqld-exporter: Automate targets based on zarcillo db""""

https://gerrit.wikimedia.org/r/522061

Change 522061 merged by Jcrespo:
[operations/puppet@production] Revert "Revert "Revert "Revert "prometheus-mysqld-exporter: Automate targets based on zarcillo db""""

https://gerrit.wikimedia.org/r/522061

root@prometheus2003:/srv/prometheus/ops/targets$ ls -la mysql-*
-r--r--r-- 1 root       root  2592 Jul 11 11:27 mysql-core_codfw.yaml
-r--r--r-- 1 root       root   612 Jul 11 11:27 mysql-dbstore_codfw.yaml
-r--r--r-- 1 root       root   544 Jul 10 10:57 mysql-labs_codfw.yaml
-rw-r--r-- 1 root       root   544 Jul 10 10:48 mysql-labsdb_codfw.yaml
-r--r--r-- 1 root       root   621 Jul 11 11:27 mysql-misc_codfw.yaml
-r--r--r-- 1 root       root   275 Jul 11 11:27 mysql-parsercache_codfw.yaml
root@prometheus2003:/srv/prometheus/ops/targets$ date
Thu Jul 11 11:29:19 UTC 2019
root@prometheus2003:/srv/prometheus/ops/targets$ run-puppet-agent 
Warning: Downgrading to PSON for future requests
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for prometheus2003.codfw.wmnet
Info: Applying configuration version '1562844569'
Notice: /Stage[main]/Profile::Prometheus::Ops_mysql/Exec[generate-mysqld-exporter-config]/returns: executed successfully
Notice: Applied catalog in 18.99 seconds
root@prometheus2003:/srv/prometheus/ops/targets$ ls -la mysql-*
-r--r--r-- 1 root root 2592 Jul 11 11:27 mysql-core_codfw.yaml
-r--r--r-- 1 root root  612 Jul 11 11:27 mysql-dbstore_codfw.yaml
-r--r--r-- 1 root root  544 Jul 10 10:57 mysql-labs_codfw.yaml
-rw-r--r-- 1 root root  544 Jul 10 10:48 mysql-labsdb_codfw.yaml
-r--r--r-- 1 root root  621 Jul 11 11:27 mysql-misc_codfw.yaml
-r--r--r-- 1 root root  275 Jul 11 11:27 mysql-parsercache_codfw.yaml
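In the listing above, the generation exec reports "executed successfully" while the read-only target files keep their 11:27 timestamps. How the actual script achieves this isn't shown here; one common way to get that behavior is to compare the new content with the existing file and only replace it, atomically, when it differs. A minimal sketch of that approach (the `write_if_changed` name is hypothetical):

```python
import os
import tempfile


def write_if_changed(path: str, content: str) -> bool:
    """Atomically replace `path` with `content` only if it differs.

    Returns True if the file was (re)written, False if left untouched.
    """
    try:
        with open(path, encoding="utf-8") as f:
            if f.read() == content:
                return False  # unchanged: keep the old file and its mtime
    except FileNotFoundError:
        pass  # first run: the file does not exist yet

    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory so os.replace() is an
    # atomic rename and Prometheus never reads a half-written file.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(content)
        os.chmod(tmp, 0o444)  # read-only, like the -r--r--r-- files above
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # clean up the temp file if anything failed
        raise
    return True
```

Skipping unchanged files avoids needless file_sd reloads and keeps timestamps meaningful when checking when a target list last changed.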

Great work: far fewer files to edit when provisioning, moving, or decommissioning hosts, which used to be very error-prone!
Thanks :)

jcrespo subscribed.

Change 596615 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Remove redundant include of prometheus node_exporter

https://gerrit.wikimedia.org/r/596615

Change 596615 merged by Jcrespo:
[operations/puppet@production] mariadb: Remove redundant include of prometheus node_exporter

https://gerrit.wikimedia.org/r/596615