
MySQL metrics monitoring
Open, NormalPublic

Description

This task covers MySQL metrics gathering (not alerting).

We deployed prometheus-mysqld-exporter to export MySQL metrics to Prometheus as part of T126757: test prometheus mysqld-exporter. Most of the functionality provided by tendril is already there, though some things are missing:

  • multi-source replication metrics
  • support for multi-instance nodes
  • audit-type functionality to list several metrics in a single table, e.g. (mysql_version, lag, qps); currently Grafana cannot build tables that merge multiple metrics
  • fetch replication metrics from pt-heartbeat (tracked in T141968: Display lag on grafana (prometheus) and dbtree from pt-heartbeat instead (or in addition) of Seconds_Behind_Master)
  • upstream performance_schema monitoring is inflexible, and it may require a separate Prometheus instance for privacy reasons
  • Prometheus configuration for MySQL host/shard/role/etc. is manual and should be automated via puppetdb. Note that "role" (master/slave) should be exported by the machine itself rather than set in the Prometheus configuration, which makes changing master/slave easier. Shard, on the other hand, doesn't change for the lifetime of the machine (?) and can be put in the Prometheus configuration
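
As a rough illustration of the last point, here is a minimal sketch of generating Prometheus file-based service discovery targets from an inventory, with the shard as a static label and no role label (the role would come from the exporter on the machine itself). Everything specific in it is an assumption made for illustration: the inventory rows, the example hostnames, the non-default ports used for the multi-instance host, and the `build_file_sd` helper are hypothetical, and it emits JSON rather than YAML only to stay within the Python standard library (file_sd accepts both).

```python
import json
from collections import defaultdict

# Hypothetical inventory rows; in practice these would come from an
# inventory source such as puppetdb. Hostnames and ports are made up,
# except 9104, which is the default mysqld_exporter port.
INSTANCES = [
    {"host": "db-a.example", "port": 9104, "shard": "s1", "group": "core", "dc": "codfw"},
    {"host": "db-b.example", "port": 9104, "shard": "s1", "group": "core", "dc": "codfw"},
    {"host": "db-c.example", "port": 9104, "shard": "s2", "group": "core", "dc": "codfw"},
    # A multi-instance node: one exporter per MySQL instance, on different ports.
    {"host": "db-d.example", "port": 13311, "shard": "s1", "group": "dbstore", "dc": "codfw"},
    {"host": "db-d.example", "port": 13312, "shard": "s2", "group": "dbstore", "dc": "codfw"},
]


def build_file_sd(instances):
    """Return {filename: [target groups]} in Prometheus file_sd format.

    One file per (group, datacenter), one target group per shard.
    The role (master/slave) is deliberately NOT added as a label here.
    """
    by_file = defaultdict(lambda: defaultdict(list))
    for inst in instances:
        filename = f"mysql-{inst['group']}_{inst['dc']}.json"
        by_file[filename][inst["shard"]].append(f"{inst['host']}:{inst['port']}")

    return {
        filename: [
            {"targets": sorted(targets), "labels": {"shard": shard}}
            for shard, targets in sorted(shards.items())
        ]
        for filename, shards in by_file.items()
    }


if __name__ == "__main__":
    for filename, groups in sorted(build_file_sd(INSTANCES).items()):
        print(filename)
        print(json.dumps(groups, indent=2))
```

Keeping the shard in the generated file and the role out of it means a master/slave switchover would not require regenerating or editing any target file.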

Event Timeline

Restricted Application added a subscriber: Aklapper. · Aug 25 2016, 2:15 PM
jcrespo updated the task description. · Aug 29 2016, 3:21 PM
fgiunchedi updated the task description. · Aug 30 2016, 8:26 AM
jcrespo updated the task description. · Aug 31 2016, 9:24 AM
Dzahn triaged this task as Normal priority. · Sep 22 2016, 2:32 AM

Change 338988 had a related patch set uploaded (by Jcrespo):
Remove old CA (ssl='on') and add a new option "socket"

https://gerrit.wikimedia.org/r/338988

Change 338988 merged by jenkins-bot:
[operations/puppet/mariadb] Remove old CA (ssl='on') and add a new option "socket"

https://gerrit.wikimedia.org/r/338988

Change 341557 had a related patch set uploaded (by jynus):
[operations/puppet] mariadb: Separate sanitarium role && monitore it on prometheus

https://gerrit.wikimedia.org/r/341557

Change 341557 merged by Jcrespo:
[operations/puppet] mariadb: Separate sanitarium role && monitor it on prometheus

https://gerrit.wikimedia.org/r/341557

jcrespo moved this task from Triage to Meta/Epic on the DBA board.
jcrespo updated the task description. · Jul 10 2017, 1:33 PM
jcrespo renamed this task from MySQL monitoring with prometheus to MySQL metrics monitoring. · Oct 9 2017, 3:38 PM
jcrespo updated the task description.

Change 391558 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software/tendril@master] Link to grafana rather than to ganglia on tendril

https://gerrit.wikimedia.org/r/391558

Change 391558 merged by Jcrespo:
[operations/software/tendril@master] Link to grafana rather than to ganglia on tendril

https://gerrit.wikimedia.org/r/391558

jcrespo changed the task status from Open to Stalled. · Nov 30 2018, 3:38 PM
jcrespo changed the status of subtask T161296: Upgrade mysqld_exporter in production from Open to Stalled. · Mar 6 2019, 4:37 PM
jcrespo changed the task status from Stalled to Open. · Jun 26 2019, 10:11 AM
jcrespo claimed this task.

Change 519203 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] prometheus-mysqld-exporter: Automate targets based on zarcillo db

https://gerrit.wikimedia.org/r/519203

Change 521839 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[labs/private@master] prometheus: Add fake prometheus labs password

https://gerrit.wikimedia.org/r/521839

Change 521839 merged by Jcrespo:
[labs/private@master] prometheus: Add fake prometheus labs password

https://gerrit.wikimedia.org/r/521839

Change 519203 merged by Jcrespo:
[operations/puppet@production] prometheus-mysqld-exporter: Automate targets based on zarcillo db

https://gerrit.wikimedia.org/r/519203

Change 521845 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mysql-prometheus-exporter: Fix typo on puppet requirement

https://gerrit.wikimedia.org/r/521845

Change 521845 merged by Jcrespo:
[operations/puppet@production] mysql-prometheus-exporter: Fix typo on puppet requirement

https://gerrit.wikimedia.org/r/521845

Change 521847 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mysqld-prometheus-exporter: Require python3 pymysql and yaml pkgs

https://gerrit.wikimedia.org/r/521847

Change 521847 merged by Jcrespo:
[operations/puppet@production] mysqld-prometheus-exporter: Require python3 pymysql and yaml pkgs

https://gerrit.wikimedia.org/r/521847

Change 522032 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[labs/private@master] prometheus-mysqld-exporter: move variable to profile

https://gerrit.wikimedia.org/r/522032

Change 522032 merged by Jcrespo:
[labs/private@master] prometheus-mysqld-exporter: move variable to profile

https://gerrit.wikimedia.org/r/522032

Change 522040 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[labs/private@master] prometheus: move prometheus secrets back to the original role

https://gerrit.wikimedia.org/r/522040

Change 522040 merged by Jcrespo:
[labs/private@master] prometheus: move prometheus secrets back to the original role

https://gerrit.wikimedia.org/r/522040

Change 521852 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] Revert "Revert "prometheus-mysqld-exporter: Automate targets based on zarcillo db""

https://gerrit.wikimedia.org/r/521852

Change 521852 merged by Jcrespo:
[operations/puppet@production] Revert "Revert "prometheus-mysqld-exporter: Automate targets based on zarcillo db""

https://gerrit.wikimedia.org/r/521852

Change 522061 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] Revert "Revert "Revert "Revert "prometheus-mysqld-exporter: Automate targets based on zarcillo db""""

https://gerrit.wikimedia.org/r/522061

Change 522061 merged by Jcrespo:
[operations/puppet@production] Revert "Revert "Revert "Revert "prometheus-mysqld-exporter: Automate targets based on zarcillo db""""

https://gerrit.wikimedia.org/r/522061

root@prometheus2003:/srv/prometheus/ops/targets$ ls -la mysql-*
-r--r--r-- 1 root       root  2592 Jul 11 11:27 mysql-core_codfw.yaml
-r--r--r-- 1 root       root   612 Jul 11 11:27 mysql-dbstore_codfw.yaml
-r--r--r-- 1 root       root   544 Jul 10 10:57 mysql-labs_codfw.yaml
-rw-r--r-- 1 root       root   544 Jul 10 10:48 mysql-labsdb_codfw.yaml
-r--r--r-- 1 root       root   621 Jul 11 11:27 mysql-misc_codfw.yaml
-r--r--r-- 1 root       root   275 Jul 11 11:27 mysql-parsercache_codfw.yaml
root@prometheus2003:/srv/prometheus/ops/targets$ date
Thu Jul 11 11:29:19 UTC 2019
root@prometheus2003:/srv/prometheus/ops/targets$ run-puppet-agent 
Warning: Downgrading to PSON for future requests
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for prometheus2003.codfw.wmnet
Info: Applying configuration version '1562844569'
Notice: /Stage[main]/Profile::Prometheus::Ops_mysql/Exec[generate-mysqld-exporter-config]/returns: executed successfully
Notice: Applied catalog in 18.99 seconds
root@prometheus2003:/srv/prometheus/ops/targets$ ls -la mysql-*
-r--r--r-- 1 root root 2592 Jul 11 11:27 mysql-core_codfw.yaml
-r--r--r-- 1 root root  612 Jul 11 11:27 mysql-dbstore_codfw.yaml
-r--r--r-- 1 root root  544 Jul 10 10:57 mysql-labs_codfw.yaml
-rw-r--r-- 1 root root  544 Jul 10 10:48 mysql-labsdb_codfw.yaml
-r--r--r-- 1 root root  621 Jul 11 11:27 mysql-misc_codfw.yaml
-r--r--r-- 1 root root  275 Jul 11 11:27 mysql-parsercache_codfw.yaml
jcrespo updated the task description. · Jul 11 2019, 11:33 AM

Great work, a lot fewer files to edit when provisioning/moving/decommissioning hosts, which was very error prone!
Thanks :)

jcrespo removed jcrespo as the assignee of this task. · Jul 19 2019, 5:30 PM
jcrespo added a subscriber: jcrespo.