MySQL metrics monitoring
Open, Medium, Public

Assigned To

None

Authored By

	fgiunchedi
	Aug 25 2016, 2:15 PM

Description

This task is about MySQL monitoring, specifically metrics gathering (not alerting).

We deployed prometheus-mysqld-exporter to collect MySQL metrics into Prometheus in T126757: test prometheus mysqld-exporter. Most of the functionality provided by Tendril is already covered, but some things are still missing:

multi-source replication metrics
support for multi-instance nodes
latency monitoring for a sample query or queries
Tables of data (e.g. a summary of hosts and their basic properties) are difficult to handle in Grafana and may need a separate technology (e.g. a web dashboard, orchestrator, ...)
fetch replication metrics from pt-heartbeat (tracked in T141968: Display lag on grafana (prometheus) from pt-heartbeat instead (or in addition) of Seconds_Behind_Master)
performance_schema upstream monitoring is inflexible, and it may require a separate Prometheus instance for privacy reasons
table properties such as size and number of rows are disabled, and they may require a separate Prometheus instance for privacy reasons
Prometheus configuration for MySQL host/shard/role/etc. is manual and should be automated via PuppetDB. Note that "role" (master/slave) should be exported by the machine itself rather than set in the Prometheus configuration, which makes master/slave changes easier. Shard, on the other hand, doesn't change for the lifetime of the machine (?) and can be put in the Prometheus configuration
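The "latency monitoring for a sample query" item could be approached with a small probe that runs a representative query several times and reports summary statistics, which could then be exported to Prometheus. This is a hypothetical sketch: `probe_latency` and its parameters are illustrative names, not part of prometheus-mysqld-exporter, and the query runner is passed in as a callable so the timing logic does not depend on any particular MySQL client.

```python
import time
from typing import Callable, Dict, List


def probe_latency(run_query: Callable[[], object], samples: int = 5) -> Dict[str, float]:
    """Time a sample query several times and summarise latency in seconds.

    Hypothetical helper. ``run_query`` is any zero-argument callable that
    executes the sample query to completion (including fetching results).
    """
    if samples < 1:
        raise ValueError("samples must be >= 1")
    timings: List[float] = []
    for _ in range(samples):
        # perf_counter is monotonic, so wall-clock adjustments cannot
        # produce negative or inflated durations.
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    timings.sort()
    return {
        "min": timings[0],
        "max": timings[-1],
        # Upper median when the number of samples is even.
        "median": timings[len(timings) // 2],
    }
```

With pymysql, for example, `run_query` could be a closure that opens a cursor, calls `execute("SELECT 1")` and then `fetchall()`; that query is only a placeholder and would need to be replaced by something representative of production traffic.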
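For the pt-heartbeat item, lag comes from a timestamp that pt-heartbeat keeps updating on the master: on a replica, lag is the current time minus the replicated timestamp. Unlike Seconds_Behind_Master, this measures delay relative to the original master even through intermediate replicas. A minimal sketch, assuming the `ts` column holds UTC values in the `YYYY-MM-DDTHH:MM:SS[.ffffff]` format (as when pt-heartbeat runs with `--utc`); `heartbeat_lag_seconds` is a hypothetical helper name:

```python
from datetime import datetime, timezone
from typing import Optional


def heartbeat_lag_seconds(ts: str, now: Optional[datetime] = None) -> float:
    """Return replication lag in seconds from a replicated pt-heartbeat ``ts``.

    Assumes ``ts`` was written in UTC as 'YYYY-MM-DDTHH:MM:SS[.ffffff]'.
    """
    written = datetime.fromisoformat(ts).replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # Clamp at zero: small clock skew between master and replica can make
    # the heartbeat appear to come from the future.
    return max(0.0, (now - written).total_seconds())
```

Reading the `ts` value itself is left out on purpose: which heartbeat row to select depends on how pt-heartbeat is deployed (for example, per master server_id).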
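The automation requested in the last item can be sketched as a generator that turns an inventory of instances (PuppetDB as suggested here, or a database such as zarcillo, which the later patches use) into Prometheus `file_sd` target groups. Stable labels such as shard go into the target file, while the role is exposed by the host itself, so a master switchover needs no Prometheus change. This is a minimal sketch under stated assumptions: the `(host, port, shard, datacenter)` rows, the `mysql_role` metric name and both function names are hypothetical, and the output is JSON, which `file_sd` also accepts in `.json` files. Multi-instance hosts contribute one `host:port` target per mysqld instance.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical inventory row: (hostname, exporter_port, shard, datacenter).
Row = Tuple[str, int, str, str]


def build_target_groups(rows: Iterable[Row], datacenter: str) -> List[dict]:
    """Group exporter endpoints by shard into file_sd target groups.

    Only labels that are stable for the lifetime of an instance (shard)
    go into the target file; the master/replica role is left to the host.
    """
    by_shard: Dict[str, List[str]] = defaultdict(list)
    for host, port, shard, dc in rows:
        if dc != datacenter:
            continue
        # One target per mysqld instance: a multi-instance host appears
        # several times with different exporter ports.
        by_shard[shard].append(f"{host}:{port}")
    return [
        {"targets": sorted(targets), "labels": {"shard": shard}}
        for shard, targets in sorted(by_shard.items())
    ]


def render_role_metric(role: str) -> str:
    """Render a host-side role metric in the Prometheus text format.

    Hypothetical metric name; a host could write this line to a .prom file
    read by node_exporter's textfile collector.
    """
    return f'mysql_role{{role="{role}"}} 1\n'
```

For example, `json.dumps(build_target_groups(rows, "codfw"), indent=2)` would produce the content of one per-datacenter target file.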

Details

Subject	Repo	Branch	Lines +/-
mariadb: Remove redundant include of prometheus node_exporter	operations/puppet	production	+0 -1
Revert "Revert "Revert "Revert "prometheus-mysqld-exporter: Automate targets based on zarcillo db""""	operations/puppet	production	+277 -783
Revert "Revert "prometheus-mysqld-exporter: Automate targets based on zarcillo db""	operations/puppet	production	+277 -783
prometheus: move prometheus secrets back to the original role	labs/private	master	+0 -0
prometheus-mysqld-exporter: move variable to profile	labs/private	master	+0 -0
prometheus-mysqld-exporter: Automate targets based on zarcillo db	operations/puppet	production	+255 -778
mysql-prometheus-exporter: Fix typo on puppet requirement	operations/puppet	production	+2 -3
mysqld-prometheus-exporter: Require python3 pymysql and yaml pkgs	operations/puppet	production	+4 -0
prometheus: Add fake prometheus labs password	labs/private	master	+1 -0
Link to grafana rather than to ganglia on tendril	operations/software/tendril	master	+3 -4
mariadb: Separate sanitarium role && monitor it on prometheus	operations/puppet	production	+94 -89
Remove old CA (ssl='on') and add a new option "socket"	operations/puppet/mariadb	master	+12 -28

Related Objects

Status	Assigned	Task
Open	None	T143896 MySQL metrics monitoring
Open	None	T141968 Display lag on grafana (prometheus) from pt-heartbeat instead (or in addition) of Seconds_Behind_Master
Resolved	fgiunchedi	T145072 Generate instance list of active database hosts to be monitored from prometheus
Declined	None	T147476 Upgrade mysqld_exporter to 0.9.0
Resolved	jcrespo	T161296 Upgrade mysqld_exporter in production
Open	None	T164834 In some database hosts, performance schema loses digest statistics
Resolved	jcrespo	T170666 Refactor prometheus-mysqld-exporter to support multi-instance hosts
Duplicate	None	T177779 Generate instance list of database hosts to be monitored automatically from exported resources
Resolved	aaron	T177778 Improve database application performance monitoring visibility
Resolved	Kormat	T252761 Research performance changes on prometheus-mysqld-exporter after buster/mariadb upgrade
Open	None	T273054 Investigate using PMM (Percona Monitoring and Management) for slow-query analysis
Resolved	Ladsgroup	T297435 Send metrics of db errors of mediawiki to prometheus

Event Timeline

fgiunchedi created this task. Aug 25 2016, 2:15 PM

Restricted Application added a subscriber: Aklapper. Aug 25 2016, 2:15 PM

jcrespo added a subtask: T141968: Display lag on grafana (prometheus) from pt-heartbeat instead (or in addition) of Seconds_Behind_Master. Aug 25 2016, 2:16 PM

jcrespo updated the task description. Aug 29 2016, 3:21 PM

fgiunchedi updated the task description. Aug 30 2016, 8:26 AM

jcrespo updated the task description. Aug 31 2016, 9:24 AM

jcrespo created subtask T145072: Generate instance list of active database hosts to be monitored from prometheus. Sep 8 2016, 12:38 PM

fgiunchedi mentioned this in T126757: test prometheus mysqld-exporter. Sep 13 2016, 9:45 AM

Dzahn triaged this task as Medium priority. Sep 22 2016, 2:32 AM

fgiunchedi created subtask T147476: Upgrade mysqld_exporter to 0.9.0. Oct 5 2016, 4:27 PM

Change 338988 had a related patch set uploaded (by Jcrespo):
Remove old CA (ssl='on') and add a new option "socket"

https://gerrit.wikimedia.org/r/338988

gerritbot added a project: Patch-For-Review. Feb 21 2017, 4:18 PM

Change 338988 merged by jenkins-bot:
[operations/puppet/mariadb] Remove old CA (ssl='on') and add a new option "socket"

https://gerrit.wikimedia.org/r/338988

Change 341557 had a related patch set uploaded (by jynus):
[operations/puppet] mariadb: Separate sanitarium role && monitore it on prometheus

https://gerrit.wikimedia.org/r/341557

Change 341557 merged by Jcrespo:
[operations/puppet] mariadb: Separate sanitarium role && monitor it on prometheus

https://gerrit.wikimedia.org/r/341557

fgiunchedi created subtask T161296: Upgrade mysqld_exporter in production. Mar 24 2017, 9:37 AM

fgiunchedi closed subtask T147476: Upgrade mysqld_exporter to 0.9.0 as Declined. Mar 24 2017, 10:24 AM

jcrespo added a project: DBA. May 9 2017, 1:17 PM

jcrespo moved this task from Triage to Meta/Epic on the DBA board.

jcrespo created subtask T164834: In some database hosts, performance schema loses digest statistics. May 9 2017, 1:21 PM

faidon added a project: observability. Jul 10 2017, 1:08 PM

jcrespo updated the task description. Jul 10 2017, 1:33 PM

jcrespo added a subtask: T170666: Refactor prometheus-mysqld-exporter to support multi-instance hosts. Jul 25 2017, 10:25 AM

jcrespo updated the task description.

jcrespo renamed this task from MySQL monitoring with prometheus to MySQL metrics monitoring. Oct 9 2017, 3:38 PM

jcrespo closed subtask T170666: Refactor prometheus-mysqld-exporter to support multi-instance hosts as Resolved.

jcrespo updated the task description.

jcrespo created subtask T177779: Generate instance list of database hosts to be monitored automatically from exported resources. Oct 9 2017, 3:42 PM

jcrespo added a subtask: T177778: Improve database application performance monitoring visibility.

Change 391558 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software/tendril@master] Link to grafana rather than to ganglia on tendril

https://gerrit.wikimedia.org/r/391558

Change 391558 merged by Jcrespo:
[operations/software/tendril@master] Link to grafana rather than to ganglia on tendril

https://gerrit.wikimedia.org/r/391558

jcrespo changed the task status from Open to Stalled. Nov 30 2018, 3:38 PM

Phabricator_maintenance moved this task from Backlog to Acknowledged on the SRE board. Jan 26 2019, 8:48 PM

jcrespo changed the status of subtask T161296: Upgrade mysqld_exporter in production from Open to Stalled. Mar 6 2019, 4:37 PM

jcrespo changed the task status from Stalled to Open. Jun 26 2019, 10:11 AM

jcrespo claimed this task.

Change 519203 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] prometheus-mysqld-exporter: Automate targets based on zarcillo db

https://gerrit.wikimedia.org/r/519203

Change 521839 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[labs/private@master] prometheus: Add fake prometheus labs password

https://gerrit.wikimedia.org/r/521839

Change 521839 merged by Jcrespo:
[labs/private@master] prometheus: Add fake prometheus labs password

https://gerrit.wikimedia.org/r/521839

jcrespo mentioned this in rLPRI3e95207086fa: prometheus: Add fake prometheus labs password. Jul 10 2019, 9:44 AM

Change 519203 merged by Jcrespo:
[operations/puppet@production] prometheus-mysqld-exporter: Automate targets based on zarcillo db

https://gerrit.wikimedia.org/r/519203

Change 521845 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mysql-prometheus-exporter: Fix typo on puppet requirement

https://gerrit.wikimedia.org/r/521845

Change 521845 merged by Jcrespo:
[operations/puppet@production] mysql-prometheus-exporter: Fix typo on puppet requirement

https://gerrit.wikimedia.org/r/521845

Change 521847 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mysqld-prometheus-exporter: Require python3 pymysql and yaml pkgs

https://gerrit.wikimedia.org/r/521847

Change 521847 merged by Jcrespo:
[operations/puppet@production] mysqld-prometheus-exporter: Require python3 pymysql and yaml pkgs

https://gerrit.wikimedia.org/r/521847

Change 522032 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[labs/private@master] prometheus-mysqld-exporter: move variable to profile

https://gerrit.wikimedia.org/r/522032

Change 522032 merged by Jcrespo:
[labs/private@master] prometheus-mysqld-exporter: move variable to profile

https://gerrit.wikimedia.org/r/522032

jcrespo mentioned this in rLPRI6aa78168423c: prometheus-mysqld-exporter: move variable to profile. Jul 11 2019, 7:35 AM

Change 522040 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[labs/private@master] prometheus: move prometheus secrets back to the original role

https://gerrit.wikimedia.org/r/522040

Change 522040 merged by Jcrespo:
[labs/private@master] prometheus: move prometheus secrets back to the original role

https://gerrit.wikimedia.org/r/522040

jcrespo mentioned this in rLPRI0cc83bae3ad3: prometheus: move prometheus secrets back to the original role. Jul 11 2019, 9:25 AM

Change 521852 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] Revert "Revert "prometheus-mysqld-exporter: Automate targets based on zarcillo db""

https://gerrit.wikimedia.org/r/521852

Change 521852 merged by Jcrespo:
[operations/puppet@production] Revert "Revert "prometheus-mysqld-exporter: Automate targets based on zarcillo db""

https://gerrit.wikimedia.org/r/521852

Change 522061 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] Revert "Revert "Revert "Revert "prometheus-mysqld-exporter: Automate targets based on zarcillo db""""

https://gerrit.wikimedia.org/r/522061

Change 522061 merged by Jcrespo:
[operations/puppet@production] Revert "Revert "Revert "Revert "prometheus-mysqld-exporter: Automate targets based on zarcillo db""""

https://gerrit.wikimedia.org/r/522061

root@prometheus2003:/srv/prometheus/ops/targets$ ls -la mysql-*
-r--r--r-- 1 root       root  2592 Jul 11 11:27 mysql-core_codfw.yaml
-r--r--r-- 1 root       root   612 Jul 11 11:27 mysql-dbstore_codfw.yaml
-r--r--r-- 1 root       root   544 Jul 10 10:57 mysql-labs_codfw.yaml
-rw-r--r-- 1 root       root   544 Jul 10 10:48 mysql-labsdb_codfw.yaml
-r--r--r-- 1 root       root   621 Jul 11 11:27 mysql-misc_codfw.yaml
-r--r--r-- 1 root       root   275 Jul 11 11:27 mysql-parsercache_codfw.yaml
root@prometheus2003:/srv/prometheus/ops/targets$ date
Thu Jul 11 11:29:19 UTC 2019
root@prometheus2003:/srv/prometheus/ops/targets$ run-puppet-agent 
Warning: Downgrading to PSON for future requests
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for prometheus2003.codfw.wmnet
Info: Applying configuration version '1562844569'
Notice: /Stage[main]/Profile::Prometheus::Ops_mysql/Exec[generate-mysqld-exporter-config]/returns: executed successfully
Notice: Applied catalog in 18.99 seconds
root@prometheus2003:/srv/prometheus/ops/targets$ ls -la mysql-*
-r--r--r-- 1 root root 2592 Jul 11 11:27 mysql-core_codfw.yaml
-r--r--r-- 1 root root  612 Jul 11 11:27 mysql-dbstore_codfw.yaml
-r--r--r-- 1 root root  544 Jul 10 10:57 mysql-labs_codfw.yaml
-rw-r--r-- 1 root root  544 Jul 10 10:48 mysql-labsdb_codfw.yaml
-r--r--r-- 1 root root  621 Jul 11 11:27 mysql-misc_codfw.yaml
-r--r--r-- 1 root root  275 Jul 11 11:27 mysql-parsercache_codfw.yaml
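In the transcript above, the Exec reports that it ran successfully, yet the generated files keep their 11:27 timestamps. That is consistent with a generator that only rewrites a target file when its content changes. The sketch below shows one way to do that, as an assumption rather than a description of the actual script: compare the new content with the file on disk and, if it differs, write a temporary file in the same directory and atomically rename it over the target, so Prometheus never reads a partially written file. `write_if_changed` is a hypothetical name; the read-only mode mirrors the `-r--r--r--` permissions in the listing.

```python
import os
import tempfile


def write_if_changed(path: str, content: str) -> bool:
    """Atomically replace ``path`` with ``content`` only if it differs.

    Returns True if the file was (re)written, False if it was left
    untouched, so an unchanged inventory keeps the file's mtime.
    """
    try:
        with open(path, encoding="utf-8") as f:
            if f.read() == content:
                return False
    except FileNotFoundError:
        pass
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary name ends in .tmp so a *.yaml discovery glob
    # does not pick it up before the rename.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(content)
        os.chmod(tmp, 0o444)  # read-only, like the generated target files
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise
    return True
```

Writing into the same directory matters: `os.replace` is only atomic when source and destination are on the same filesystem.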

jcrespo updated the task description. Jul 11 2019, 11:33 AM

Great work, far fewer files to edit when provisioning/moving/decommissioning hosts, which was very error-prone!
Thanks :)

jcrespo removed jcrespo as the assignee of this task. Jul 19 2019, 5:30 PM

jcrespo subscribed.

fgiunchedi closed subtask T145072: Generate instance list of active database hosts to be monitored from prometheus as Resolved. Aug 19 2019, 2:03 PM

jcrespo closed subtask T177778: Improve database application performance monitoring visibility as Resolved. May 13 2020, 2:46 PM

jcrespo mentioned this in T177778: Improve database application performance monitoring visibility.

Change 596615 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Remove redundant include of prometheus node_exporter

https://gerrit.wikimedia.org/r/596615

Change 596615 merged by Jcrespo:
[operations/puppet@production] mariadb: Remove redundant include of prometheus node_exporter

https://gerrit.wikimedia.org/r/596615

jcrespo closed subtask T161296: Upgrade mysqld_exporter in production as Resolved. May 21 2020, 10:51 AM

jcrespo added a subtask: T252761: Research performance changes on prometheus-mysqld-exporter after buster/mariadb upgrade. May 21 2020, 10:54 AM

fgiunchedi moved this task from Inbox to Radar on the observability board. Jul 20 2020, 1:16 PM

jcrespo added a subtask: T273054: Investigate using PMM (Percona Monitoring and Management) for slow-query analysis. Jan 27 2021, 1:13 PM

jcrespo mentioned this in T273054: Investigate using PMM (Percona Monitoring and Management) for slow-query analysis.

jcrespo updated the task description. Jan 27 2021, 1:18 PM

LSobanski edited projects, added Data-Persistence; removed DBA. May 14 2021, 5:13 PM

LSobanski moved this task from Inbox to Epic - Database on the Data-Persistence board. May 14 2021, 5:40 PM

Marostegui closed subtask T252761: Research performance changes on prometheus-mysqld-exporter after buster/mariadb upgrade as Resolved. Nov 26 2021, 8:53 AM