Page MenuHomePhabricator

Investigate the root cause of prometheus-mysqld-exporter alert on db1119
Closed, DeclinedPublic

Description

prometheus-mysqld-exporter failed on db1119 with the following message:

Oct 09 09:12:02 db1119 prometheus-mysqld-exporter[25582]: time="2021-10-09T09:12:02Z" level=error msg="Error pinging mysqld: Error 1226: User 'prometheus' has exceeded the 'max_user_connections' resource (current value: 5)" source="exporter.go:119"

Event Timeline

LSobanski renamed this task from Investigate the root cause of prometheus-mysqld-exporter alert to Investigate the root cause of prometheus-mysqld-exporter alert on db 1119.Oct 11 2021, 10:31 AM
LSobanski triaged this task as Medium priority.
LSobanski moved this task from Triage to Ready on the DBA board.
LSobanski updated the task description. (Show Details)

journalctl output is below - you can see the restart I did on 9 October. I presume the failure on 5 April is the "pme started before mysqld" error that we resolved with the work on T289488 ; so this does look like a different issue. At least we know the alerting works ;-)

-- Logs begin at Mon 2021-04-05 11:12:08 UTC, end at Mon 2021-10-11 10:09:09 UTC. --
Apr 05 11:12:15 db1119 systemd[1]: Started Prometheus exporter for MySQL server.
Apr 05 11:12:15 db1119 prometheus-mysqld-exporter[692]: time="2021-04-05T11:12:15Z" level=info msg="Starting mysqld_exporter (version=0.11.0+ds, branch=debian/sid, revision=0.11.0+ds-1+b20)" source="mysqld_exporter.go:206"
Apr 05 11:12:15 db1119 prometheus-mysqld-exporter[692]: time="2021-04-05T11:12:15Z" level=info msg="Build context (go=go1.11.5, user=pkg-go-maintainers@lists.alioth.debian.org, date=20190311-02:06:43)" source="mysqld_exporter.go:207"
Apr 05 11:12:15 db1119 prometheus-mysqld-exporter[692]: time="2021-04-05T11:12:15Z" level=info msg="Enabled scrapers:" source="mysqld_exporter.go:218"
Apr 05 11:12:15 db1119 prometheus-mysqld-exporter[692]: time="2021-04-05T11:12:15Z" level=info msg=" --collect.global_status" source="mysqld_exporter.go:222"
Apr 05 11:12:15 db1119 prometheus-mysqld-exporter[692]: time="2021-04-05T11:12:15Z" level=info msg=" --collect.global_variables" source="mysqld_exporter.go:222"
Apr 05 11:12:15 db1119 prometheus-mysqld-exporter[692]: time="2021-04-05T11:12:15Z" level=info msg=" --collect.slave_status" source="mysqld_exporter.go:222"
Apr 05 11:12:15 db1119 prometheus-mysqld-exporter[692]: time="2021-04-05T11:12:15Z" level=info msg=" --collect.info_schema.processlist" source="mysqld_exporter.go:222"
Apr 05 11:12:15 db1119 prometheus-mysqld-exporter[692]: time="2021-04-05T11:12:15Z" level=info msg="Listening on :9104" source="mysqld_exporter.go:232"
Apr 05 11:12:36 db1119 prometheus-mysqld-exporter[692]: time="2021-04-05T11:12:36Z" level=error msg="Error pinging mysqld: dial unix /run/mysqld/mysqld.sock: connect: no such file or directory" source="exporter.go:119"
Apr 05 11:12:36 db1119 prometheus-mysqld-exporter[692]: time="2021-04-05T11:12:36Z" level=error msg="Error pinging mysqld: dial unix /run/mysqld/mysqld.sock: connect: no such file or directory" source="exporter.go:119"
Apr 05 11:13:36 db1119 prometheus-mysqld-exporter[692]: time="2021-04-05T11:13:36Z" level=error msg="Error pinging mysqld: dial unix /run/mysqld/mysqld.sock: connect: no such file or directory" source="exporter.go:119"
Apr 05 11:13:36 db1119 prometheus-mysqld-exporter[692]: time="2021-04-05T11:13:36Z" level=error msg="Error pinging mysqld: dial unix /run/mysqld/mysqld.sock: connect: no such file or directory" source="exporter.go:119"
Apr 05 11:14:36 db1119 prometheus-mysqld-exporter[692]: time="2021-04-05T11:14:36Z" level=error msg="Error pinging mysqld: dial unix /run/mysqld/mysqld.sock: connect: no such file or directory" source="exporter.go:119"
Apr 05 11:14:36 db1119 prometheus-mysqld-exporter[692]: time="2021-04-05T11:14:36Z" level=error msg="Error pinging mysqld: dial unix /run/mysqld/mysqld.sock: connect: no such file or directory" source="exporter.go:119"
Apr 05 11:17:15 db1119 systemd[1]: Stopping Prometheus exporter for MySQL server...
Apr 05 11:17:15 db1119 systemd[1]: prometheus-mysqld-exporter.service: Main process exited, code=killed, status=15/TERM
Apr 05 11:17:15 db1119 systemd[1]: prometheus-mysqld-exporter.service: Succeeded.
Apr 05 11:17:15 db1119 systemd[1]: Stopped Prometheus exporter for MySQL server.
Apr 05 11:17:15 db1119 systemd[1]: Started Prometheus exporter for MySQL server.
Apr 05 11:17:15 db1119 prometheus-mysqld-exporter[3610]: time="2021-04-05T11:17:15Z" level=info msg="Starting mysqld_exporter (version=0.11.0+ds, branch=debian/sid, revision=0.11.0+ds-1+b20)" source="mysqld_exporter.go:206"
Apr 05 11:17:15 db1119 prometheus-mysqld-exporter[3610]: time="2021-04-05T11:17:15Z" level=info msg="Build context (go=go1.11.5, user=pkg-go-maintainers@lists.alioth.debian.org, date=20190311-02:06:43)" source="mysqld_exporter.go:207"
Apr 05 11:17:15 db1119 prometheus-mysqld-exporter[3610]: time="2021-04-05T11:17:15Z" level=info msg="Enabled scrapers:" source="mysqld_exporter.go:218"
Apr 05 11:17:15 db1119 prometheus-mysqld-exporter[3610]: time="2021-04-05T11:17:15Z" level=info msg=" --collect.global_status" source="mysqld_exporter.go:222"
Apr 05 11:17:15 db1119 prometheus-mysqld-exporter[3610]: time="2021-04-05T11:17:15Z" level=info msg=" --collect.global_variables" source="mysqld_exporter.go:222"
Apr 05 11:17:15 db1119 prometheus-mysqld-exporter[3610]: time="2021-04-05T11:17:15Z" level=info msg=" --collect.slave_status" source="mysqld_exporter.go:222"
Apr 05 11:17:15 db1119 prometheus-mysqld-exporter[3610]: time="2021-04-05T11:17:15Z" level=info msg=" --collect.info_schema.processlist" source="mysqld_exporter.go:222"
Apr 05 11:17:15 db1119 prometheus-mysqld-exporter[3610]: time="2021-04-05T11:17:15Z" level=info msg="Listening on :9104" source="mysqld_exporter.go:232"
Oct 01 05:22:00 db1119 systemd[1]: Stopping Prometheus exporter for MySQL server...
Oct 01 05:22:00 db1119 systemd[1]: prometheus-mysqld-exporter.service: Main process exited, code=killed, status=15/TERM
Oct 01 05:22:00 db1119 systemd[1]: prometheus-mysqld-exporter.service: Succeeded.
Oct 01 05:22:00 db1119 systemd[1]: Stopped Prometheus exporter for MySQL server.
Oct 01 05:23:59 db1119 systemd[1]: Started Prometheus exporter for MySQL server.
Oct 01 05:23:59 db1119 prometheus-mysqld-exporter[25582]: time="2021-10-01T05:23:59Z" level=info msg="Starting mysqld_exporter (version=0.11.0+ds, branch=debian/sid, revision=0.11.0+ds-1+b20)" source="mysqld_exporter.go:206"
Oct 01 05:23:59 db1119 prometheus-mysqld-exporter[25582]: time="2021-10-01T05:23:59Z" level=info msg="Build context (go=go1.11.5, user=pkg-go-maintainers@lists.alioth.debian.org, date=20190311-02:06:43)" source="mysqld_exporter.go:207"
Oct 01 05:23:59 db1119 prometheus-mysqld-exporter[25582]: time="2021-10-01T05:23:59Z" level=info msg="Enabled scrapers:" source="mysqld_exporter.go:218"
Oct 01 05:23:59 db1119 prometheus-mysqld-exporter[25582]: time="2021-10-01T05:23:59Z" level=info msg=" --collect.global_status" source="mysqld_exporter.go:222"
Oct 01 05:23:59 db1119 prometheus-mysqld-exporter[25582]: time="2021-10-01T05:23:59Z" level=info msg=" --collect.global_variables" source="mysqld_exporter.go:222"
Oct 01 05:23:59 db1119 prometheus-mysqld-exporter[25582]: time="2021-10-01T05:23:59Z" level=info msg=" --collect.slave_status" source="mysqld_exporter.go:222"
Oct 01 05:23:59 db1119 prometheus-mysqld-exporter[25582]: time="2021-10-01T05:23:59Z" level=info msg=" --collect.info_schema.processlist" source="mysqld_exporter.go:222"
Oct 01 05:23:59 db1119 prometheus-mysqld-exporter[25582]: time="2021-10-01T05:23:59Z" level=info msg="Listening on :9104" source="mysqld_exporter.go:232"
Oct 09 09:12:02 db1119 prometheus-mysqld-exporter[25582]: time="2021-10-09T09:12:02Z" level=error msg="Error pinging mysqld: Error 1226: User 'prometheus' has exceeded the 'max_user_connections' resource (current value: 5)" source="exporter.go:119"
Oct 09 09:12:02 db1119 prometheus-mysqld-exporter[25582]: time="2021-10-09T09:12:02Z" level=error msg="Error pinging mysqld: Error 1226: User 'prometheus' has exceeded the 'max_user_connections' resource (current value: 5)" source="exporter.go:119"
Oct 09 09:12:02 db1119 prometheus-mysqld-exporter[25582]: time="2021-10-09T09:12:02Z" level=error msg="Error pinging mysqld: Error 1226: User 'prometheus' has exceeded the 'max_user_connections' resource (current value: 5)" source="exporter.go:119"
Oct 09 09:12:02 db1119 prometheus-mysqld-exporter[25582]: time="2021-10-09T09:12:02Z" level=error msg="Error pinging mysqld: Error 1226: User 'prometheus' has exceeded the 'max_user_connections' resource (current value: 5)" source="exporter.go:119"
Oct 09 10:11:43 db1119 systemd[1]: Stopping Prometheus exporter for MySQL server...
Oct 09 10:11:43 db1119 systemd[1]: prometheus-mysqld-exporter.service: Main process exited, code=killed, status=15/TERM
Oct 09 10:11:43 db1119 systemd[1]: prometheus-mysqld-exporter.service: Succeeded.
Oct 09 10:11:43 db1119 systemd[1]: Stopped Prometheus exporter for MySQL server.
Oct 09 10:11:43 db1119 systemd[1]: Started Prometheus exporter for MySQL server.
Oct 09 10:11:43 db1119 prometheus-mysqld-exporter[8593]: time="2021-10-09T10:11:43Z" level=info msg="Starting mysqld_exporter (version=0.11.0+ds, branch=debian/sid, revision=0.11.0+ds-1+b20)" source="mysqld_exporter.go:206"
Oct 09 10:11:43 db1119 prometheus-mysqld-exporter[8593]: time="2021-10-09T10:11:43Z" level=info msg="Build context (go=go1.11.5, user=pkg-go-maintainers@lists.alioth.debian.org, date=20190311-02:06:43)" source="mysqld_exporter.go:207"
Oct 09 10:11:43 db1119 prometheus-mysqld-exporter[8593]: time="2021-10-09T10:11:43Z" level=info msg="Enabled scrapers:" source="mysqld_exporter.go:218"
Oct 09 10:11:43 db1119 prometheus-mysqld-exporter[8593]: time="2021-10-09T10:11:43Z" level=info msg=" --collect.global_status" source="mysqld_exporter.go:222"
Oct 09 10:11:43 db1119 prometheus-mysqld-exporter[8593]: time="2021-10-09T10:11:43Z" level=info msg=" --collect.global_variables" source="mysqld_exporter.go:222"
Oct 09 10:11:43 db1119 prometheus-mysqld-exporter[8593]: time="2021-10-09T10:11:43Z" level=info msg=" --collect.slave_status" source="mysqld_exporter.go:222"
Oct 09 10:11:43 db1119 prometheus-mysqld-exporter[8593]: time="2021-10-09T10:11:43Z" level=info msg=" --collect.info_schema.processlist" source="mysqld_exporter.go:222"
Oct 09 10:11:43 db1119 prometheus-mysqld-exporter[8593]: time="2021-10-09T10:11:43Z" level=info msg="Listening on :9104" source="mysqld_exporter.go:232"

In case it's useful for future comparison, this is what lsof says about the exporter process now (ie during normal behviour)

COMMAND    PID       USER   FD      TYPE             DEVICE SIZE/OFF      NODE NAME
prometheu 8593 prometheus  cwd       DIR                8,1     4096         2 /
prometheu 8593 prometheus  rtd       DIR                8,1     4096         2 /
prometheu 8593 prometheus  txt       REG                8,1  9681936   1982817 /usr/bin/prometheus-mysqld-exporter
prometheu 8593 prometheus  mem       REG                8,1  1824496   1971450 /usr/lib/x86_64-linux-gnu/libc-2.28.so
prometheu 8593 prometheus  mem       REG                8,1   146968   1971464 /usr/lib/x86_64-linux-gnu/libpthread-2.28.so
prometheu 8593 prometheus  mem       REG                8,1   165632   1971446 /usr/lib/x86_64-linux-gnu/ld-2.28.so
prometheu 8593 prometheus    0r      CHR                1,3      0t0         6 /dev/null
prometheu 8593 prometheus    1u     unix 0xffff968af4e47400      0t0 719869330 type=STREAM
prometheu 8593 prometheus    2u     unix 0xffff968af4e47400      0t0 719869330 type=STREAM
prometheu 8593 prometheus    3u     IPv6          719782869      0t0       TCP *:9104 (LISTEN)
prometheu 8593 prometheus    4u  a_inode               0,13        0     10390 [eventpoll]
prometheu 8593 prometheus    5u     IPv6          719811990      0t0       TCP db1119.eqiad.wmnet:9104->prometheus1003.eqiad.wmnet:36304 (ESTABLISHED)
prometheu 8593 prometheus    6u     IPv6          719801884      0t0       TCP db1119.eqiad.wmnet:9104->prometheus1004.eqiad.wmnet:52084 (ESTABLISHED)

We're on PME version 0.11 (on buster). The latest upstream release is 0.13, so i had a look at the changelogs to see if there's anything relevant. The only thing i found is this: https://github.com/prometheus/mysqld_exporter/pull/323. It definitely seems like it might be in the right area, but it's hard to tell for sure.

Marostegui renamed this task from Investigate the root cause of prometheus-mysqld-exporter alert on db 1119 to Investigate the root cause of prometheus-mysqld-exporter alert on db1119.Oct 19 2021, 5:35 AM
Marostegui updated the task description. (Show Details)
Marostegui subscribed.

It is hard to troubleshoot this anymore as a month has passed by and we have no way to reproduce it (apart from creating more than 5 connections manually). So I am going to close this for now, if it happens again we can reopen and try to find more information.