
Investigate possible memory leak on db1115
Closed, Resolved (Public)

Assigned To: Marostegui
Authored By: Marostegui, Sep 2 2019, 5:45 AM
Referenced Files
F31906146: Captura de pantalla 2020-06-26 a las 6.54.52.png
Jun 26 2020, 4:56 AM
F31667108: host-overview (1).png
Mar 5 2020, 4:29 PM
F30221175: 1115.png
Sep 4 2019, 10:43 AM
F30221188: 1115-1.png
Sep 4 2019, 10:43 AM
F30103245: host-overview (1).png
Sep 2 2019, 5:45 AM
F27612064: db1115_memory.png
Sep 2 2019, 5:45 AM
F30198846: leak.png
Sep 2 2019, 5:45 AM

Description

Tendril seems to have excessive memory consumption that keeps growing over time until the OOM killer kicks in and starts killing processes, typically MySQL or nagios, causing pages.

Examples of some tasks:
T231165: db1115 (tendril) paged twice in 24h due to OOM
T196726: db1115 (tendril DB) had OOM for some processes and some hw (memory) issues

Some graphs that show this behavior:

host-overview (1).png (500×1 px, 37 KB)

db1115_memory.png (500×1 px, 40 KB)

Tendril heavily uses the event scheduler with lots of events:

# mysql.py -hdb1115 tendril -e "show events" | wc -l
2267
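
For a per-schema breakdown of those events, a query along these lines should work (a sketch only; this is not output from the task):

# Hypothetical breakdown of the ~2k events by schema
mysql.py -hdb1115 -e "SELECT EVENT_SCHEMA, COUNT(*) AS events FROM information_schema.EVENTS GROUP BY EVENT_SCHEMA ORDER BY events DESC"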

MySQL is configured with a 20GB InnoDB buffer pool:

innodb_buffer_pool_size        = 20G

As of now (1st Sept 05:32 AM UTC) this is the memory consumption:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
29136 mysql     20   0  0.106t 0.062t  25624 S 244.4 50.3  41014:10 mysqld
root@db1115:/home/marostegui# free -g
              total        used        free      shared  buff/cache   available
Mem:            125          63           0           1          61          59
Swap:             0           0           0
# pmap 29136 | grep total
 total        113386168K
mysql:root@localhost [(none)]> show global status like 'Uptime';
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| Uptime        | 657836 |
+---------------+--------+
1 row in set (0.00 sec)

There are some bug reports (some of them quite old) describing memory issues with events, so this could be similar:
https://bugs.mysql.com/bug.php?id=18683
https://forums.mysql.com/read.php?35,675519,675519
https://jira.mariadb.org/browse/MDEV-14260?attachmentViewMode=list

And this is the graph after the last crash T231165: db1115 (tendril) paged twice in 24h due to OOM

leak.png (500×1 px, 37 KB)

Event Timeline

Marostegui triaged this task as Medium priority. Sep 2 2019, 5:46 AM
Marostegui moved this task from Triage to Meta/Epic on the DBA board.

For now, I have left a cronjob on db1115 gathering top output every 8h, so we can see how much the memory grows specifically for the MySQL process.
I am also planning to restart MySQL tomorrow without the event_scheduler for a few minutes, and then enable it to see how much it grows.
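
For reference, a minimal sketch of what such a cron entry could look like (the exact command and log path are assumptions, not the actual cronjob on db1115):

# Hypothetical /etc/cron.d entry: snapshot the mysqld line from top every 8 hours
0 */8 * * * root top -b -n 1 | grep mysqld >> /var/log/mysqld-top.log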

Mentioned in SAL (#wikimedia-operations) [2019-09-04T10:11:35Z] <marostegui> Tendril/dbtree will be unavailable for a few minutes T231769

Mentioned in SAL (#wikimedia-operations) [2019-09-04T10:12:40Z] <marostegui> Stop MySQL on db1115 without the event scheduler - T231769

Mentioned in SAL (#wikimedia-operations) [2019-09-04T10:20:37Z] <marostegui> Start MySQL on db1115 without the event scheduler - T231769

Mentioned in SAL (#wikimedia-operations) [2019-09-04T10:41:52Z] <marostegui> Start event scheduler on db1115 T231769

These were the figures over the last 2 days before I stopped MySQL (I gather data every 8 hours, so 3 samples a day) - we can see MySQL memory growing every day:

29136 mysql     20   0  0.107t 0.063t  25624 S  1359 51.1  45283:05 mysqld
29136 mysql     20   0  0.107t 0.062t  25624 S 672.2 50.1  47047:23 mysqld
29136 mysql     20   0  0.107t 0.062t  25624 S 258.8 50.5  48856:46 mysqld
29136 mysql     20   0  0.108t 0.064t  25624 S 227.8 51.9  50679:09 mysqld
29136 mysql     20   0  0.108t 0.064t  25624 S 658.8 51.8  52411:51 mysqld
29136 mysql     20   0  0.108t 0.064t  25624 S 133.3 51.9  52818:32 mysqld

I have restarted MySQL on db1115 without the event scheduler enabled.
I waited until the buffer pool was fully loaded; this was the memory consumption at that time:

Sep 04 10:20:31 db1115 mysqld[1630]: 2019-09-04 10:20:31 7f1f36ff7700 InnoDB: Buffer pool(s) load completed at 190904 10:20:31

After that, the memory usage of the mysqld process kept growing, from 17.5% (10:20:31) to 19.9% (10:40:06), at which point I started the event scheduler.
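
For the record, this kind of test can be done with the standard event_scheduler variable: disable it for the restart (OFF, unlike DISABLED, can still be changed at runtime) and switch it back on later. A sketch, not necessarily the exact steps used here:

# In my.cnf for the restart (use OFF rather than DISABLED so it can be re-enabled later):
#   [mysqld]
#   event_scheduler = OFF
# Then, once the buffer pool is loaded, turn it back on at runtime:
mysql.py -hdb1115 -e "SET GLOBAL event_scheduler = ON"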

1115.png (500×1 px, 24 KB)

At that point the memory usage jumped from 19.9% to 21.3% (almost 10G of VIRT) in a second, and it is still growing.
This is visible on the graph as well:

1115-1.png (500×1 px, 26 KB)

Current status:

1630 mysql     20   0 49.415g 0.027t  24936 S 476.5 22.0  13:53.11 mysqld

I would like to leave the event_scheduler stopped for longer in a future test to see whether the memory growth stops.

At 12:27:04:

1630 mysql     20   0 67.313g 0.046t  24936 S 366.7 37.8 394:46.56 mysqld

So that is almost 20GB more in less than 2 hours after enabling the event_scheduler.

I just realised we do have TokuDB enabled there, with a pretty big buffer configured (30GB).
We don't have many TokuDB tables (most of them are FEDERATED):

+-----------------------+--------+---------------------+
| table_name            | engine | update_time         |
+-----------------------+--------+---------------------+
| client_statistics_log | TokuDB | 2019-09-05 13:31:46 |
| general_log_sampled   | TokuDB | 2019-09-05 00:00:21 |
| global_status_log     | TokuDB | 2019-09-05 13:35:46 |
| global_status_log_5m  | TokuDB | 2019-09-05 13:35:50 |
| index_statistics_log  | TokuDB | 2019-09-05 13:34:47 |
| innodb_locks_log      | TokuDB | 2019-09-05 13:35:45 |
| innodb_trx_log        | TokuDB | 2019-09-05 13:35:45 |
| queries               | TokuDB | 2018-02-22 08:36:23 |
| queries_seen_log      | TokuDB | 2019-09-05 00:00:24 |
| query_footprint_log   | TokuDB | 2014-05-30 02:44:55 |
| slave_status_log      | TokuDB | 2019-09-05 13:36:40 |
| slow_log_sampled      | TokuDB | 2019-09-05 00:00:46 |
| table_statistics_log  | TokuDB | 2019-09-05 13:34:44 |
| user_statistics_log   | TokuDB | 2019-09-05 13:34:44 |
+-----------------------+--------+---------------------+
14 rows in set, 1 warning (4 min 56.98 sec)
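
(A list like the one above can be pulled from information_schema; a hedged sketch of such a query, assuming the tables live in the tendril schema:)

# Hypothetical query to list TokuDB tables and their last update time
mysql.py -hdb1115 tendril -e "SELECT table_name, engine, update_time FROM information_schema.tables WHERE table_schema = 'tendril' AND engine = 'TokuDB' ORDER BY table_name"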

@jcrespo do you have any idea if those tables are TokuDB for any specific reason (apart from compression)?

Those are the main tables where the historic metrics are stored - they are actually quite large. Most should disappear eventually, but we cannot just convert them, as they compress quite efficiently compared to InnoDB compressed.

It has grown 50GB in 2 days. Right now we are at 90GB, and it increases by around 2GB every 8 hours.
The TokuDB buffer plus the InnoDB buffer add up to 50GB, nowhere near those 90GB (and still increasing).
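
(The 50GB figure is just the two configured caches added together; they can be verified with something like the following, using the standard variable names:)

# 20G InnoDB buffer pool + ~30G TokuDB cache = ~50G of configured caches
mysql.py -hdb1115 -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('innodb_buffer_pool_size', 'tokudb_cache_size')"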

On Monday I am going to stop the event_scheduler again (without restarting MySQL) for a few hours to check how the memory behaves and whether it stops growing.
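
Stopping it without a restart is just the runtime toggle, plus continued memory sampling; a minimal sketch:

# Disable the event scheduler at runtime (no mysqld restart needed)
mysql.py -hdb1115 -e "SET GLOBAL event_scheduler = OFF"
# Keep sampling mysqld VIRT/RSS while it is off, e.g. every 30 minutes (hypothetical one-liner):
while true; do date; ps -o vsz=,rss= -C mysqld; sleep 1800; done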

Based on the trend, it will start swapping around 18 December if not restarted.

I will restart it on Monday 18th - thanks for rechecking this :)

Mentioned in SAL (#wikimedia-operations) [2019-11-18T06:30:43Z] <marostegui> Restart tendril mysql - T231769

MySQL had to be restarted, as tendril stopped updating its data.
Memory activity:

host-overview (1).png (500×1 px, 33 KB)

At the time of the memory and connections drop, the logs show the following, so it could be the usual memory issues:

Mar  5 15:15:17 db1115 puppet-agent[14295]: (/Stage[main]/Exim4/Service[exim4]) Could not evaluate: Cannot allocate memory - fork(2)
Mar  5 15:15:17 db1115 puppet-agent[14295]: (/Stage[main]/Ulogd/Service[ulogd2]) Could not evaluate: Cannot allocate memory - fork(2)

Mentioned in SAL (#wikimedia-operations) [2020-05-05T04:39:35Z] <marostegui> Restart mysql on tendril host: db1115 - T231769

This seems very similar to what we are seeing: https://jira.percona.com/browse/PS-6961 (it also affects 5.6 and 5.7). I will try to run the same procedure they describe there and see what we get with tendril.
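
The rough shape of that kind of reproduction attempt would be to create a large number of events on a test instance and watch mysqld's memory over time; a sketch under assumptions (the database and event names are made up, and the exact procedure from PS-6961 may differ):

# Hypothetical reproduction sketch on a test instance
mysql -e "CREATE DATABASE IF NOT EXISTS leaktest"
for i in $(seq 1 2000); do
  mysql leaktest -e "CREATE EVENT ev_$i ON SCHEDULE EVERY 1 MINUTE DO SET @x = 1"
done
# ...then sample mysqld VIRT/RSS periodically over a few days and look for steady growth.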

Change 596189 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Switchover zarcillo from db1115 to db2093

https://gerrit.wikimedia.org/r/596189

Change 596197 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] dbtools: point zarcillo database scripts to use db2093

https://gerrit.wikimedia.org/r/596197

Change 596189 merged by Jcrespo:
[operations/puppet@production] mariadb: Switchover zarcillo from db1115 to db2093

https://gerrit.wikimedia.org/r/596189

Change 596197 merged by Jcrespo:
[operations/software@master] dbtools: point zarcillo database scripts to use db2093

https://gerrit.wikimedia.org/r/596197

Change 596242 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: switch monitoring of tendril/zarcillo backups to codfw

https://gerrit.wikimedia.org/r/596242

Change 596242 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: switch monitoring of tendril/zarcillo backups to codfw

https://gerrit.wikimedia.org/r/596242

Change 596251 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Allow zarcillo as a valid backup section

https://gerrit.wikimedia.org/r/596251

Change 596251 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Allow zarcillo as a valid backup section

https://gerrit.wikimedia.org/r/596251

So far I have not been able to reproduce this on 10.4. I have noticed a slight memory increase, but only around 100MB, which is probably irrelevant. We'll see after the weekend.

So from Friday to Monday the numbers have grown from:

VIRT and RES mem

407.2g  68.4g

to

410.6g  71.7g

Keeping in mind that MySQL is doing nothing apart from running those events, that increase does indicate something.
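
(From the RES figures above, that is roughly 3.3GB over the ~3 days from Friday to Monday, i.e. on the order of 1GB/day with only the events running; e.g.:)

# RES growth rate over the weekend, from the samples above
echo "scale=2; (71.7 - 68.4) / 3" | bc    # ~1.1 GB/day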

In the four days since I stopped the event scheduler, the memory has not grown at all. I have filed this upstream: https://jira.mariadb.org/browse/MDEV-22809

The "usual" OOM happened last night:

Captura de pantalla 2020-06-26 a las 6.54.52.png (760×1 px, 95 KB)

Jun 25 20:46:17 db1115 kernel: [4014818.142520] mysqld invoked oom-killer: gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=0-1, order=0, oom_score_adj=-600
Jun 25 20:46:17 db1115 kernel: [4014818.142523] mysqld cpuset=/ mems_allowed=0-1
Jun 25 20:46:17 db1115 kernel: [4014818.142529] CPU: 0 PID: 24879 Comm: mysqld Not tainted 4.9.0-12-amd64 #1 Debian 4.9.210-1
Jun 25 20:46:17 db1115 kernel: [4014818.142530] Hardware name: Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.5.4 08/17/2017
Jun 25 20:46:17 db1115 kernel: [4014818.142533]  0000000000000000 ffffffffbd736bfe ffffad97617c7cf0 ffff981936ff40c0
Jun 25 20:46:17 db1115 kernel: [4014818.142537]  ffffffffbd609adb 0000000000000000 00000000fffffda8 0000000cbd4ea300
Jun 25 20:46:17 db1115 kernel: [4014818.142541]  ffff9819a0da7000 ffffffffbd58d587 0000004200000000 ffffffffbd6ad670
Jun 25 20:46:17 db1115 kernel: [4014818.142544] Call Trace:
Jun 25 20:46:17 db1115 kernel: [4014818.142554]  [<ffffffffbd736bfe>] ? dump_stack+0x66/0x88
Jun 25 20:46:17 db1115 kernel: [4014818.142559]  [<ffffffffbd609adb>] ? dump_header+0x78/0x1fd
Jun 25 20:46:17 db1115 kernel: [4014818.142566]  [<ffffffffbd58d587>] ? get_page_from_freelist+0x3f7/0xb20
Jun 25 20:46:17 db1115 kernel: [4014818.142572]  [<ffffffffbd6ad670>] ? cap_inode_killpriv+0x20/0x20
Jun 25 20:46:17 db1115 kernel: [4014818.142574]  [<ffffffffbd5894da>] ? oom_kill_process+0x22a/0x3f0
Jun 25 20:46:17 db1115 kernel: [4014818.142577]  [<ffffffffbd589971>] ? out_of_memory+0x111/0x470
Jun 25 20:46:17 db1115 kernel: [4014818.142580]  [<ffffffffbd58ea6f>] ? __alloc_pages_slowpath+0xa1f/0xb30
Jun 25 20:46:17 db1115 kernel: [4014818.142606] Mem-Info:
Jun 25 20:46:17 db1115 kernel: [4014818.142614] active_anon:30211232 inactive_anon:1886330 isolated_anon:0
Jun 25 20:46:17 db1115 kernel: [4014818.142614]  active_file:642 inactive_file:307 isolated_file:0
Jun 25 20:46:17 db1115 kernel: [4014818.142614]  unevictable:0 dirty:0 writeback:153 unstable:0
Jun 25 20:46:17 db1115 kernel: [4014818.142614]  slab_reclaimable:61891 slab_unreclaimable:53143
Jun 25 20:46:17 db1115 kernel: [4014818.142614]  mapped:44514 shmem:326114 pagetables:199832 bounce:0
Jun 25 20:46:17 db1115 kernel: [4014818.142614]  free:88754 free_pcp:3 free_cma:0
Jun 25 20:46:17 db1115 kernel: [4014818.142748] 346024 total pagecache pages
Jun 25 20:46:17 db1115 kernel: [4014818.142749] 18594 pages in swap cache
Jun 25 20:46:17 db1115 kernel: [4014818.142751] Swap cache stats: add 1044841, delete 1026247, find 8474804/8591572
Jun 25 20:46:17 db1115 kernel: [4014818.142752] Free swap  = 0kB
Jun 25 20:46:17 db1115 kernel: [4014818.142753] Total swap = 976316kB
Jun 25 20:46:17 db1115 kernel: [4014818.142754] 33530456 pages RAM
Jun 25 20:46:17 db1115 kernel: [4014818.142755] 0 pages HighMem/MovableOnly
Jun 25 20:46:17 db1115 kernel: [4014818.142756] 544789 pages reserved
Jun 25 20:46:17 db1115 kernel: [4014818.142756] 0 pages hwpoisoned
Jun 25 20:46:17 db1115 kernel: [4014818.142757] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Jun 25 20:46:17 db1115 kernel: [4014818.142767] [  544]     0   544   132295    78697     257       3        0             0 systemd-journal
Jun 25 20:46:17 db1115 kernel: [4014818.142770] [  561]     0   561    11736      541      23       3        0         -1000 systemd-udevd
Jun 25 20:46:17 db1115 kernel: [4014818.142773] [  695]     0   695     1784       63       9       3       16             0 mdadm
Jun 25 20:46:17 db1115 kernel: [4014818.142776] [  742]   100   742    31821       32      31       3      101             0 systemd-timesyn
Jun 25 20:46:17 db1115 kernel: [4014818.142779] [  803]     0   803     7416       74      19       3        0             0 cron
Jun 25 20:46:17 db1115 kernel: [4014818.142781] [  807]   113   807    14597       60      34       3       42             0 ulogd
Jun 25 20:46:17 db1115 kernel: [4014818.142784] [  817]   107   817    11295      161      27       3        0          -900 dbus-daemon
Jun 25 20:46:17 db1115 kernel: [4014818.142787] [  824]     0   824    11618      149      27       3        0             0 systemd-logind
Jun 25 20:46:17 db1115 kernel: [4014818.142789] [  832]     0   832     6288      235      18       3        0             0 smartd
Jun 25 20:46:17 db1115 kernel: [4014818.142792] [  833]     0   833   306118     1280      95       4       30             0 rsyslogd
Jun 25 20:46:17 db1115 kernel: [4014818.142794] [  836]   112   836   378746     4741      79       7      108             0 prometheus-mysq
Jun 25 20:46:17 db1115 kernel: [4014818.142797] [  837]   112   837    34705      894      38       3     2025             0 python3
Jun 25 20:46:17 db1115 kernel: [4014818.142799] [  844]   112   844   824641     7257     152       7      263             0 prometheus-node
Jun 25 20:46:17 db1115 kernel: [4014818.142802] [  847]     0   847    14587       50      30       3      118             0 lldpd
Jun 25 20:46:17 db1115 kernel: [4014818.142805] [  874]     0   874     3575        0      11       3       33             0 agetty
Jun 25 20:46:17 db1115 kernel: [4014818.142807] [  879]     0   879     3183        3      11       3       34             0 mcelog
Jun 25 20:46:17 db1115 kernel: [4014818.142811] [  883]   108   883    24719    10171      49       3      117             0 lldpd
Jun 25 20:46:17 db1115 kernel: [4014818.142813] [  897]     0   897    17489       31      39       3      161         -1000 sshd
Jun 25 20:46:17 db1115 kernel: [4014818.142816] [  922]     0   922     5981      121      17       3       10          -500 nrpe
Jun 25 20:46:17 db1115 kernel: [4014818.142818] [  923]     0   923     3631        0      12       3       37             0 agetty
Jun 25 20:46:17 db1115 kernel: [4014818.142821] [ 2643]   499  2643 104544996 31705813  198452     401   228829          -600 mysqld
Jun 25 20:46:17 db1115 kernel: [4014818.142824] [ 5517]   106  5517    14037      154      28       3       10             0 exim4
Jun 25 20:46:17 db1115 kernel: [4014818.142826] [  465]     0   465   715668     2288     120       8        0             0 prometheus-rsys
Jun 25 20:46:17 db1115 kernel: [4014818.142832] [23757]     0 23757    12206      105      29       3        0             0 cron
Jun 25 20:46:17 db1115 kernel: [4014818.142835] [23927]     0 23927     1069       16       7       3        0             0 sh
Jun 25 20:46:17 db1115 kernel: [4014818.142838] [23928]     0 23928    15846     2943      36       3        0             0 smart-data-dump
Jun 25 20:46:17 db1115 kernel: [4014818.142840] [24287]     0 24287     2522       21      10       3        0             0 timeout
Jun 25 20:46:17 db1115 kernel: [4014818.142842] [24288]     0 24288    76254     9515     139       3        0             0 facter

Mentioned in SAL (#wikimedia-operations) [2020-09-09T13:57:13Z] <marostegui> Restart mysql on db1115 T231769

Marostegui claimed this task.

I am going to close this; there's not much else we can do here. The bug is filed upstream at https://jira.mariadb.org/browse/MDEV-22809, which indeed shows some leaks.
My bet is that they are coming from the massive number of events we have.
Tendril will be deprecated at some point anyway, so I am closing this.