
mariadb on dbstore hosts, and specifically dbstore1004, possible memory leaking
Closed, Declined (Public)

Description

dbstore1004 seems to have excessive memory consumption that keeps growing over time until the host is restarted, which has been done at least twice. Thanks to memory monitoring, this is under control, but it would be nice to understand whether there is a clear underlying issue causing it.

Screenshot_20201214_173325.png (885×2 px, 102 KB)

https://grafana.wikimedia.org/d/000000377/host-overview?viewPanel=4&orgId=1&refresh=5m&var-server=dbstore1004&var-datasource=thanos&var-cluster=mysql&from=1592152359067&to=1607963559067

Screenshot_20201214_173721.png (808×1 px, 83 KB)

https://grafana.wikimedia.org/d/000000278/mysql-aggregated?viewPanel=12&orgId=1&from=1607942246916&to=1607963846917&var-site=eqiad&var-group=core&var-shard=All&var-role=All

Event Timeline

This is not a huge concern since we have memory monitoring (T172490), but I am adding it here for tracking so we can research it at a later time if possible. Scheduling a restart for now.

Mentioned in SAL (#wikimedia-operations) [2020-12-16T11:10:09Z] <jynus> stopping and restarting dbstore1004 to mitigate (short term) T270112

dbstore1004 is again at 90% memory usage.

Change 673849 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Reduce buffer pool memory for dbstore1004's mariadb instances

https://gerrit.wikimedia.org/r/673849

Change 673849 merged by Elukey:
[operations/puppet@production] Reduce buffer pool memory for dbstore1004's mariadb instances

https://gerrit.wikimedia.org/r/673849
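
The reasoning behind a change like this can be sketched as a simple memory budget check: on a host running several mariadb instances, the sum of the configured InnoDB buffer pools should leave headroom for per-connection memory, other processes, and the OS. The snippet below is only an illustration under stated assumptions. The instance names, host RAM, pool sizes, and the 75% ceiling are all hypothetical and are not the values from the patch.

```python
GIB = 1024 ** 3


def buffer_pool_budget(total_ram_bytes, pool_sizes_bytes, max_fraction=0.75):
    """Compare the summed buffer pool sizes against a share of host RAM.

    Returns (total_pool_bytes, allowed_bytes, fits). max_fraction is an
    assumed ceiling chosen for illustration, not a documented recommendation.
    """
    total_pool = sum(pool_sizes_bytes.values())
    allowed = int(total_ram_bytes * max_fraction)
    return total_pool, allowed, total_pool <= allowed


# Hypothetical host and instances (NOT the real dbstore1004 configuration).
host_ram = 512 * GIB
before = {"instance_a": 150 * GIB, "instance_b": 150 * GIB, "instance_c": 150 * GIB}
after = {name: 120 * GIB for name in before}

for label, pools in (("before", before), ("after", after)):
    total, allowed, fits = buffer_pool_budget(host_ram, pools)
    print(f"{label}: pools={total // GIB} GiB, allowed={allowed // GIB} GiB, fits={fits}")
```

A check like this only bounds the buffer pools. It would not detect a genuine leak, where memory use keeps growing beyond the configured pools.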

Restarted all instances on dbstore1004 today :(

dbstore1004 is no more, and for dbstore1007 we have T290841: dbstore1007 is swapping heavily, potentially soon killing mysql services due to OOM error.
dbstore1003 and dbstore1005 look fine, so I am closing this.