The 3 hosts triggered a warning on Icinga approx 25/26 days ago.
Their memory usage seems to have grown linearly over the last 30 days, while memory on other hosts has remained constant:
https://grafana.wikimedia.org/goto/LCahmFBHR?orgId=1
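For reference, a rough way to quantify that trend is to fit a line to the last 30 days of memory usage and extrapolate when a host would cross the warning threshold. This is only a sketch against a generic Prometheus node_exporter setup; the endpoint URL, instance label and threshold below are assumptions for illustration, not the production configuration.

```python
import time

import numpy as np
import requests

PROM_URL = "http://prometheus.example.org/api/v1/query_range"  # assumed endpoint
# Assumed expression: fraction of memory in use on one of the affected hosts.
QUERY = (
    '1 - (node_memory_MemAvailable_bytes{instance="es1035:9100"} '
    '/ node_memory_MemTotal_bytes{instance="es1035:9100"})'
)
WARN_FRACTION = 0.90  # assumed warning threshold

end = time.time()
start = end - 30 * 24 * 3600  # the last 30 days

resp = requests.get(
    PROM_URL,
    params={"query": QUERY, "start": start, "end": end, "step": "1h"},
    timeout=30,
)
resp.raise_for_status()
values = resp.json()["data"]["result"][0]["values"]  # [[timestamp, "value"], ...]

ts = np.array([float(t) for t, _ in values])
used = np.array([float(v) for _, v in values])

# Least-squares linear fit, with timestamps re-based to avoid conditioning issues.
t0 = ts[0]
slope, intercept = np.polyfit(ts - t0, used, 1)

if slope > 0:
    crossing = t0 + (WARN_FRACTION - intercept) / slope
    print(f"Projected to reach {WARN_FRACTION:.0%} used in "
          f"{(crossing - end) / 86400:.1f} days")
else:
    print("No upward trend detected")
```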
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | Marostegui | T395294 High MariaDB memory usage on es1035, es2038 and es2039 |
| Declined | | None | T395545 Switchover es7 master (es2038 -> es2039) |
| Resolved | | FCeratto-WMF | T395544 Switchover es7 master (es1035 -> es1039) |
All the es* hosts need an update as part of T395241, so that task will solve this one.
As discussed on IRC, I suspect a memory leak (perhaps related to connections being restarted?). Maybe we could consider lowering the threshold for the memory usage warning and also introducing an alert on IRC.
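To make the threshold idea concrete, here is a minimal sketch of a Nagios/Icinga-style memory check with a lowered warning threshold, reporting the largest process similarly to the alert quoted in the next comment. This is not the production check plugin; the thresholds, output format and "largest process" logic are assumptions for illustration.

```python
import sys
from pathlib import Path

WARN_PCT = 85.0  # assumed lowered warning threshold
CRIT_PCT = 95.0  # assumed critical threshold


def meminfo() -> dict:
    """Parse /proc/meminfo into a dict of kB values."""
    out = {}
    for line in Path("/proc/meminfo").read_text().splitlines():
        key, rest = line.split(":", 1)
        out[key] = int(rest.strip().split()[0])
    return out


def largest_process() -> tuple:
    """Return (name, pid, VmRSS in kB) of the process with the largest RSS."""
    best = ("", 0, 0)
    for status in Path("/proc").glob("[0-9]*/status"):
        try:
            fields = dict(
                line.split(":", 1) for line in status.read_text().splitlines()
            )
            rss = int(fields.get("VmRSS", "0 kB").split()[0])
        except (OSError, ValueError):
            continue  # process exited or is unreadable
        if rss > best[2]:
            best = (fields["Name"].strip(), int(status.parent.name), rss)
    return best


mem = meminfo()
used_pct = 100.0 * (mem["MemTotal"] - mem["MemAvailable"]) / mem["MemTotal"]
name, pid, rss_kb = largest_process()
proc_pct = 100.0 * rss_kb / mem["MemTotal"]

if used_pct >= CRIT_PCT:
    state, code = "CRITICAL", 2
elif used_pct >= WARN_PCT:
    state, code = "WARNING", 1
else:
    state, code = "OK", 0

# Nagios-style single-line output plus exit code.
print(f"{state} Memory {used_pct:.0f}% used. "
      f"Largest process: {name} ({pid}) = {proc_pct:.1f}%")
sys.exit(code)
```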
This became a CRIT; please restart them proactively, do not wait for the other task.
[15:13:05] <+icinga-wm> PROBLEM - MariaDB memory on es1035 is CRITICAL: CRIT Memory 95% used. Largest process: mysqld (1614) = 93.5% https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
Mentioned in SAL (#wikimedia-operations) [2025-05-29T10:07:06Z] <fceratto@cumin1002> dbctl commit (dc=all): 'Depool es2039 T395294', diff saved to https://phabricator.wikimedia.org/P76665 and previous config saved to /var/cache/conftool/dbconfig/20250529-100704-fceratto.json
es2039 is being repooled to alleviate load on the other hosts (see T395551); the Icinga downtime has been removed.
es2038 was rebooted during T395551; only es1035 is left for the switchover, followed by a depool and upgrade.