Investigate mw-on-k8s statsd-exporter RAM usage pattern
Closed, Invalid (Public)

Description

statsd-exporter's RAM consumption ramps up over roughly a week, after which the container gets OOMKilled.

image.png (263×705 px, 23 KB)

Figure out if this is overload or a memory leak:

  • Add more replicas, check usage pattern in 10-15 days
  • If no change, raise memory limit, check usage pattern in 10-15 days
  • If no change, investigate memory leak

Event Timeline

Change #1243846 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] statsd-exporter: Add 2 replicas for mw-web

https://gerrit.wikimedia.org/r/1243846

Since this behaviour was apparent on multiple mw-on-k8s deployments, and disappeared with more replicas, it is very likely load-related rather than an actual memory leak. This will be confirmed if adding replicas either lengthens the time to OOMKill or makes the behaviour disappear. If the behaviour disappears, no further changes should be needed. If the time to OOMKill only gets longer, we should raise the memory limit until the asymptote reached by the RAM usage curve stays below the limit.
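For reference, a rough way to guess where the RAM curve would level off, as a minimal sketch only. It assumes the usage follows a saturating exponential (which nothing in this task confirms) and uses Aitken's delta-squared extrapolation on three equally spaced samples. The function name and the sample values are hypothetical, and real, noisy data would need smoothing first.

```python
from typing import Optional


def estimate_asymptote(m0: float, m1: float, m2: float) -> Optional[float]:
    """Estimate the plateau of a growing, decelerating memory curve.

    m0, m1, m2 are memory samples (e.g. in MiB) taken at equally spaced
    times. For a curve of the form A + C * r**n the result is exactly A.
    Returns None when the samples do not look like they are levelling off
    (not increasing, or growth that is linear or accelerating), which is
    what a leak would more likely look like.
    """
    if not (m0 < m1 < m2):
        return None
    second_difference = m0 - 2.0 * m1 + m2
    if second_difference >= -1e-9:
        # Linear or accelerating growth: no finite plateau to extrapolate.
        return None
    return (m0 * m2 - m1 * m1) / second_difference


# Hypothetical samples in MiB, one per day.
plateau = estimate_asymptote(200.0, 600.0, 800.0)
steady_growth = estimate_asymptote(100.0, 200.0, 300.0)
print(plateau, steady_growth)
```

If the estimate comes out above the current memory limit, that would argue for raising the limit; a `None` result on clean data would point back towards the leak hypothesis.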

The number of metrics statsd-exporter is reporting seems to be increasing similarly over the lifetime of the container:

statsd_exporter_metrics_increasing.png (928×2 px, 489 KB)
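To see which metric names account for the growth, one option is to count series per metric name in the exporter's Prometheus text output at two points in the container's lifetime and compare. The following is a sketch under assumptions: `count_series` is a hypothetical helper, the sample text is made up, and it presumes the scrape output has already been fetched from the exporter's metrics endpoint.

```python
from collections import Counter


def count_series(exposition_text: str) -> Counter:
    """Count sample lines per metric name in Prometheus text exposition.

    Comment lines (# HELP / # TYPE) and blank lines are skipped. Histogram
    and summary sub-series (_bucket, _sum, _count) are counted under their
    own suffixed names rather than being merged.
    """
    counts: Counter = Counter()
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or space (no labels).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts


# Made-up sample; real input would be the body of a metrics scrape.
sample = """\
# HELP example_requests_total Hypothetical counter
# TYPE example_requests_total counter
example_requests_total{wiki="a"} 10
example_requests_total{wiki="b"} 4
example_up 1
"""
series = count_series(sample)
print(series.most_common(5))
```

Diffing the counters from an early and a late scrape would show whether the growth comes from a few high-cardinality metrics or is spread evenly.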

Change #1243846 merged by jenkins-bot:

[operations/deployment-charts@master] statsd-exporter: Add 2 replicas for mw-web

https://gerrit.wikimedia.org/r/1243846

Clement_Goubert moved this task from Inbox to In Progress on the ServiceOps new board.

This appears to be a duplicate of T410152.