toolforge prometheus servers OOMing
Closed, Resolved, Public

Description

The Toolforge Prometheus server has been crashing for the last day or so.

Event Timeline

taavi triaged this task as High priority.

The instances are using g3.cores8.ram36.disk20, so I'm a bit surprised they're running out of RAM.

Mentioned in SAL (#wikimedia-cloud) [2023-11-02T13:13:31Z] <taavi> wiping data directory from tools-prometheus-7 so we have at least one working server T350227