As per the parent task, Prometheus k8s in codfw/eqiad can get OOM-killed from time to time.
As one of the short-term mitigations, I think we should try to scale the hardware up vertically, namely by increasing memory.
The current R440 hosts have 4x32GB RAM each. @wiki_willy, would you mind helping check whether we have memory available on site to install? Ideally >= 64GB per host (4 hosts in total: 2 in eqiad and 2 in codfw).
If it is not immediately available, could we order the memory?
Thank you!