
Reduce memory request for singleuser
Closed, Resolved, Public

Description

In T345462 there was a partial outage where new servers could not be launched. This was because all the memory on the worker nodes had been assigned. However, actual usage was considerably lower than the requested amount.

At the time there were 87 single user containers running, with a mean memory usage of 390Mi and a median of 264Mi. Currently each server is given a 1Gi request and a 3Gi limit. At the time of T345462 only 2 of the 87 containers were using more than the 1Gi request (using 1388Mi and 1395Mi). Memory usage is at about 50% (50%, 48% and 51% at the time). In T345462 the number of worker nodes was increased to resolve the issue.
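To make the gap between requested and used memory concrete, here is a rough back-of-the-envelope sketch using only the figures quoted above. It covers the single user containers only and assumes the mean is representative of the total; anything else running on the worker nodes is not accounted for.

```python
# Rough arithmetic on the figures from the task description.
MIB_PER_GIB = 1024

containers = 87
mean_usage_mib = 390            # mean memory usage per container
request_mib = 1 * MIB_PER_GIB   # current 1Gi request per container

# Memory reserved by the scheduler versus memory actually in use.
total_requested_gib = containers * request_mib / MIB_PER_GIB
total_used_gib = containers * mean_usage_mib / MIB_PER_GIB
used_fraction = total_used_gib / total_requested_gib

print(f"requested: {total_requested_gib:.1f}Gi")
print(f"used:      {total_used_gib:.1f}Gi")
print(f"used / requested: {used_fraction:.0%}")
```

By this estimate the containers were using a bit over a third of the memory reserved for them.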

Since 50% utilization is pretty low, is it reasonable to reduce the memory request (not the limit) of the single user containers? In practice, had the memory request been lower, no one would have observed any difference, and the users who could not log in would have been able to, as there was plenty of unused memory available.

Of the 87 servers, 23 were using more than 500Mi and 13 were using more than 750Mi. Perhaps lowering the request to one of these figures is advisable, at least until the nodes get closer to 70% usage.
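The trade-off between the candidate request values can be sketched as follows. The container counts are the ones quoted in this description; the helper name `total_request_gib` is only illustrative, and the totals assume all 87 containers receive the same request.

```python
# Compare candidate per-container memory requests for the 87 containers.
MIB_PER_GIB = 1024
CONTAINERS = 87

# request in Mi -> containers observed using more than that amount
over_request = {1024: 2, 750: 13, 500: 23}

def total_request_gib(request_mib, containers=CONTAINERS):
    """Total memory reserved if every container gets request_mib."""
    return request_mib * containers / MIB_PER_GIB

baseline = total_request_gib(1024)
for request_mib, over in over_request.items():
    total = total_request_gib(request_mib)
    print(
        f"{request_mib}Mi request: {total:.1f}Gi reserved, "
        f"{baseline - total:.1f}Gi freed, "
        f"{over} of {CONTAINERS} containers above request"
    )
```

Containers above their request are not killed for that alone, since the 3Gi limit is unchanged; they simply use memory that the scheduler has not reserved for them.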

Event Timeline

I set this running over the weekend; so far 28 of 96 containers are at the 0.7G request rather than 1G. Requests on the workers are still at about 75%, somewhat higher than I would like. Over the next week or so, as the containers restart, we should see about a 13% reduction in requests (another 17G or so should be reclaimed by the new policy), bringing us down into the low 60% range, which is about where I would like to see it running.

As it stands, we're seeing about 2x the usage of PAWS. I would imagine that this follows Wikimania. Investigating cluster autoscaling may be a reasonable thing to do in order to manage such expansions in use.

Mentioned in SAL (#wikimedia-cloud) [2023-09-05T12:19:16Z] <Rook> Reduce memory request for single user container T345467