Bump memory of testreduce1002
Closed, Resolved, Public

Description

testreduce1002 fairly often stalls for short or long periods of time when running rt-tests. I haven't investigated, but I think this is caused by processes thrashing, which might be mitigated by adding some more RAM to this VM.

What is involved in bumping up RAM on this VM by another GB?

Event Timeline

Clement_Goubert subscribed.

It would seem you're right about memory pressure being an issue:

image.png (1×2 px, 250 KB)

The biggest RAM consumer is by far MariaDB, with spikes up to almost 7GB, then parsoid-rt-client, spiking to over 5GB. These spikes don't seem to line up, but maybe we should add more than 1GB (currently 8GB), as MariaDB actually gets OOMKilled fairly often (11 times in the last 24 hours).

It involves rebooting the VM; there's enough RAM on the host (and generally in the cluster). Tell us when we can do that without disruption for you and we will.

Anytime today or tomorrow works. We'll hold off running rt-testing till the reboot happens.

Mentioned in SAL (#wikimedia-operations) [2025-05-13T15:56:21Z] <claime> gnt-instance modify -B memory=10g testreduce1002.eqiad.wmnet - T393904

VM testreduce1002.eqiad.wmnet rebooted by cgoubert@cumin1002 with reason: Pick up new 10GB ram

cgoubert@testreduce1002:~$ free -m
               total        used        free      shared  buff/cache   available
Mem:            9944        2211        7646           0         323        7733
Swap:            975           0         975
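
For anyone who wants to check the new size from a script rather than by eye, here is a minimal sketch that parses the `free -m` output pasted above. The function name `parse_free_m` and the `SAMPLE` constant are made up for illustration; nothing here is part of the testreduce tooling.

```python
def parse_free_m(output: str) -> dict:
    """Parse `free -m` text into {row label: {column name: MiB}}."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    headers = lines[0].split()
    table = {}
    for line in lines[1:]:
        label, *values = line.split()
        # The Swap row has fewer columns than Mem; zip stops at the shorter list.
        table[label.rstrip(":")] = dict(zip(headers, map(int, values)))
    return table


# Output copied from the comment above (hypothetical constant name).
SAMPLE = """\
               total        used        free      shared  buff/cache   available
Mem:            9944        2211        7646           0         323        7733
Swap:            975           0         975
"""

mem = parse_free_m(SAMPLE)["Mem"]
print(mem["total"], mem["available"])  # prints "9944 7733"
```

The reported total is a little under the 10240 MiB requested, which is normal since part of the RAM is reserved before it is visible to `free`.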

Done :)

cscott claimed this task.