Move beta cluster ORES to its own machine
Closed, Resolved; Public

Description

We've been seeing beta ORES crash for a few months now, and it's clear that it's hitting the memory ceiling. We share the machine with some Node.js processes, so the memory available to ORES fluctuates.

It would be better if we could move our service to a dedicated beta cluster node with 8 GB of memory and 2 CPUs.

Event Timeline

FWIW, our staging machine for our CloudVPS install for ORES is 16GB and usually runs with 9.2GB free. It has 8 celery workers and 48 uwsgi workers.
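As a rough sanity check on the requested size, the figures above can be turned into a back-of-the-envelope estimate. This is only a sketch: it assumes the beta node would run the same worker configuration as staging and that "free" memory on staging means everything not in use, neither of which is confirmed in this task. The function names are made up for illustration.

```python
# Figures quoted in the comment above (staging CloudVPS machine).
STAGING_TOTAL_GB = 16.0
STAGING_FREE_GB = 9.2
# Size requested in the task description for the dedicated beta node.
REQUESTED_NODE_GB = 8.0


def used_gb(total_gb: float, free_gb: float) -> float:
    """Memory in use, given total and free memory in GB."""
    return total_gb - free_gb


def headroom_gb(node_gb: float, expected_use_gb: float) -> float:
    """Memory left on a node of node_gb once expected_use_gb is taken."""
    return node_gb - expected_use_gb


# Assumption: beta would use about as much memory as staging does.
staging_use = used_gb(STAGING_TOTAL_GB, STAGING_FREE_GB)
print(f"staging use: {staging_use:.1f} GB")
print(f"headroom on requested node: {headroom_gb(REQUESTED_NODE_GB, staging_use):.1f} GB")
```

Under those assumptions, staging uses about 6.8 GB, which would leave roughly 1.2 GB of headroom on an 8 GB node.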

I was working on this and tried to create an instance in deployment-prep. I thought I was an admin there, but I'm not in the list: https://tools.wmflabs.org/openstack-browser/project/deployment-prep This is very odd, because I remember creating an instance (deployment-sca03). If my access was taken away, either I wasn't notified or I missed the message (which is very unlikely).

It seems we have an ORES Redis node in deployment-prep, which is unnecessary. That role should be fulfilled by the new, dedicated ORES box.

Once Beta ORES is migrated, we can recycle the Redis machine: deployment-ores-redis-01.deployment-prep.eqiad.wmflabs

Change 403103 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/services/ores/deploy@master] Migrate ores in labs to the dedicated node

https://gerrit.wikimedia.org/r/403103

Change 403103 merged by Ladsgroup:
[mediawiki/services/ores/deploy@master] Migrate ores in labs to the dedicated node

https://gerrit.wikimedia.org/r/403103

Mentioned in SAL (#wikimedia-releng) [2018-01-09T08:42:56Z] <Amir1> deleted deployment-ores-redis-01 in favor of deployment-ores01 (T184282)

Mentioned in SAL (#wikimedia-releng) [2018-01-09T08:48:12Z] <Amir1> stopping ores services in deployment-sca03 (T184282)

Mentioned in SAL (#wikimedia-releng) [2018-01-09T08:49:55Z] <Amir1> ladsgroup@deployment-sca03:/srv/deployment$ sudo rm -rf ores (T184282)
