Move beta cluster ORES to its own machine
Closed, Resolved · Public

Description

We've been noticing beta ORES crashing for a few months now, and it's clear that it's hitting the memory ceiling. We're sharing the machine with some Node.js services, and the machine's total available memory fluctuates.

It would be better if we could move our service to a dedicated beta cluster node with 8 GB of memory and 2 CPUs.

awight created this task. Jan 5 2018, 3:21 PM
Halfak added a comment. Jan 5 2018, 3:37 PM

FWIW, our staging machine for our CloudVPS install for ORES is 16GB and usually runs with 9.2GB free. It has 8 celery workers and 48 uwsgi workers.
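
As a rough sanity check on the requested 8 GB, the staging figures above can be turned into a quick calculation. This is only a sketch: it assumes beta would carry the same memory footprint as staging (8 celery workers and 48 uwsgi workers), which the thread does not confirm, and the function name `memory_headroom` is made up for illustration.

```python
# Figures quoted in this thread; nothing here is measured by the script itself.
STAGING_TOTAL_GB = 16.0   # staging machine size
STAGING_FREE_GB = 9.2     # memory usually free on staging
REQUESTED_GB = 8.0        # size requested for the dedicated beta node


def memory_headroom(total_gb, free_gb, requested_gb):
    """Return (used_gb, headroom_gb) if the same load ran on a smaller node.

    Assumption: the smaller node carries exactly the same footprint as the
    machine the figures were taken from.
    """
    used_gb = total_gb - free_gb
    return used_gb, requested_gb - used_gb


used, headroom = memory_headroom(STAGING_TOTAL_GB, STAGING_FREE_GB, REQUESTED_GB)
print(f"used: {used:.1f} GB, headroom on an {REQUESTED_GB:.0f} GB node: {headroom:.1f} GB")
```

Under that assumption, staging uses about 6.8 GB, so an 8 GB node would leave roughly 1.2 GB spare. That margin is thin, so the estimate only suggests the requested size is plausible, not that it is safe.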

I was working on this and tried to create an instance in deployment-prep. I thought I was an admin, but I'm not in the list: https://tools.wmflabs.org/openstack-browser/project/deployment-prep This is very odd, because I remember creating an instance there (deployment-sca03). If my access was taken away, then either I wasn't notified or I missed the message (which is very unlikely).

awight added a comment. Jan 8 2018, 1:42 PM

It seems we have an ORES Redis node in deployment-prep, which is unnecessary. That role should be fulfilled by the new, dedicated ORES box.

Once Beta ORES is migrated, we can recycle the Redis machine: deployment-ores-redis-01.deployment-prep.eqiad.wmflabs

Change 403103 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/services/ores/deploy@master] Migrate ores in labs to the dedicated node

https://gerrit.wikimedia.org/r/403103

Restricted Application added a project: User-Ladsgroup. Jan 9 2018, 5:36 AM

Change 403103 merged by Ladsgroup:
[mediawiki/services/ores/deploy@master] Migrate ores in labs to the dedicated node

https://gerrit.wikimedia.org/r/403103

Mentioned in SAL (#wikimedia-releng) [2018-01-09T08:42:56Z] <Amir1> deleted deployment-ores-redis-01 in favor of deployment-ores01 (T184282)

Mentioned in SAL (#wikimedia-releng) [2018-01-09T08:48:12Z] <Amir1> stopping ores services in deployment-sca03 (T184282)

Mentioned in SAL (#wikimedia-releng) [2018-01-09T08:49:55Z] <Amir1> ladsgroup@deployment-sca03:/srv/deployment$ sudo rm -rf ores (T184282)

Ladsgroup moved this task from Incoming to Done on the User-Ladsgroup board. Jan 12 2018, 12:15 PM
Halfak closed this task as Resolved. Jan 30 2018, 8:31 PM
238482n375 set Security to Software security bug. Jun 15 2018, 8:04 AM
238482n375 changed the visibility from "Public (No Login Required)" to "Custom Policy".
This comment was removed by Reedy.
Restricted Application added a project: Security. Jun 15 2018, 1:38 PM
Halfak changed the visibility from "Custom Policy" to "Public (No Login Required)".
Halfak removed a subscriber: 238482n375.
Reedy added a subscriber: Reedy. Jun 15 2018, 2:23 PM