
Move ORES to redis misc cluster
Closed, Resolved (Public)

Description

Intro

We want to move from the oresrdb hosts to the redis::misc hosts to lower
operational costs. We will stop maintaining separate redis databases and
make better use of the underutilized redis::misc databases.

https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/595167/ will achieve that, but it requires some one-time orchestration.

Process in pseudocode

  1. disable puppet on ores1* and ores2*
  2. on all hosts, add an /etc/hosts entry so that oresrdb.svc.eqiad.wmnet points to the old oresrdb host
  3. merge puppet change https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/595167/
  4. merge the puppet/private change for the password
  5. merge DNS change https://gerrit.wikimedia.org/r/#/c/operations/dns/+/601665
  6. Run the equivalent of the following as cumin commands:
     for host in hosts
         depool host
         delete /etc/hosts line
         enable puppet
         run puppet
         restart celery
         restart uwsgi
         pool host
     done
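
The per-host loop in step 6 can be sketched in Python as below. This is only an illustration of the ordering, not what was actually run (the real rollout used cumin). The command strings, the service names `celery-ores-worker` and `uwsgi-ores`, and the `runner` callback are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Assumed per-host command sequence mirroring the pseudocode in step 6.
# Command strings and service names are illustrative guesses, not taken
# from the task.
STEPS: List[str] = [
    "depool",
    r"sed -i '/oresrdb\.svc\.eqiad\.wmnet/d' /etc/hosts",  # drop the temporary pin
    "puppet agent --enable",
    "puppet agent --test",
    "systemctl restart celery-ores-worker",  # hypothetical unit name
    "systemctl restart uwsgi-ores",          # hypothetical unit name
    "pool",
]


def rolling_plan(hosts: List[str]) -> List[Tuple[str, str]]:
    """Return (host, command) pairs in execution order, one host at a time."""
    return [(host, cmd) for host in hosts for cmd in STEPS]


def run_rolling(hosts: List[str], runner: Callable[[str, str], bool]) -> List[str]:
    """Run every step on each host in turn and return the hosts completed.

    `runner(host, cmd)` is a caller-supplied function that executes the
    command remotely and returns True on success. The loop stops at the
    first failure, so later hosts are left untouched and still pooled.
    """
    completed: List[str] = []
    for host in hosts:
        for cmd in STEPS:
            if not runner(host, cmd):
                raise RuntimeError(f"{cmd!r} failed on {host}; stopping rollout")
        completed.append(host)
    return completed
```

Calling `rolling_plan(["ores2001", "ores2002"])` gives a dry-run listing that can be reviewed before anything is executed. Going one host at a time keeps the rest of the cluster serving traffic while each host is depooled.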

Event Timeline

Change 601665 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/dns@master] Switch oresrdb.svc records to redis::misc

https://gerrit.wikimedia.org/r/601665

Change 595167 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] ores: Parameterize redis ports

https://gerrit.wikimedia.org/r/595167

Change 601665 merged by Alexandros Kosiaris:
[operations/dns@master] Switch oresrdb.svc records to redis::misc

https://gerrit.wikimedia.org/r/601665

Change 595167 merged by Alexandros Kosiaris:
[operations/puppet@production] ores: Parameterize redis ports

https://gerrit.wikimedia.org/r/595167

Mentioned in SAL (#wikimedia-operations) [2020-06-02T10:09:25Z] <akosiaris> switch over ores2XXX hosts to redis::misc from oresrdb hosts. T254226

The codfw migration has gone really well; I've barely managed to notice the migration in the dashboards.

Mentioned in SAL (#wikimedia-operations) [2020-06-02T10:29:10Z] <akosiaris> switch over ores1XXX hosts to redis::misc from oresrdb hosts. T254226

The eqiad migration has gone pretty much as well. There seem to have been some occasional overloads due to ores1001 at some point; it looks like a restart of uwsgi+celery fixed it.

akosiaris changed the task status from Open to Stalled. Jun 2 2020, 11:17 AM

https://grafana.wikimedia.org/d/RLhtAw6mz/ores-redis?orgId=1&refresh=1m has been updated as well. I am gonna call this resolved, but we should wait a couple of days before calling it a full success. Setting to stalled for 2 days.

akosiaris claimed this task.

Everything is fine after a week, resolving this.