Deploy an-test-coord1002 to facilitate failover testing of analytics coordinator role
Open, Medium, Public

Description

In the production analytics cluster we have two coordinators: one is active and the other acts as a standby.

In the test cluster we only have a single coordinator server, so we cannot effectively hone or test any failover procedures in the test environment.

Deploying a second coordinator server in the test cluster will help us to improve and test this functionality.

Event Timeline

+1

We might also need to create an-test-launcher1001, as an-test-coord1001 is currently serving the roles of both an-coord1001 and an-launcher1002.

Ottomata triaged this task as Medium priority.
Ottomata moved this task from Backlog to Q1 2021/2022 on the Analytics-Clusters board.

Change 714753 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Add replica hadoop coordinator role in the test cluster

https://gerrit.wikimedia.org/r/714753

BTullis renamed this task from "Deploy an-test-coord1002 as a Ganeti VM to facilitate failover testing of analytics coordinator role" to "Deploy an-test-coord1002 to facilitate failover testing of analytics coordinator role". Wed, Sep 15, 10:32 AM
BTullis moved this task from In Progress to Paused on the Analytics-Kanban board.
BTullis added a subtask: Unknown Object (Task).

I'm awaiting the decision from this server request: {T289784} to determine whether or not to press ahead with or roll back the changes in T289664: Site: Eqiad - 1 VM request for analytics test cluster - coordinator replica role.