
Labs: Move tools-shadow off the same host as tool-master
Closed, Resolved · Public

Description

As a first step, to reduce the risk in case the one host goes down.

Event Timeline

coren created this task. Jun 22 2015, 5:52 PM
coren claimed this task.
coren raised the priority of this task from to High.
coren updated the task description.
coren added projects: Toolforge, Labs-Sprint-103.
coren added subscribers: yuvipanda, scfc, coren and 2 others.
Restricted Application added a project: Cloud-Services. Jun 22 2015, 5:52 PM
coren moved this task from To Do to Doing on the Labs-Sprint-103 board. Jun 22 2015, 5:58 PM

Testing, in the versionmismatch project, how gridengine fares when the shadow and master run mismatched versions.

coren added a comment. Jun 23 2015, 7:08 PM

Gridengine does not seem to suffer from the version mismatch (6.2u5-4 vs 6.2u5-7.3) and the configuration ports without difficulty.

Therefore, the plan:

  1. Rebuild tools-shadow (precise) as tools-shadow-01 (trusty) in Tools, allow it to configure and stabilize, then switch masters to it.
  2. After a test period, remove tools-shadow and create tools-master-01.
  3. Switch to tools-master-01.
  4. Remove tools-master once everything has been shown to work.

Sitic added a subscriber: Sitic. Jun 23 2015, 8:13 PM
scfc added a comment. Jun 23 2015, 9:18 PM

If the roles of master and shadow are basically identical and they auto-discover who's in charge, couldn't we name them tools-master-01, tools-master-02, etc. like other services?

coren added a comment. Jun 23 2015, 9:30 PM

There is a subtle difference, at least structurally, in that the gridengine configuration itself makes the distinction. That is, while the shadows can take over the master role, there is one designated server that does not run the monitoring daemon and which is considered the canonical master.

scfc added a comment. Jun 23 2015, 10:17 PM

For tools-redis-01 & Co., @yuvipanda used a scheme where the determination of master and server is done by setting $active_redis accordingly. So if something like this is possible with our gridengine setup, I think that would be very useful. However, that's not a blocker for this task.

Do remember to clean up / delete that project when done :)

valhallasw moved this task from Triage to Backlog on the Toolforge board. Jul 2 2015, 7:15 PM
coren closed this task as Resolved. Sep 28 2015, 4:29 PM

Now on labvirt1008

coren moved this task from To do to Done on the Labs-Sprint-115 board. Sep 28 2015, 4:29 PM
coren added a comment. Oct 14 2015, 2:41 PM

I've just checked and tools-shadow is on labvirt1008 and tools-master is on labvirt1004.

@yuvipanda: It's not clear to me why the test is failing - do you have any insight?
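For illustration only, the host check described in the comment above can be sketched as follows. This is a minimal sketch and not the actual failing test, whose implementation is not shown in this task. The function name `colocated_groups` and the hand-written `placement` mapping are hypothetical; a real check would have to obtain the instance-to-host mapping from the virtualization layer.

```python
from collections import defaultdict


def colocated_groups(placement, instances):
    """Return {host: [instance, ...]} for every host that carries more
    than one of the given instances. An empty dict means the instances
    are spread across distinct hosts.

    `placement` maps instance name -> physical host name (hypothetical
    input format, assumed for this sketch).
    """
    by_host = defaultdict(list)
    for name in instances:
        by_host[placement[name]].append(name)
    return {host: sorted(names) for host, names in by_host.items() if len(names) > 1}


# Placement as reported in the comment above.
placement = {
    "tools-shadow": "labvirt1008",
    "tools-master": "labvirt1004",
}

# No host carries both instances, so nothing is reported.
assert colocated_groups(placement, ["tools-master", "tools-shadow"]) == {}
```

With the placement reported here the check passes, which is consistent with the suspicion that the test itself, rather than the host distribution, is at fault.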

coren removed coren as the assignee of this task. Nov 16 2015, 6:15 PM

@yuvipanda: It looks like the test is broken, rather than host distribution.

Does somebody plan to fix the test?

yuvipanda closed this task as Resolved. Jul 5 2016, 1:40 PM
yuvipanda assigned this task to Andrew.

I think Andrew fixed it; I see that the test is green now.