Labs: Move tools-shadow off the same host as tool-master
Closed, Resolved (Public)

Description

As a first step, to reduce the risk if the one host they share goes down.

Event Timeline

coren claimed this task.
coren raised the priority of this task from to High.
coren updated the task description.
coren added projects: Toolforge, Labs-Sprint-103.
coren added subscribers: yuvipanda, scfc, coren and 2 others.

Testing, in the versionmismatch project, how gridengine fares when the shadow and master run different versions.

Gridengine does not seem to suffer from the version mismatch (6.2u5-4 vs 6.2u5-7.3) and the configuration ports without difficulty.
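For reference, the two version strings share the upstream release 6.2u5 and differ only in the packaging revision. A minimal Python sketch of that comparison, assuming the usual Debian `upstream-revision` layout with no epoch prefix (the helper name `split_debian_version` is made up for illustration and is not part of any tool used here):

```python
def split_debian_version(version):
    """Split a Debian-style version string into (upstream, revision).

    Assumption: no epoch prefix; the revision is whatever follows the
    last hyphen in the string.
    """
    upstream, sep, revision = version.rpartition("-")
    if not sep:
        # No hyphen at all: treat the whole string as the upstream version.
        return version, ""
    return upstream, revision


# The two versions quoted in the comment above.
first = split_debian_version("6.2u5-4")
second = split_debian_version("6.2u5-7.3")

# True when both packages are built from the same upstream release.
same_upstream = first[0] == second[0]
print(first, second, same_upstream)
```

This only shows that the mismatch is at the packaging level; it says nothing about whether the two builds are actually compatible, which is what the test in the versionmismatch project was for.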

Therefore, the plan:

  1. Rebuild tools-shadow (precise) as tools-shadow-01 (trusty) in Tools, allow it to configure and stabilize, then switch masters to it.
  2. After a testing period, remove tools-shadow and create tools-master-01.
  3. Switch to tools-master-01.
  4. Remove tools-master once everything has been shown to work well.

If the roles of master and shadow are basically identical and they auto-discover who's in charge, couldn't we name them tools-master-01, tools-master-02, etc. like other services?

There is a subtle difference, at least structurally, in that the gridengine configuration itself makes the distinction. That is, while the shadows can take over the master role, there is one designated server that does not run the monitoring daemon and which is considered the canonical master.

For tools-redis-01 & Co., @yuvipanda used a scheme where the determination of master and server is done by setting $active_redis accordingly. So if something like this is possible with our gridengine setup, I think that would be very useful. However, that's not a blocker for this task.

Do remember to clean up / delete that project when done :)

Now on labvirt1008

I've just checked and tools-shadow is on labvirt1008 and tools-master is on labvirt1004.
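The condition being checked here can be sketched as follows. This is not the actual test discussed in this task; the function `shared_hosts` and the `placement` mapping are hypothetical, with the host names taken from the comment above.

```python
def shared_hosts(placement, instances):
    """Return the hosts that carry more than one of the given instances.

    placement: dict mapping instance name -> virtualization host name.
    instances: the instance names that must not share a host.
    """
    by_host = {}
    for name in instances:
        by_host.setdefault(placement[name], []).append(name)
    # Keep only hosts where two or more of the instances landed together.
    return {host: names for host, names in by_host.items() if len(names) > 1}


# Placement as reported in this task (hypothetical data structure).
placement = {
    "tools-shadow": "labvirt1008",
    "tools-master": "labvirt1004",
}

conflicts = shared_hosts(placement, ["tools-master", "tools-shadow"])
print(conflicts or "master and shadow are on separate hosts")
```

An empty result means the two instances are on different hosts, which is the state this task asks for.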

@yuvipanda: It's not clear to me why the test is failing - do you have any insight?

coren removed coren as the assignee of this task. Nov 16 2015, 6:15 PM

@yuvipanda: It looks like the test is broken, rather than host distribution.

Does somebody plan to fix the test?

yuvipanda assigned this task to Andrew.

I think Andrew fixed it; I see that the test is green now.