
Migrate deployment-prep to new labvirt hosts
Closed, Resolved (Public)

Description

I'm going to run a big scripted migration of all labs instances sometime soon. However, I suspect that beta folks will appreciate having a narrower window than 'sometime in the next two weeks', so I'd like to migrate deployment-prep first. That will reduce the window to about a day.

What will happen:

Every twenty minutes, one beta VM will be copied from an old compute node to a new one. At the end of the copy, the instance will be temporarily suspended (for about one minute) while execution is moved to the new host. The instance will not be rebooted, and services should not be interrupted, but some actions may time out during the suspension.

Instances will move in alphabetical order. There are about 50 instances, so at one every twenty minutes the whole process will take around 17 hours.
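To make the timing concrete, here is a rough sketch of the schedule arithmetic. It is not the actual migration script: the instance names, the start time, and the helper names `migration_schedule` and `total_hours` are all hypothetical placeholders, and only the twenty-minute interval, the alphabetical order, and the count of about 50 come from the plan above.

```python
from datetime import datetime, timedelta


def migration_schedule(instances, start, interval_minutes=20):
    """Return (instance, start_time) pairs in alphabetical order,
    with one migration kicked off every `interval_minutes`."""
    step = timedelta(minutes=interval_minutes)
    return [(name, start + i * step) for i, name in enumerate(sorted(instances))]


def total_hours(count, interval_minutes=20):
    """Rough wall-clock duration for `count` back-to-back migration slots."""
    return count * interval_minutes / 60


# Hypothetical instance names; the real list comes from the deployment-prep project.
instances = [f"deployment-example{i:02d}" for i in range(50)]

# Placeholder start time, chosen only for illustration.
start = datetime(2000, 1, 1, 0, 0)
schedule = migration_schedule(instances, start)

print(f"{len(instances)} instances, roughly {total_hours(len(instances)):.1f} hours")
print("first:", schedule[0][0], schedule[0][1])
print("last: ", schedule[-1][0], schedule[-1][1])
```

With 50 instances this gives 1000 minutes, which is where the figure of around 17 hours comes from. Stragglers that have to be moved by hand would add to that.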

Live migration isn't a perfect process, so there may be stragglers that I have to shut down and move by hand, but I'll notify folks before that happens if necessary.

Event Timeline

Andrew claimed this task.
Andrew set the priority of this task to Needs Triage.
Andrew updated the task description.
Andrew subscribed.

I guess it is fine. There will surely be some side effects on the beta cluster, but if we announce it to the engineering and QA lists, people will know it is going to be flapping for a day.

We have browser tests spread over the day, plus shinken monitoring and logstash. So if there are any big issues, we will likely catch them quite fast.

What will happen if a VM ends up being corrupted? Most/all are easily rebuildable but deployment-bastion might be a challenge. You might consider making a backup of it before attempting its migration.

I've done about 20 instances with no corruption -- the worst case is that the instance just doesn't copy. I can back up deployment-bastion, but I'd need to halt it first -- would you like me to do that?