Make a decommissioning checklist for toollabs VMs
Closed, Resolved
Public

Assigned To

Authored By

	valhallasw
	May 2 2015, 4:41 PM

Description

A few things I could think of:

disable as exec host: qmod -d '*@node_name'
restart continuous jobs: qhost -j -h node_name | sed -e 's/^\s*//' | grep 'continuous' | cut -d ' ' -f 1 | xargs qmod -rj. For webgrid nodes you can pipe to xargs qdel instead; webservicemonitor will start them back up from service manifests shortly.
wait for other jobs to drain
unregister host as SGE exec host: qconf -de $HOSTNAME
unregister host from host group: qconf -mhgrp @default or qconf -mhgrp @webgrid
mark as planned down in shinken: http://shinken.wmflabs.org/host/$HOSTNAME note: not all hosts are in there?!
check for running non-SGE processes: ps hax -o user | sort | uniq -c | sort -n
delete host: https://wikitech.wikimedia.org/wiki/Special:NovaInstance
clear graphite metrics (needs access to graphite server, see http://geek.michaelgrace.org/2011/09/delete-data-from-graphite/ )
remove host from /data/project/.system/store
remove external hostname/IP from Special:NovaAddress
(in some cases) remove rDNS registration in ops/dns
check if documentation still refers to this host and update

There should be no firewall rules to update, but it doesn't hurt to check.
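
The grid-engine steps above can be sketched as a small dry-run helper. This is only an illustration, not an existing Toolforge script: the function names `continuous_job_ids` and `decommission_plan` are made up here, and nothing is executed. The helper just prints the commands from the checklist, in the checklist's order, so they can be reviewed and run by hand.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical, not existing tooling.

# Read `qhost -j -h NODE` output on stdin and print the job IDs of
# continuous jobs, using the same filter as the checklist.
continuous_job_ids() {
    sed -e 's/^[[:space:]]*//' | grep 'continuous' | cut -d ' ' -f 1
}

# Print (do not run) the grid-engine commands for one node.
# $1 = node name, $2 = host group (@default or @webgrid).
decommission_plan() {
    printf '%s\n' \
        "qmod -d '*@$1'" \
        "qhost -j -h $1 | sed -e 's/^[[:space:]]*//' | grep 'continuous' | cut -d ' ' -f 1 | xargs qmod -rj" \
        "qhost -j -h $1    # repeat until no jobs remain" \
        "qconf -de $1" \
        "qconf -mhgrp $2   # opens an editor; remove $1 there"
}
```

For example, `decommission_plan tools-webgrid-04 @webgrid` prints the five commands for that host. For a webgrid node, `xargs qmod -rj` in the second line would be swapped for `xargs qdel`, as noted in the checklist.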

Related Objects

Mentioned Here: T95537: Resetup tools-webgrid-04 due to /var being too small

Event Timeline

valhallasw created this task. May 2 2015, 4:41 PM

valhallasw raised the priority of this task from to Needs Triage.

valhallasw updated the task description.

valhallasw added a project: Toolforge.

valhallasw subscribed.

Restricted Application added a subscriber: Aklapper. May 2 2015, 4:41 PM

valhallasw updated the task description. May 2 2015, 5:31 PM

valhallasw set Security to None.

valhallasw added a subscriber: yuvipanda.

Cf. T95537 for what I did for tools-webgrid-04.

<dream>There should be a Phabricator template with all the check boxes, etc. in place. Then, in the "master" task that requires to decommission a host, you click "Create Subtask", "I want to use 'Decommission Tools exec host' template", "Template parameter 'host' = 'tools-exec-something'", "Do!".</dream>
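
Until such a template exists, the "template parameter 'host'" idea could be approximated outside Phabricator. The sketch below is hypothetical (the function name `decommission_checklist` and the `HOST` placeholder are invented for illustration): it prints the checklist from the task description as Remarkup checkbox items for one host, ready to paste into a new subtask.

```shell
#!/bin/sh
# Hypothetical helper, not a Phabricator feature: fills the host name
# into the checklist and prefixes each item with a Remarkup checkbox.
# $1 = host name; it must not contain "/" because it is used inside sed.
decommission_checklist() {
    printf 'Decommission %s\n\n' "$1"
    sed -e "s/HOST/$1/g" -e 's/^/- [ ] /' <<'EOF'
disable as exec host: qmod -d '*@HOST'
restart continuous jobs, wait for other jobs to drain
unregister as SGE exec host: qconf -de HOST
unregister from host group: qconf -mhgrp @default or @webgrid
mark as planned down in shinken
check for running non-SGE processes
delete the instance via Special:NovaInstance
clear graphite metrics
remove HOST from /data/project/.system/store
remove external hostname/IP via Special:NovaAddress
remove rDNS registration in ops/dns (in some cases)
update documentation that still refers to HOST
EOF
}
```

Running `decommission_checklist tools-webgrid-04` prints a title line followed by one checkbox item per step.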

valhallasw updated the task description. May 2 2015, 6:14 PM

Should go on https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Admin

A simple userscript should be able to do that, I think. See @Aklapper's work at https://github.com/wikimedia/wikimedia-bugzilla-triagescripts/blob/master/wikimedia-maniphest-task.user.js

yuvipanda updated the task description. May 2 2015, 6:58 PM

valhallasw triaged this task as Low priority. May 10 2015, 7:52 PM

valhallasw moved this task from Backlog to Ready to be worked on on the Toolforge board. May 10 2015, 8:43 PM

valhallasw updated the task description. Oct 4 2015, 7:45 PM

Restricted Application added a project: Cloud-Services. Oct 4 2015, 7:45 PM

valhallasw renamed this task from Make a decommissioning checklist to Make a decommissioning checklist for toollabs VMs. Oct 4 2015, 7:48 PM

valhallasw updated the task description. Oct 4 2015, 7:56 PM

Phabricator_maintenance removed a subscriber: yuvipanda. Jun 7 2017, 6:51 PM