
upgrade people.wikimedia.org to stretch (replace rutherfordium with people1001)
Closed, Resolved · Public

Description

rutherfordium is people.wikimedia.org and running on jessie.

[rutherfordium:~] $ lsb_release -c
Codename: jessie

Upgrade or replace it with stretch.

Create a new VM with stretch and migrate the service over, then remove the current VM.


After some discussion on IRC: rutherfordium will be replaced by people1001 (added to the Wikitech server name conventions page).

I am including the vm-request in this ticket as well, pasting below:


VM request for replacement machine for rutherfordium:

Labs Project Tested: n/a
Site/Location: EQIAD (there is no Ganeti in codfw, or I would want one in each DC)
Number of systems: 1
Service: people.wikimedia.org
Networking Requirements: internal IP, behind caching servers
Processor Requirements: 1
Memory: 2G
Disks: 80GB
Other Requirements: none

decom steps for rutherfordium (previous ganeti VM):

  • sync /home data one last time to the new VM
  • remove the puppet role and replace it with role spare to remove all user accounts
  • remove from site.pp completely
  • remove from DHCP and netinstall/partman
  • on puppetmaster: sudo puppet cert revoke rutherfordium.eqiad.wmnet ; sudo puppet node clean rutherfordium.eqiad.wmnet ; sudo puppet node deactivate rutherfordium.eqiad.wmnet
  • check that the host is removed from icinga after the steps above and after running puppet on the icinga server
  • shut down the VM
  • delete the VM with gnt-instance remove on a ganeti server
  • remove the production IP from DNS (no mgmt for VMs)
  • send mail to the ops list to inform about it
  • update the wikitech pages for server and service with the new host names and fingerprints
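
The puppetmaster and Ganeti steps in the list above can be collected into a dry-run helper. This is a minimal sketch in Python, not the tooling actually used here: `decom_commands` is a hypothetical name, the `sudo` prefix on `gnt-instance remove` is an assumption, and nothing is executed. It only builds the command strings quoted in this task so they can be reviewed before being run on the right hosts.

```python
def decom_commands(fqdn):
    """Return the decom commands from the checklist, grouped by where they run.

    Hypothetical helper for illustration only: it builds strings and
    never executes anything.
    """
    puppetmaster = [
        f"sudo puppet cert revoke {fqdn}",
        f"sudo puppet node clean {fqdn}",
        f"sudo puppet node deactivate {fqdn}",
    ]
    # The checklist deletes the VM on a Ganeti server after it has been shut down.
    # The sudo prefix is an assumption; the checklist only names the subcommand.
    ganeti = [f"sudo gnt-instance remove {fqdn}"]
    return {"puppetmaster": puppetmaster, "ganeti": ganeti}


if __name__ == "__main__":
    for where, commands in decom_commands("rutherfordium.eqiad.wmnet").items():
        for command in commands:
            print(f"[{where}] {command}")
```

The remaining steps (site.pp, DHCP, DNS, wikitech) are code-review changes or manual edits and are deliberately left out of the sketch.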

Event Timeline

I am suggesting we create a new VM called people1001, copy the data over, and then delete rutherfordium.

Any concerns? Should I keep using element names?

Dzahn triaged this task as Medium priority. Nov 21 2018, 1:14 AM
Dzahn added a subscriber: RobH.

Change 475033 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] peopleweb: add stretch/PHP7 support

https://gerrit.wikimedia.org/r/475033

Change 475123 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] add people1001.eqiad.wmnet to replace rutherfordium

https://gerrit.wikimedia.org/r/475123

Change 475123 merged by Dzahn:
[operations/dns@master] add people1001.eqiad.wmnet to replace rutherfordium

https://gerrit.wikimedia.org/r/475123

[ganeti1003:~] $ sudo makevm
This is an interactive script to make it easier to
create a Ganeti VM.
Please see https://wikitech.wikimedia.org/wiki/Ganeti#Create_a_VM for more details.

Are you going to need a public IP? (y/n)
n

Please enter the correct row. (A, B or C - gnt-group list to show)
A

How many vCPUs do you need?
1

How much RAM do you need? (Gigabytes)
2

What disk size do you need? (Gigabytes)
80

How do you want to call the instance? (FQDN)
people1001.eqiad.wmnet

Based on your answers this is the full command to create the VM:

sudo gnt-instance add -t drbd -I hail --net 0:link=private --hypervisor-parameters=kvm:boot_order=network -o debootstrap+default --no-install -g row_A -B vcpus=1,memory=2g --disk 0:size=80g people1001.eqiad.wmnet

Do you want to run it now? (y/n) y
Ok, running.

Wed Nov 21 18:36:09 2018  - INFO: No-installation mode selected, disabling startup
Wed Nov 21 18:36:21 2018  - INFO: Selected nodes for instance people1001.eqiad.wmnet via iallocator hail: ganeti1005.eqiad.wmnet, ganeti1006.eqiad.wmnet
Wed Nov 21 18:36:22 2018 * creating instance disks...
Wed Nov 21 18:36:25 2018 adding instance people1001.eqiad.wmnet to cluster config
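
The transcript above shows how the answers given to makevm map onto the final `gnt-instance add` command. A minimal sketch of that mapping in Python follows. The function name `gnt_instance_add` and its parameters are hypothetical, it covers only the private-IP case seen in this task, and it is not the makevm script itself. It returns the command as a string and does not contact Ganeti.

```python
def gnt_instance_add(fqdn, row, vcpus, memory_gb, disk_gb):
    """Compose the `gnt-instance add` command shown in the makevm transcript.

    Hypothetical helper: the fixed flags are copied from the transcript
    (private network, DRBD disks, hail allocator, network boot, no install).
    """
    parts = [
        "sudo gnt-instance add",
        "-t drbd",
        "-I hail",
        "--net 0:link=private",
        "--hypervisor-parameters=kvm:boot_order=network",
        "-o debootstrap+default",
        "--no-install",
        f"-g row_{row}",
        f"-B vcpus={vcpus},memory={memory_gb}g",
        f"--disk 0:size={disk_gb}g",
        fqdn,
    ]
    return " ".join(parts)


if __name__ == "__main__":
    # Values from the VM request: row A, 1 vCPU, 2G RAM, 80GB disk.
    print(gnt_instance_add("people1001.eqiad.wmnet", "A", 1, 2, 80))
```

With the values from the VM request this reproduces the command line that makevm printed before asking for confirmation.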

Change 475154 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] install_server: add people1001 to DHCP/partman

https://gerrit.wikimedia.org/r/475154

Change 475033 merged by Dzahn:
[operations/puppet@production] peopleweb: add stretch/PHP7 support

https://gerrit.wikimedia.org/r/475033

Change 475154 merged by Dzahn:
[operations/puppet@production] install_server: add people1001 to DHCP/partman

https://gerrit.wikimedia.org/r/475154

Change 475228 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] peopleweb: add role to people1001

https://gerrit.wikimedia.org/r/475228

Change 475228 merged by Dzahn:
[operations/puppet@production] peopleweb: add role to people1001

https://gerrit.wikimedia.org/r/475228

Change 475232 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] peopleweb: set httpd MPM to prefork explicitly

https://gerrit.wikimedia.org/r/475232

Change 475232 merged by Dzahn:
[operations/puppet@production] peopleweb: set httpd MPM to prefork explicitly

https://gerrit.wikimedia.org/r/475232

Change 475235 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] remove rutherfordium.eqiad.wmnet

https://gerrit.wikimedia.org/r/475235

Change 475236 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] cache/trafficserver: replace rutherfordium with people1001, backend and director

https://gerrit.wikimedia.org/r/475236

Change 475237 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] remove rutherfordium from site, netboot, DHCP

https://gerrit.wikimedia.org/r/475237

Change 475238 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] peopleweb: allow rsync of /home from rutherfordium to people1001

https://gerrit.wikimedia.org/r/475238

Change 475238 merged by Dzahn:
[operations/puppet@production] peopleweb: allow rsync of /home from rutherfordium to people1001

https://gerrit.wikimedia.org/r/475238

Change 475242 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] peopleweb: add mapped IPv6 to people1001

https://gerrit.wikimedia.org/r/475242

Change 475242 merged by Dzahn:
[operations/puppet@production] peopleweb: add mapped IPv6 to people1001

https://gerrit.wikimedia.org/r/475242

Change 475245 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] add IPv6 records for people1001.eqiad.wmnet.

https://gerrit.wikimedia.org/r/475245

Change 475245 merged by Dzahn:
[operations/dns@master] add IPv6 records for people1001.eqiad.wmnet.

https://gerrit.wikimedia.org/r/475245

Mentioned in SAL (#wikimedia-operations) [2018-11-21T23:34:23Z] <mutante> rsyncing /home from rutherfordium.eqiad to people1001.eqiad (people.wikimedia.org) T210036

Change 476335 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] lower TTL of people.wikimedia.org to 5M

https://gerrit.wikimedia.org/r/476335

Change 476335 merged by Dzahn:
[operations/dns@master] lower TTL of people.wikimedia.org to 5M

https://gerrit.wikimedia.org/r/476335
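
Lowering the TTL ahead of the switch limits how long resolvers may keep serving the old record after the CNAME changes. A minimal sketch of the arithmetic in Python, assuming `5M` is the zone-file shorthand for five minutes (BIND-style unit suffixes); `ttl_seconds` is a hypothetical helper and handles a single unit only, not combined forms such as `1h30m`.

```python
# Zone-file unit suffixes and their length in seconds.
_UNITS = {"S": 1, "M": 60, "H": 3600, "D": 86400, "W": 604800}


def ttl_seconds(ttl):
    """Convert a TTL such as '5M' or '300' to seconds (hypothetical helper)."""
    ttl = ttl.strip().upper()
    if not ttl:
        raise ValueError("empty TTL")
    if ttl.isdigit():
        # A bare number is already in seconds.
        return int(ttl)
    number, unit = ttl[:-1], ttl[-1]
    if not number.isdigit() or unit not in _UNITS:
        raise ValueError(f"unsupported TTL: {ttl!r}")
    return int(number) * _UNITS[unit]


if __name__ == "__main__":
    print(ttl_seconds("5M"))
```

Under that assumption, a resolver that cached the record just before the switch should pick up the new target within about five minutes.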

Change 476411 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] peopleweb: allow deployment server to connect to port 80

https://gerrit.wikimedia.org/r/476411

Change 476411 merged by Dzahn:
[operations/puppet@production] peopleweb: allow deployment server to connect to port 80

https://gerrit.wikimedia.org/r/476411

Change 475236 merged by Dzahn:
[operations/puppet@production] cache/trafficserver: replace rutherfordium with people1001, backend and director

https://gerrit.wikimedia.org/r/475236

Mentioned in SAL (#wikimedia-operations) [2018-11-29T20:42:42Z] <mutante> people.wikimedia.org is switching backends from rutherfordium to people1001, please stand by during a short maintenance period.. data has been copied | https://wikitech.wikimedia.org/wiki/People.wikimedia.org#Backend_upgrade_November_2018 | T210036

Mentioned in SAL (#wikimedia-operations) [2018-11-29T20:50:02Z] <mutante> people - rsynced /home one last time, switched DNS people.eqiad CNAME over, varnish change merged (T210036)

Change 476618 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] remove peopleweb role from rutherfordium

https://gerrit.wikimedia.org/r/476618

Change 476618 merged by Dzahn:
[operations/puppet@production] remove peopleweb role from rutherfordium

https://gerrit.wikimedia.org/r/476618

Change 475237 merged by Dzahn:
[operations/puppet@production] remove rutherfordium from site, netboot, DHCP

https://gerrit.wikimedia.org/r/475237

Mentioned in SAL (#wikimedia-operations) [2018-11-29T23:26:46Z] <mutante> puppetmaster: sudo puppet cert revoke rutherfordium.eqiad.wmnet; sudo puppet node clean rutherfordium.eqiad.wmnet ; sudo puppet node deactivate rutherfordium.eqiad.wmnet ; run puppet on icinga1001.. removed host from monitoring (decom for ganeti VM) (T210036)

Dzahn updated the task description.
Dzahn updated the task description.

Change 475235 merged by Dzahn:
[operations/dns@master] remove rutherfordium.eqiad.wmnet

https://gerrit.wikimedia.org/r/475235

Dzahn updated the task description.

All done. Strike one jessie host off the list.

Dzahn renamed this task from upgrade people.wm.org (rutherfordium) to stretch to upgrade people.wm.org to stretch (replace rutherfordium with people1001). Nov 30 2018, 12:10 AM
Dzahn renamed this task from upgrade people.wm.org to stretch (replace rutherfordium with people1001) to upgrade people.wikimedia.org to stretch (replace rutherfordium with people1001).
Dzahn added a project: Technical-Debt.

Mentioned in SAL (#wikimedia-operations) [2018-11-30T08:55:20Z] <moritzm> removed rutherfordium from debmonitor DB (T210036)

JFTR, it's better to use the wmf-decommission-host script, which also removes the debmonitor host entry (I fixed that manually).