
eqiad: (1) Ganeti VM for testing Kerberos in Production
Closed, Resolved (Public)

Description

Hi!

I need a minimum-spec Ganeti VM in eqiad to test Kerberos in the Hadoop Test cluster. The VM will probably be deleted once we fully productionize the service, but for now it is essential for ironing out the implementation details of the project.

The VM doesn't need to be in the Analytics VLAN, and 40-50 GB of disk space is more than enough for this use case. The DNS hostname could be kerberos1001.eqiad.wmnet. It doesn't need a public IP.

Thanks!

Event Timeline

Change 491219 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/dns@master] Reserve IP for kerberos1001.eqiad.wment (Ganeti VM)

https://gerrit.wikimedia.org/r/491219

Change 491222 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] WIP: Introduce kerberos1001

https://gerrit.wikimedia.org/r/491222

Change 491222 abandoned by Elukey:
WIP: Introduce kerberos1001

Reason:
Will split this in 2 parts

https://gerrit.wikimedia.org/r/491222

So after reading https://wikitech.wikimedia.org/wiki/Ganeti#Create_a_VM this is what I'd do:

  1. Review/Merge https://gerrit.wikimedia.org/r/491219 to add a private IP allocation in row A (after checking gnt-group list on ganeti1003, it seems to me that either A or C is fine, but let me know if there is more to verify)
  2. Use makevm on ganeti1003 to create the VM and note down the MAC address
  3. Create a puppet code change to add the node to the DHCP and partman configs. Run puppet on install[12]002 before proceeding.
  4. Run gnt-instance start kerberos1001.eqiad.wmnet on ganeti1003, and then attach to the console via gnt-instance console kerberos1001.eqiad.wmnet
  5. Wait for the OS install to finish, and then before the end execute gnt-instance modify --hypervisor-parameters=boot_order=disk kerberos1001.eqiad.wmnet
  6. Add the VM to site.pp and make sure that puppet runs fine
fsero triaged this task as Medium priority. Feb 18 2019, 1:27 PM
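
The Ganeti commands in steps 4 and 5 of the plan above all take the instance FQDN as their last argument, so a mistyped hostname is an easy way to get stuck. As a minimal sketch (the helper name `install_commands` is hypothetical and not part of any Wikimedia tooling), the three invocations can be generated from a single FQDN and printed for copy-pasting on the Ganeti master:

```python
import shlex
from typing import List


def install_commands(fqdn: str) -> List[List[str]]:
    """Return the gnt-instance invocations from steps 4 and 5, in order."""
    return [
        # Step 4: start the instance, then attach to its console to watch the install.
        ["gnt-instance", "start", fqdn],
        ["gnt-instance", "console", fqdn],
        # Step 5: switch the boot order from network to disk.
        ["gnt-instance", "modify", "--hypervisor-parameters=boot_order=disk", fqdn],
    ]


if __name__ == "__main__":
    # Only prints the commands; nothing is executed.
    for argv in install_commands("kerberos1001.eqiad.wmnet"):
        print(shlex.join(argv))
```

The sketch only builds and prints the command lines; running them is still a manual step on ganeti1003.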

Change 491219 merged by Elukey:
[operations/dns@master] Allocate IP for kerberos1001.eqiad.wment (Ganeti VM)

https://gerrit.wikimedia.org/r/491219

I had a chat with Moritz about the naming, since kerberos1001 seems a very generic and probably misleading name. This VM will only be used for testing and ironing out all the Kerberos service details; it will be nuked after that (we plan to eventually have two bare-metal hosts).

So after reading https://wikitech.wikimedia.org/wiki/Ganeti#Create_a_VM this is what I'd do:

  1. Review/Merge https://gerrit.wikimedia.org/r/491219 to add a private IP allocation in row A (after checking gnt-group list on ganeti1003, it seems to me that either A or C is fine, but let me know if there is more to verify)
  2. Use makevm on ganeti1003 to create the VM and note down the MAC address
  3. Create a puppet code change to add the node to the DHCP and partman configs. Run puppet on install[12]002 before proceeding.
  4. Run gnt-instance start kerberos1001.eqiad.wmnet on ganeti1003, and then attach to the console via gnt-instance console kerberos1001.eqiad.wmnet
  5. Wait for the OS install to finish, and then before the end execute gnt-instance modify --hypervisor-parameters=boot_order=disk kerberos1001.eqiad.wmnet

Don't wait for the OS install to finish (you will not be fast enough). Do make sure that the install is progressing, though, and run the above command somewhere midway through it.

  6. Add the VM to site.pp and make sure that puppet runs fine

Started makevm in a tmux session on ganeti1003

Full log:

elukey@ganeti1003:~$ makevm
This is an interactive script to make it easier to
create a Ganeti VM.
Please see https://wikitech.wikimedia.org/wiki/Ganeti#Create_a_VM for more details.

Are you going to need a public IP? (y/n)
n

Please enter the correct row. (A, B or C - gnt-group list to show)
A

How many vCPUs do you need?
2

How much RAM do you need? (Gigabytes)
8

What disk size do you need? (Gigabytes)
50

How do you want to call the instance? (FQDN)
kerberos1001.eqiad.wmnet

Based on your answers this is the full command to create the VM:

sudo gnt-instance add -t drbd -I hail --net 0:link=private --hypervisor-parameters=kvm:boot_order=network -o debootstrap+default --no-install -g row_A -B vcpus=2,memory=8g --disk 0:size=50g kerberos1001.eqiad.wmnet

Do you want to run it now? (y/n) y
Ok, running.


Tue Feb 19 10:33:35 2019  - INFO: No-installation mode selected, disabling startup
Tue Feb 19 10:33:47 2019  - INFO: Selected nodes for instance kerberos1001.eqiad.wmnet via iallocator hail: ganeti1008.eqiad.wmnet, ganeti1005.eqiad.wmnet
Tue Feb 19 10:33:48 2019 * creating instance disks...
Tue Feb 19 10:33:51 2019 adding instance kerberos1001.eqiad.wmnet to cluster config
Tue Feb 19 10:33:51 2019 adding disks to cluster config
Tue Feb 19 10:33:51 2019  - INFO: Waiting for instance kerberos1001.eqiad.wmnet to sync disks
Tue Feb 19 10:33:51 2019  - INFO: - device disk/0:  0.10% done, 3h 18m 34s remaining (estimated)
Tue Feb 19 10:34:52 2019  - INFO: - device disk/0:  4.40% done, 22m 12s remaining (estimated)
Tue Feb 19 10:35:52 2019  - INFO: - device disk/0:  8.70% done, 20m 31s remaining (estimated)
Tue Feb 19 10:36:52 2019  - INFO: - device disk/0: 13.00% done, 19m 45s remaining (estimated)
Tue Feb 19 10:37:52 2019  - INFO: - device disk/0: 17.30% done, 18m 58s remaining (estimated)
Tue Feb 19 10:38:52 2019  - INFO: - device disk/0: 21.60% done, 17m 22s remaining (estimated)
Tue Feb 19 10:39:53 2019  - INFO: - device disk/0: 25.90% done, 16m 40s remaining (estimated)
Tue Feb 19 10:40:53 2019  - INFO: - device disk/0: 30.20% done, 15m 59s remaining (estimated)
Tue Feb 19 10:41:53 2019  - INFO: - device disk/0: 34.50% done, 15m 10s remaining (estimated)
Tue Feb 19 10:42:53 2019  - INFO: - device disk/0: 38.80% done, 13m 44s remaining (estimated)
Tue Feb 19 10:43:53 2019  - INFO: - device disk/0: 43.10% done, 12m 51s remaining (estimated)
Tue Feb 19 10:44:54 2019  - INFO: - device disk/0: 47.40% done, 12m 10s remaining (estimated)
Tue Feb 19 10:45:54 2019  - INFO: - device disk/0: 51.70% done, 10m 42s remaining (estimated)
Tue Feb 19 10:46:54 2019  - INFO: - device disk/0: 56.00% done, 9m 59s remaining (estimated)
Tue Feb 19 10:47:54 2019  - INFO: - device disk/0: 60.40% done, 9m 3s remaining (estimated)
Tue Feb 19 10:48:54 2019  - INFO: - device disk/0: 64.70% done, 8m 12s remaining (estimated)
Tue Feb 19 10:49:55 2019  - INFO: - device disk/0: 69.00% done, 6m 59s remaining (estimated)
Tue Feb 19 10:50:55 2019  - INFO: - device disk/0: 73.30% done, 6m 5s remaining (estimated)
Tue Feb 19 10:51:55 2019  - INFO: - device disk/0: 77.60% done, 5m 11s remaining (estimated)
Tue Feb 19 10:52:55 2019  - INFO: - device disk/0: 81.90% done, 4m 0s remaining (estimated)
Tue Feb 19 10:53:55 2019  - INFO: - device disk/0: 86.20% done, 3m 7s remaining (estimated)
Tue Feb 19 10:54:56 2019  - INFO: - device disk/0: 90.50% done, 2m 10s remaining (estimated)
Tue Feb 19 10:55:56 2019  - INFO: - device disk/0: 94.80% done, 1m 13s remaining (estimated)
Tue Feb 19 10:56:56 2019  - INFO: - device disk/0: 99.10% done, 12s remaining (estimated)
Tue Feb 19 10:57:08 2019  - INFO: - device disk/0: 100.00% done, 0s remaining (estimated)
Tue Feb 19 10:57:08 2019  - INFO: - device disk/0: 100.00% done, 0s remaining (estimated)
Tue Feb 19 10:57:08 2019  - INFO: - device disk/0: 100.00% done, 0s remaining (estimated)
Tue Feb 19 10:57:09 2019  - INFO: - device disk/0: 100.00% done, 0s remaining (estimated)
Tue Feb 19 10:57:09 2019  - INFO: - device disk/0: 100.00% done, 0s remaining (estimated)
Tue Feb 19 10:57:09 2019  - INFO: - device disk/0: 100.00% done, 0s remaining (estimated)
Tue Feb 19 10:57:09 2019  - INFO: - device disk/0: 100.00% done, 0s remaining (estimated)
Tue Feb 19 10:57:09 2019  - INFO: Instance kerberos1001.eqiad.wmnet's disks are in sync
Tue Feb 19 10:57:09 2019  - INFO: Waiting for instance kerberos1001.eqiad.wmnet to sync disks
Tue Feb 19 10:57:09 2019  - INFO: Instance kerberos1001.eqiad.wmnet's disks are in sync

Time to add the new instance to DHCP.
Here's the MAC address:

NicMAC/0
aa:00:00:c1:72:d9
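
The gnt-instance add command that makevm printed in the log above is assembled from the answers given at the prompts. The following is a rough sketch of that assembly, covering only the private-IP case shown in the log; the function name `build_add_command` and its parameters are hypothetical, and the real makevm script may be structured quite differently:

```python
from typing import List


def build_add_command(fqdn: str, row: str, vcpus: int,
                      memory_gb: int, disk_gb: int) -> List[str]:
    """Rebuild the command line printed by makevm for a private-IP VM."""
    return [
        "sudo", "gnt-instance", "add",
        "-t", "drbd",                      # DRBD-backed disk template
        "-I", "hail",                      # let the hail iallocator choose the nodes
        "--net", "0:link=private",         # only the private case appears in the log
        "--hypervisor-parameters=kvm:boot_order=network",
        "-o", "debootstrap+default",
        "--no-install",
        "-g", f"row_{row}",
        "-B", f"vcpus={vcpus},memory={memory_gb}g",
        "--disk", f"0:size={disk_gb}g",
        fqdn,
    ]


if __name__ == "__main__":
    # The answers given in the session above: row A, 2 vCPUs, 8 GB RAM, 50 GB disk.
    print(" ".join(build_add_command("kerberos1001.eqiad.wmnet", "A", 2, 8, 50)))
```

With the answers from this session, the joined output matches the command shown in the log, which is a cheap sanity check before running it by hand.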

Change 491442 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Add kerberos1001 to DHCP and partman configs

https://gerrit.wikimedia.org/r/491442

Change 491442 merged by Elukey:
[operations/puppet@production] Add kerberos1001 to DHCP and partman configs

https://gerrit.wikimedia.org/r/491442
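
The DHCP part of the change above ties the MAC address reported by makevm to the new hostname. The actual contents of change 491442 are not reproduced in this task, so the following is only a sketch that assumes a standard ISC dhcpd host declaration; the helper name `dhcp_host_entry` and the exact layout of the stanza are assumptions:

```python
import re

# Six colon-separated hex octets, e.g. aa:00:00:c1:72:d9
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def dhcp_host_entry(fqdn: str, mac: str) -> str:
    """Render an ISC dhcpd host declaration for one VM (assumed format)."""
    mac = mac.strip().lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    # Use the short hostname as the declaration name.
    short_name = fqdn.split(".", 1)[0]
    return (
        f"host {short_name} {{\n"
        f"    hardware ethernet {mac};\n"
        f"    fixed-address {fqdn};\n"
        f"}}"
    )


if __name__ == "__main__":
    print(dhcp_host_entry("kerberos1001.eqiad.wmnet", "aa:00:00:c1:72:d9"))
```

Validating the MAC before rendering catches copy-paste mistakes from the makevm output early, before puppet runs on the install servers.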

Change 491445 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Add role::spare::system to kerberos1001

https://gerrit.wikimedia.org/r/491445

Change 491445 merged by Elukey:
[operations/puppet@production] Add role::spare::system to kerberos1001

https://gerrit.wikimedia.org/r/491445

elukey closed this task as Resolved. Feb 19 2019, 11:45 AM
elukey claimed this task.
Linux kerberos1001 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3 (2019-02-02) x86_64
elukey@kerberos1001:~$

All good!