rack/setup/ codfw: ganeti2009 - ganeti201[0-8]
Closed, Resolved, Public

Description

This task will track the racking, setup, and installation of the new ganeti hosts received in T216187.

Rack proposal:

ganeti2009 row C rack C1
ganeti2010 row C rack C1
ganeti2011 row C rack C5
ganeti2012 row C rack C5
ganeti2013 row C rack C6
ganeti2014 row C rack C6
ganeti2015 row D rack D1
ganeti2016 row D rack D3
ganeti2017 row D rack D5
ganeti2018 row D rack D8
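
As a cross-check of the proposal above, the plan can be written down as data and grouped by rack and by row. This is only a sketch: the dictionary is copied from the list above, and the helper names `hosts_per_rack` and `hosts_per_row` are made up for illustration, not part of any existing tooling.

```python
from collections import defaultdict

# Rack proposal copied from the list above: hostname -> (row, rack).
RACK_PLAN = {
    "ganeti2009": ("C", "C1"),
    "ganeti2010": ("C", "C1"),
    "ganeti2011": ("C", "C5"),
    "ganeti2012": ("C", "C5"),
    "ganeti2013": ("C", "C6"),
    "ganeti2014": ("C", "C6"),
    "ganeti2015": ("D", "D1"),
    "ganeti2016": ("D", "D3"),
    "ganeti2017": ("D", "D5"),
    "ganeti2018": ("D", "D8"),
}


def hosts_per_rack(plan):
    """Group hostnames by rack (hypothetical helper)."""
    grouped = defaultdict(list)
    for host, (_row, rack) in sorted(plan.items()):
        grouped[rack].append(host)
    return dict(grouped)


def hosts_per_row(plan):
    """Count hosts per row (hypothetical helper)."""
    counts = defaultdict(int)
    for _host, (row, _rack) in plan.items():
        counts[row] += 1
    return dict(counts)


if __name__ == "__main__":
    for rack, hosts in sorted(hosts_per_rack(RACK_PLAN).items()):
        print(rack, ", ".join(hosts))
    print(hosts_per_row(RACK_PLAN))
```

With the plan as proposed, row C holds six hosts in three racks (two per rack) and row D holds four hosts in four racks.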

ganeti2009

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

ganeti2010

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

ganeti2011

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

ganeti2012

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

ganeti2013

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

ganeti2014

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

ganeti2015

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

ganeti2016

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

ganeti2017

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

ganeti2018

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

Event Timeline

Papaul triaged this task as Medium priority. May 29 2019, 4:13 PM
This comment was removed by Papaul.

@ayounsi I am planning to install these new servers in row C and row D, but "interface-range ganeti" does not exist on either of those rows. Is it okay for me to go ahead and create "interface-range ganeti" on asw-c-codfw and asw-d-codfw?

Change 515111 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Add mgmt and production DNS for ganeti2009, ganeti201[0-8]

https://gerrit.wikimedia.org/r/515111

Yes, fine by me.

Sure, don't hesitate to ask if you need a review before commit.

Added "interface-range ganeti" on both row C and row D:

row C

interface-range ganeti {
    member ge-1/0/19;
    native-vlan-id 2019;
    unit 0 {
        family ethernet-switching {
            interface-mode trunk;
            vlan {
                members [ private1-c-codfw public1-c-codfw ];
            }
        }
    }
}

Row D

interface-range ganeti {
    member ge-1/0/2;
    native-vlan-id 2020;
    unit 0 {
        family ethernet-switching {
            interface-mode trunk;
            vlan {
                members [ private1-d-codfw public1-d-codfw ];
            }
        }
    }
}
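
The two stanzas differ only in the member port, the native VLAN ID, and the VLAN names, so they could be rendered from one template for review before commit. A minimal sketch, assuming nothing beyond the values shown above; `render_interface_range` is a hypothetical helper that only produces text and does not talk to a switch.

```python
def render_interface_range(name, member, native_vlan_id, vlans):
    """Render a Junos-style interface-range stanza as text (hypothetical helper)."""
    vlan_list = " ".join(vlans)
    lines = [
        f"interface-range {name} {{",
        f"    member {member};",
        f"    native-vlan-id {native_vlan_id};",
        "    unit 0 {",
        "        family ethernet-switching {",
        "            interface-mode trunk;",
        "            vlan {",
        f"                members [ {vlan_list} ];",
        "            }",
        "        }",
        "    }",
        "}",
    ]
    return "\n".join(lines)


# Per-row values taken from the stanzas above.
ROWS = {
    "C": ("ge-1/0/19", 2019, ["private1-c-codfw", "public1-c-codfw"]),
    "D": ("ge-1/0/2", 2020, ["private1-d-codfw", "public1-d-codfw"]),
}

if __name__ == "__main__":
    for row, (member, vlan_id, vlans) in ROWS.items():
        print(f"row {row}")
        print(render_interface_range("ganeti", member, vlan_id, vlans))
        print()
```

Further member ports would still need to be added to each range by hand as hosts are cabled; the sketch covers only the single member shown for each row.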

Change 515111 merged by Alexandros Kosiaris:
[operations/dns@master] DNS: Add mgmt and production DNS for ganeti2009, ganeti201[0-8]

https://gerrit.wikimedia.org/r/515111

Change 518303 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP: Add MAC address entries for ganeti2009 and ganeti201[0-8]

https://gerrit.wikimedia.org/r/518303

Change 518305 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] Partman: Add ganeti201[0-8]

https://gerrit.wikimedia.org/r/518305

Change 518303 merged by Alexandros Kosiaris:
[operations/puppet@production] DHCP: Add MAC address entries for ganeti2009 and ganeti201[0-8]

https://gerrit.wikimedia.org/r/518303

Change 518305 merged by Alexandros Kosiaris:
[operations/puppet@production] Partman: Add ganeti201[0-8]

https://gerrit.wikimedia.org/r/518305

@akosiaris this is where the installer stops:

│ This is an overview of your currently configured partitions and mount   │
│ points. Select a partition to modify its settings (file system, mount   │   
│ point, etc.), a free space to create partitions, or a device to         │   
│ initialize its partition table.                                         │   
│                                                                         │   
│           Configure iSCSI volumes                           -           │   
│                                                             ▒           │   
│           SCSI3 (0,0,0) (sda) - 800.2 GB ATA THNSF8800CCSE  ▒           │   
│           SCSI4 (0,0,0) (sdb) - 800.2 GB ATA THNSF8800CCSE  ▒           │   
│           SCSI5 (0,0,0) (sdc) - 800.2 GB ATA THNSF8800CCSE  ▒           │   
│           SCSI6 (0,0,0) (sdd) - 800.2 GB ATA THNSF8800CCSE  ▒           │   
│                                                             ▒           │   
│           Undo changes to partitions                        0           │   
│           Finish partitioning and write changes to disk     .           │   
│                                                                         │   
│     <Go Back>

Change 519075 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] ganeti: Setup buster and a software RAID5 recipe

https://gerrit.wikimedia.org/r/519075

So, the controllers on those boxes can't do hardware RAID, and hence the driver sees the disks as AHCI. That's fine: we already have multiple boxes with software RAID and can continue doing so. I've uploaded the partman recipe above, which I am currently testing (it has already worked past the problematic stage pointed out above); it should resolve this and let the installation proceed normally.
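
For a rough idea of what software RAID 5 yields on these boxes: RAID 5 spends one member's worth of space on parity, so usable capacity is (n - 1) times the member size. The sketch below assumes all four 800.2 GB disks from the installer screen go whole into a single array, which the actual partman recipe may not do (it may set aside space for /boot, swap, or similar); `raid5_usable_gb` is a hypothetical helper.

```python
def raid5_usable_gb(disk_count, disk_size_gb):
    """Usable capacity of a RAID 5 array: one member's worth of space holds parity."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least three member devices")
    return (disk_count - 1) * disk_size_gb


if __name__ == "__main__":
    # Assumption: four whole 800.2 GB disks (sda..sdd) in one array.
    usable = raid5_usable_gb(4, 800.2)
    print(f"approx. usable capacity: {usable:.1f} GB")
```

Under that assumption the array would offer roughly 2400.6 GB (decimal gigabytes, as reported by the installer) before LVM and filesystem overhead.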

Change 519075 merged by Alexandros Kosiaris:
[operations/puppet@production] ganeti: Setup buster and a software RAID5 recipe

https://gerrit.wikimedia.org/r/519075

Change 519441 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] ganeti partman: Switch to 100% of lvm guided size

https://gerrit.wikimedia.org/r/519441

Change 519441 merged by Alexandros Kosiaris:
[operations/puppet@production] ganeti partman: Switch to 100% of lvm guided size

https://gerrit.wikimedia.org/r/519441

Script wmf-auto-reimage was launched by akosiaris on cumin1001.eqiad.wmnet for hosts:

['ganeti2009.codfw.wmnet', 'ganeti2010.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2012.codfw.wmnet', 'ganeti2013.codfw.wmnet', 'ganeti2014.codfw.wmnet', 'ganeti2015.codfw.wmnet', 'ganeti2016.codfw.wmnet', 'ganeti2017.codfw.wmnet', 'ganeti2018.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906271636_akosiaris_35762.log.

Completed auto-reimage of hosts:

['ganeti2013.codfw.wmnet', 'ganeti2014.codfw.wmnet', 'ganeti2012.codfw.wmnet', 'ganeti2010.codfw.wmnet', 'ganeti2009.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2016.codfw.wmnet', 'ganeti2015.codfw.wmnet', 'ganeti2017.codfw.wmnet', 'ganeti2018.codfw.wmnet']

Of which those FAILED:

['ganeti2013.codfw.wmnet', 'ganeti2014.codfw.wmnet', 'ganeti2012.codfw.wmnet', 'ganeti2010.codfw.wmnet', 'ganeti2009.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2016.codfw.wmnet', 'ganeti2015.codfw.wmnet', 'ganeti2017.codfw.wmnet', 'ganeti2018.codfw.wmnet']

Script wmf-auto-reimage was launched by akosiaris on cumin1001.eqiad.wmnet for hosts:

['ganeti2009.codfw.wmnet', 'ganeti2010.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2012.codfw.wmnet', 'ganeti2013.codfw.wmnet', 'ganeti2014.codfw.wmnet', 'ganeti2015.codfw.wmnet', 'ganeti2016.codfw.wmnet', 'ganeti2017.codfw.wmnet', 'ganeti2018.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906280716_akosiaris_206735.log.

Completed auto-reimage of hosts:

['ganeti2014.codfw.wmnet', 'ganeti2009.codfw.wmnet', 'ganeti2013.codfw.wmnet', 'ganeti2012.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2016.codfw.wmnet', 'ganeti2015.codfw.wmnet', 'ganeti2018.codfw.wmnet', 'ganeti2017.codfw.wmnet']

Of which those FAILED:

['ganeti2014.codfw.wmnet', 'ganeti2009.codfw.wmnet', 'ganeti2013.codfw.wmnet', 'ganeti2012.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2016.codfw.wmnet', 'ganeti2015.codfw.wmnet', 'ganeti2018.codfw.wmnet', 'ganeti2017.codfw.wmnet']
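
The launched list and the completed list in the run above are not the same length, which is easy to miss by eye. A small sketch for comparing them, using the two lists exactly as logged; `unreported_hosts` is a hypothetical helper and makes no claim about why a host is absent from the report.

```python
import ast

# Lists copied verbatim from the second wmf-auto-reimage run above.
LAUNCHED = (
    "['ganeti2009.codfw.wmnet', 'ganeti2010.codfw.wmnet', 'ganeti2011.codfw.wmnet', "
    "'ganeti2012.codfw.wmnet', 'ganeti2013.codfw.wmnet', 'ganeti2014.codfw.wmnet', "
    "'ganeti2015.codfw.wmnet', 'ganeti2016.codfw.wmnet', 'ganeti2017.codfw.wmnet', "
    "'ganeti2018.codfw.wmnet']"
)
COMPLETED = (
    "['ganeti2014.codfw.wmnet', 'ganeti2009.codfw.wmnet', 'ganeti2013.codfw.wmnet', "
    "'ganeti2012.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2016.codfw.wmnet', "
    "'ganeti2015.codfw.wmnet', 'ganeti2018.codfw.wmnet', 'ganeti2017.codfw.wmnet']"
)


def unreported_hosts(launched_line, completed_line):
    """Return launched hosts that do not appear in the completed list."""
    launched = ast.literal_eval(launched_line)
    completed = set(ast.literal_eval(completed_line))
    return [host for host in launched if host not in completed]


if __name__ == "__main__":
    print(unreported_hosts(LAUNCHED, COMPLETED))
```

For this run the only host launched but not listed in the report is ganeti2010.codfw.wmnet.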

Script wmf-auto-reimage was launched by akosiaris on cumin1001.eqiad.wmnet for hosts:

['ganeti2009.codfw.wmnet', 'ganeti2010.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2012.codfw.wmnet', 'ganeti2013.codfw.wmnet', 'ganeti2014.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906280956_akosiaris_239649.log.

Completed auto-reimage of hosts:

['ganeti2012.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2009.codfw.wmnet']

Of which those FAILED:

['ganeti2012.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2009.codfw.wmnet']
akosiaris updated the task description.

All hosts are installed. They will be added to the clusters in a different task. @Papaul, thanks!

Change 570601 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] site.pp: Add new ganeti codfw hosts as role::spare

https://gerrit.wikimedia.org/r/570601

Change 570601 merged by Alexandros Kosiaris:
[operations/puppet@production] site.pp: Add new ganeti codfw hosts as role::spare

https://gerrit.wikimedia.org/r/570601

Icinga downtime for 1 day, 0:00:00 set by akosiaris@cumin1001 on 10 host(s) and their services with reason: enable VT

ganeti[2009-2018].codfw.wmnet
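
The bracket notation above stands for the ten hosts ganeti2009 through ganeti2018. A minimal sketch of how such an expression expands, for readers unfamiliar with the notation; `expand_host_range` is a hypothetical helper that handles only a single numeric range and is not the parser the real tooling uses.

```python
import re

# One "[start-end]" numeric range with optional text before and after it.
_RANGE_RE = re.compile(
    r"(?P<prefix>[^\[\]]*)\[(?P<start>\d+)-(?P<end>\d+)\](?P<suffix>[^\[\]]*)"
)


def expand_host_range(expr):
    """Expand e.g. 'host[01-03].example' into individual hostnames."""
    match = _RANGE_RE.fullmatch(expr)
    if match is None:
        # No range present: treat the expression as a single hostname.
        return [expr]
    start, end = match["start"], match["end"]
    width = len(start)  # keep any zero padding used in the expression
    return [
        f"{match['prefix']}{number:0{width}d}{match['suffix']}"
        for number in range(int(start), int(end) + 1)
    ]


if __name__ == "__main__":
    for host in expand_host_range("ganeti[2009-2018].codfw.wmnet"):
        print(host)
```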

Script wmf-auto-reimage was launched by akosiaris on cumin1001.eqiad.wmnet for hosts:

['ganeti2009.codfw.wmnet', 'ganeti2010.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2012.codfw.wmnet', 'ganeti2013.codfw.wmnet', 'ganeti2014.codfw.wmnet', 'ganeti2015.codfw.wmnet', 'ganeti2016.codfw.wmnet', 'ganeti2017.codfw.wmnet', 'ganeti2018.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202006040933_akosiaris_6293.log.

Completed auto-reimage of hosts:

['ganeti2016.codfw.wmnet', 'ganeti2010.codfw.wmnet']

Of which those FAILED:

['ganeti2016.codfw.wmnet', 'ganeti2010.codfw.wmnet']

Change 602364 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] Assign role::ganeti to new ganeti expansion hosts

https://gerrit.wikimedia.org/r/602364

Change 602364 merged by Alexandros Kosiaris:
[operations/puppet@production] Assign role::ganeti to new ganeti expansion hosts

https://gerrit.wikimedia.org/r/602364

Change 602379 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] ganeti: ganeti[12]0{09..24}.eqiad|codfw.wmnet to hieradata

https://gerrit.wikimedia.org/r/602379

Change 602379 merged by Alexandros Kosiaris:
[operations/puppet@production] ganeti: ganeti[12]0{09..24}.eqiad|codfw.wmnet to hieradata

https://gerrit.wikimedia.org/r/602379

Script wmf-auto-reimage was launched by akosiaris on cumin1001.eqiad.wmnet for hosts:

['ganeti2016.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202006050844_akosiaris_205225.log.

Completed auto-reimage of hosts:

['ganeti2016.codfw.wmnet']

and were ALL successful.