
rack/setup/ codfw: ganeti2009 - ganeti201[0-8]
Closed, ResolvedPublic

Description

This task will track the racking, setup, and installation of the new ganeti hosts received in T216187.

Rack proposal:

ganeti2009 row C rack C1
ganeti2010 row C rack C1
ganeti2011 row C rack C5
ganeti2012 row C rack C5
ganeti2013 row C rack C6
ganeti2014 row C rack C6
ganeti2015 row D rack D1
ganeti2016 row D rack D3
ganeti2017 row D rack D5
ganeti2018 row D rack D8
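
The task title uses a bracket shorthand for the host range. As a minimal sketch (the helper name `expand_hosts` is hypothetical and not part of any tooling mentioned in this task), the shorthand can be expanded and checked against the rack proposal above in Python:

```python
import re
from collections import Counter


def expand_hosts(pattern):
    """Expand a shorthand such as 'ganeti201[0-8]' into hostnames.

    Hypothetical helper: handles one [a-b] range of single digits only.
    """
    match = re.fullmatch(r"(.*)\[(\d)-(\d)\](.*)", pattern)
    if match is None:
        return [pattern]
    prefix, start, end, suffix = match.groups()
    return [f"{prefix}{digit}{suffix}" for digit in range(int(start), int(end) + 1)]


# Rack proposal copied from the list above: hostname -> (row, rack).
RACK_PROPOSAL = {
    "ganeti2009": ("C", "C1"),
    "ganeti2010": ("C", "C1"),
    "ganeti2011": ("C", "C5"),
    "ganeti2012": ("C", "C5"),
    "ganeti2013": ("C", "C6"),
    "ganeti2014": ("C", "C6"),
    "ganeti2015": ("D", "D1"),
    "ganeti2016": ("D", "D3"),
    "ganeti2017": ("D", "D5"),
    "ganeti2018": ("D", "D8"),
}

hosts = expand_hosts("ganeti2009") + expand_hosts("ganeti201[0-8]")
hosts_per_row = Counter(row for row, _rack in RACK_PROPOSAL.values())
```

Comparing `hosts` with the keys of `RACK_PROPOSAL` confirms that every host in the title has a proposed location, and `hosts_per_row` shows how the hosts are split between rows C and D.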

ganeti2009

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

ganeti2010

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

ganeti2011

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

ganeti2012

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

ganeti2013

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

ganeti2014

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

ganeti2015

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

ganeti2016

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

ganeti2017

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

ganeti2018

  • - receive in system on procurement task T216187
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing/
  • - RAID : RAID 5 /partman:ganeti.cfg
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup
    • end on-site specific steps
  • - production dns entries
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation (stretch) - update netbox status to staged & IMMEDIATELY RUN/SIGN PUPPET
  • - puppet accept/initial run
  • - handoff for service implementation - service implementation team must change status from staged to active

Event Timeline

Papaul created this task. · May 29 2019, 3:57 PM
Restricted Application added a subscriber: Aklapper. · May 29 2019, 3:57 PM
Papaul triaged this task as Normal priority. · May 29 2019, 4:13 PM
Papaul updated the task description. · May 30 2019, 5:28 PM
Papaul updated the task description. · Jun 4 2019, 12:40 PM
Papaul updated the task description. · Jun 4 2019, 8:27 PM
Papaul updated the task description. · Jun 4 2019, 9:58 PM
Papaul added a comment. · Jun 7 2019, 3:20 PM
This comment was removed by Papaul.
Papaul added a subscriber: ayounsi. · Jun 7 2019, 3:36 PM

@ayounsi I am planning to install these new servers in row C and row D, but neither row has the "interface-range ganeti" defined. Is it okay for me to go ahead and create "interface-range ganeti" on asw-c-codfw and asw-d-codfw?

Change 515111 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Add mgmt and production DNS for ganeti2009, ganeti201[0-8]

https://gerrit.wikimedia.org/r/515111

@ayounsi I am planning to install these new servers in row C and row D, but neither row has the "interface-range ganeti" defined. Is it okay for me to go ahead and create "interface-range ganeti" on asw-c-codfw and asw-d-codfw?

Yes, fine by me.

RobH moved this task from Backlog to Racking Tasks on the ops-codfw board. · Jun 12 2019, 1:11 PM

@ayounsi I am planning to install these new servers in row C and row D, but neither row has the "interface-range ganeti" defined. Is it okay for me to go ahead and create "interface-range ganeti" on asw-c-codfw and asw-d-codfw?

Sure, don't hesitate to ask if you need a review before commit.

Papaul updated the task description. · Jun 20 2019, 8:12 PM

Add interface-range ganeti in both row C and row D.

Row C

interface-range ganeti {
    member ge-1/0/19;
    native-vlan-id 2019;
    unit 0 {
        family ethernet-switching {
            interface-mode trunk;
            vlan {
                members [ private1-c-codfw public1-c-codfw ];
            }
        }
    }
}

Row D

interface-range ganeti {
    member ge-1/0/2;
    native-vlan-id 2020;
    unit 0 {
        family ethernet-switching {
            interface-mode trunk;
            vlan {
                members [ private1-d-codfw public1-d-codfw ];
            }
        }
    }
}
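
The two stanzas differ only in the member port, the native VLAN ID, and the row-specific VLAN names. A minimal sketch of that parameterization in Python follows; the function `render_interface_range` is hypothetical, it only produces text, and it is not how the switches in this task were actually configured:

```python
def render_interface_range(member, native_vlan_id, vlans, name="ganeti"):
    """Return an interface-range stanza as text (illustrative only)."""
    lines = [
        f"interface-range {name} {{",
        f"    member {member};",
        f"    native-vlan-id {native_vlan_id};",
        "    unit 0 {",
        "        family ethernet-switching {",
        "            interface-mode trunk;",
        "            vlan {",
        f"                members [ {' '.join(vlans)} ];",
        "            }",
        "        }",
        "    }",
        "}",
    ]
    return "\n".join(lines)


# Values copied from the row C and row D stanzas above.
row_c = render_interface_range(
    "ge-1/0/19", 2019, ["private1-c-codfw", "public1-c-codfw"]
)
row_d = render_interface_range(
    "ge-1/0/2", 2020, ["private1-d-codfw", "public1-d-codfw"]
)
```

Printing `row_c` and `row_d` reproduces the two stanzas, which makes it easy to compare them side by side before asking for a review.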

Change 515111 merged by Alexandros Kosiaris:
[operations/dns@master] DNS: Add mgmt and production DNS for ganeti2009, ganeti201[0-8]

https://gerrit.wikimedia.org/r/515111

Change 518303 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP: Add MAC address entries for ganeti2009 and ganeti201[0-8]

https://gerrit.wikimedia.org/r/518303

Change 518305 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] Partman: Add ganeti201[0-8]

https://gerrit.wikimedia.org/r/518305

Papaul updated the task description. · Jun 24 2019, 12:33 PM

Change 518303 merged by Alexandros Kosiaris:
[operations/puppet@production] DHCP: Add MAC address entries for ganeti2009 and ganeti201[0-8]

https://gerrit.wikimedia.org/r/518303

Change 518305 merged by Alexandros Kosiaris:
[operations/puppet@production] Partman: Add ganeti201[0-8]

https://gerrit.wikimedia.org/r/518305

@akosiaris this is where it stops

│ This is an overview of your currently configured partitions and mount   │
│ points. Select a partition to modify its settings (file system, mount   │
│ point, etc.), a free space to create partitions, or a device to         │
│ initialize its partition table.                                         │
│                                                                         │
│           Configure iSCSI volumes                           -           │
│                                                             ▒           │
│           SCSI3 (0,0,0) (sda) - 800.2 GB ATA THNSF8800CCSE  ▒           │
│           SCSI4 (0,0,0) (sdb) - 800.2 GB ATA THNSF8800CCSE  ▒           │
│           SCSI5 (0,0,0) (sdc) - 800.2 GB ATA THNSF8800CCSE  ▒           │
│           SCSI6 (0,0,0) (sdd) - 800.2 GB ATA THNSF8800CCSE  ▒           │
│                                                             ▒           │
│           Undo changes to partitions                        0           │
│           Finish partitioning and write changes to disk     .           │
│                                                                         │
│     <Go Back>

Change 519075 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] ganeti: Setup buster and a software RAID5 recipe

https://gerrit.wikimedia.org/r/519075

So, the controllers on those boxes can't do hardware RAID, and hence the driver sees them as AHCI. That's fine: we already have multiple boxes with software RAID and can continue doing so. I've uploaded the partman recipe above, which I am currently testing (it already got past the problematic stage pointed out above); it should resolve this and let the installation proceed normally.
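
For a rough sense of capacity, RAID 5 spends the equivalent of one member disk on parity. The sketch below assumes that all four 800.2 GB disks shown in the installer screen become members of a single array, which the task does not state explicitly, and it ignores any space the partman recipe reserves for other partitions:

```python
def raid5_usable_gb(disk_count, disk_size_gb):
    """Usable capacity of a RAID 5 array: (n - 1) disks' worth of data."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least three member disks")
    return (disk_count - 1) * disk_size_gb


# Assumption: sda through sdd (4 x 800.2 GB) all join one array.
usable_gb = raid5_usable_gb(4, 800.2)
```

Under those assumptions `usable_gb` comes out at roughly 2400 GB, with one disk's worth of space going to parity.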

@akosiaris are we going with Buster? See @MoritzMuehlenhoff's comments.

Change 519075 merged by Alexandros Kosiaris:
[operations/puppet@production] ganeti: Setup buster and a software RAID5 recipe

https://gerrit.wikimedia.org/r/519075

Change 519441 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] ganeti partman: Switch to 100% of lvm guided size

https://gerrit.wikimedia.org/r/519441

Change 519441 merged by Alexandros Kosiaris:
[operations/puppet@production] ganeti partman: Switch to 100% of lvm guided size

https://gerrit.wikimedia.org/r/519441

Script wmf-auto-reimage was launched by akosiaris on cumin1001.eqiad.wmnet for hosts:

['ganeti2009.codfw.wmnet', 'ganeti2010.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2012.codfw.wmnet', 'ganeti2013.codfw.wmnet', 'ganeti2014.codfw.wmnet', 'ganeti2015.codfw.wmnet', 'ganeti2016.codfw.wmnet', 'ganeti2017.codfw.wmnet', 'ganeti2018.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906271636_akosiaris_35762.log.

Completed auto-reimage of hosts:

['ganeti2013.codfw.wmnet', 'ganeti2014.codfw.wmnet', 'ganeti2012.codfw.wmnet', 'ganeti2010.codfw.wmnet', 'ganeti2009.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2016.codfw.wmnet', 'ganeti2015.codfw.wmnet', 'ganeti2017.codfw.wmnet', 'ganeti2018.codfw.wmnet']

Of which those FAILED:

['ganeti2013.codfw.wmnet', 'ganeti2014.codfw.wmnet', 'ganeti2012.codfw.wmnet', 'ganeti2010.codfw.wmnet', 'ganeti2009.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2016.codfw.wmnet', 'ganeti2015.codfw.wmnet', 'ganeti2017.codfw.wmnet', 'ganeti2018.codfw.wmnet']

Script wmf-auto-reimage was launched by akosiaris on cumin1001.eqiad.wmnet for hosts:

['ganeti2009.codfw.wmnet', 'ganeti2010.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2012.codfw.wmnet', 'ganeti2013.codfw.wmnet', 'ganeti2014.codfw.wmnet', 'ganeti2015.codfw.wmnet', 'ganeti2016.codfw.wmnet', 'ganeti2017.codfw.wmnet', 'ganeti2018.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906280716_akosiaris_206735.log.

Completed auto-reimage of hosts:

['ganeti2014.codfw.wmnet', 'ganeti2009.codfw.wmnet', 'ganeti2013.codfw.wmnet', 'ganeti2012.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2016.codfw.wmnet', 'ganeti2015.codfw.wmnet', 'ganeti2018.codfw.wmnet', 'ganeti2017.codfw.wmnet']

Of which those FAILED:

['ganeti2014.codfw.wmnet', 'ganeti2009.codfw.wmnet', 'ganeti2013.codfw.wmnet', 'ganeti2012.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2016.codfw.wmnet', 'ganeti2015.codfw.wmnet', 'ganeti2018.codfw.wmnet', 'ganeti2017.codfw.wmnet']

Script wmf-auto-reimage was launched by akosiaris on cumin1001.eqiad.wmnet for hosts:

['ganeti2009.codfw.wmnet', 'ganeti2010.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2012.codfw.wmnet', 'ganeti2013.codfw.wmnet', 'ganeti2014.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906280956_akosiaris_239649.log.

Completed auto-reimage of hosts:

['ganeti2012.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2009.codfw.wmnet']

Of which those FAILED:

['ganeti2012.codfw.wmnet', 'ganeti2011.codfw.wmnet', 'ganeti2009.codfw.wmnet']
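
When reading these reports, a set difference is a quick way to see which hosts were launched but are not mentioned in the completion report. The sketch below uses the host lists from the third run above. The log does not say what such an absence means, so this only surfaces the discrepancy and does not interpret it:

```python
# Host lists copied from the third wmf-auto-reimage run above.
launched = {
    "ganeti2009.codfw.wmnet",
    "ganeti2010.codfw.wmnet",
    "ganeti2011.codfw.wmnet",
    "ganeti2012.codfw.wmnet",
    "ganeti2013.codfw.wmnet",
    "ganeti2014.codfw.wmnet",
}
reported = {
    "ganeti2012.codfw.wmnet",
    "ganeti2011.codfw.wmnet",
    "ganeti2009.codfw.wmnet",
}

# Hosts that were launched but do not appear in the completion report.
unreported = sorted(launched - reported)
```

Here `unreported` contains ganeti2010, ganeti2013, and ganeti2014.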
Papaul updated the task description. · Jun 28 2019, 2:18 PM
akosiaris updated the task description. · Jul 1 2019, 10:12 AM
akosiaris updated the task description. · Jul 1 2019, 3:48 PM
ayounsi removed a subscriber: ayounsi. · Jul 1 2019, 3:49 PM
akosiaris closed this task as Resolved. · Jul 1 2019, 3:50 PM
akosiaris updated the task description.

All hosts are installed. They will be added to the clusters in a different task. @Papaul, thanks!