
(Need by: TBD) rack/setup/install 86 new codfw mw systems
Closed, Resolved (Public)

Description

Please note these systems were ordered on T231255, and DC-Ops needs feedback from serviceops on the split between the mw, kubernetes, thumbor, & wtp systems ordered on that task.

serviceops: Please update this task to denote how the 106 total systems are to be split between mw, kubernetes, thumbor, & wtp. Once we have that info, 3 additional tasks can be filed for kubernetes, thumbor, and wtp.

Racking Proposal:

Row          A    B    C    D
mw servers   19   25   31   11

Hostnames: mw[2291-2377].codfw.wmnet
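As a quick sanity check on the proposal, the sketch below sums the per-row counts and expands the hostname range. It assumes the range is meant as mw2291 through mw2377 inclusive, as written in the description; the helper name `expand_range` is made up for illustration and is not part of any existing tooling.

```python
# Per-row counts taken from the racking proposal.
ROW_COUNTS = {"A": 19, "B": 25, "C": 31, "D": 11}


def expand_range(prefix: str, first: int, last: int, domain: str) -> list[str]:
    """Return fully qualified hostnames prefix<first>..prefix<last>, inclusive."""
    return [f"{prefix}{n}.{domain}" for n in range(first, last + 1)]


total_racked = sum(ROW_COUNTS.values())
# Assumption: the range endpoints are 2291 and 2377, as written in the description.
hosts = expand_range("mw", 2291, 2377, "codfw.wmnet")

print(total_racked)          # sum of the per-row counts
print(len(hosts))            # number of names in the range as written
print(hosts[0], hosts[-1])   # first and last hostname
```

The row counts add up to 86, while the range as written covers 87 names, so one endpoint of the range is probably off by one.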
This checklist should be duplicated for EVERY SINGLE HOST:

  • - receive in system on procurement task T231255
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned):
  • - bios/drac/serial setup/testing :
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox
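The checklist implies a simple progression of netbox states: planned (at racking), staged (after the initial puppet run), then active (set by the service implementer). A minimal sketch of that progression follows. It is only a model of the checklist, not the Netbox API, and the names `Status`, `NEXT_STATUS` and `advance` are hypothetical.

```python
from enum import Enum


class Status(Enum):
    PLANNED = "planned"   # racked, netbox updated
    STAGED = "staged"     # OS installed, initial puppet run done
    ACTIVE = "active"     # handed off and put into service


# Forward transitions as implied by the checklist (an assumption, not a Netbox rule).
NEXT_STATUS = {Status.PLANNED: Status.STAGED, Status.STAGED: Status.ACTIVE}


def advance(current: Status) -> Status:
    """Return the next status in the checklist order, or raise if already final."""
    if current not in NEXT_STATUS:
        raise ValueError(f"no further step after {current.value!r}")
    return NEXT_STATUS[current]


status = Status.PLANNED
status = advance(status)   # after puppet accept/initial run
status = advance(status)   # after service implementation
print(status.value)
```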

Details

Repo               Branch      Lines +/-
operations/puppet  production  +1 -1
operations/puppet  production  +5 -1
operations/dns     master      +31 -4
operations/puppet  production  +30 -0
labs/private       master      +0 -0
labs/private       master      +0 -0
operations/puppet  production  +101 -0
operations/dns     master      +96 -1
labs/private       master      +0 -0
operations/puppet  production  +11 -1
operations/puppet  production  +10 -0
operations/puppet  production  +5 -0
operations/puppet  production  +66 -0
operations/dns     master      +66 -0
operations/puppet  production  +13 -5
operations/puppet  production  +7 -0
operations/puppet  production  +7 -2
labs/private       master      +0 -0
operations/puppet  production  +10 -0
operations/puppet  production  +114 -0
operations/dns     master      +114 -1
operations/puppet  production  +25 -0
operations/puppet  production  +125 -0
operations/dns     master      +150 -6

Event Timeline

Change 577005 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP: Add MAC address for mw2350 to mw2365, Add those servers too to site.pp

https://gerrit.wikimedia.org/r/577005

Change 577005 merged by Papaul:
[operations/puppet@production] DHCP: Add MAC address for mw2350 to mw2365, Add those servers too to site.pp

https://gerrit.wikimedia.org/r/577005

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2350.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003050155_pt1979_25417_mw2350_codfw_wmnet.log.
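The log file names in these bot comments appear to follow a fixed pattern: a 12-digit timestamp, the user, a number, and the hostname with dots replaced by underscores. The sketch below splits a path on that assumed pattern. The pattern is inferred from the examples in this task, not from the script's documentation, and the meaning of the numeric field is not stated here, so it is kept as a plain number.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Assumed layout: <YYYYMMDDHHMM>_<user>_<number>_<host with underscores>.log
LOG_NAME = re.compile(
    r"(?P<stamp>\d{12})_(?P<user>[^_]+)_(?P<number>\d+)_(?P<host>.+)\.log"
)


def parse_log_path(path: str) -> dict:
    """Split a wmf-auto-reimage log path into its apparent components."""
    match = LOG_NAME.fullmatch(PurePosixPath(path).name)
    if match is None:
        raise ValueError(f"unexpected log file name: {path}")
    return {
        "stamp": datetime.strptime(match["stamp"], "%Y%m%d%H%M"),
        "user": match["user"],
        "number": int(match["number"]),             # meaning not stated in the task
        "host": match["host"].replace("_", "."),    # assumes no underscores in hostnames
    }


info = parse_log_path(
    "/var/log/wmf-auto-reimage/202003050155_pt1979_25417_mw2350_codfw_wmnet.log"
)
print(info["stamp"], info["user"], info["host"])
```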

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2351.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003050205_pt1979_26769_mw2351_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2350.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2352.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003050219_pt1979_31241_mw2352_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2351.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2353.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003050228_pt1979_947_mw2353_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2352.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2354.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003050242_pt1979_4738_mw2354_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2353.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2355.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003050253_pt1979_7508_mw2355_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2354.codfw.wmnet']

and were ALL successful.

Servers Rack C6   Ready for service
mw2350            Yes
mw2351            Yes
mw2352            Yes
mw2353            Yes
mw2354            Yes
mw2355            Yes
mw2356            Yes
mw2357            Yes
mw2358            Yes
mw2359            Yes
mw2360            Yes
mw2361            Yes
mw2362            Yes
mw2363            Yes
mw2364            Yes
mw2365            Yes

Completed auto-reimage of hosts:

['mw2355.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2356.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051309_pt1979_19863_mw2356_codfw_wmnet.log.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2357.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051309_pt1979_19973_mw2357_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2356.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2358.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051334_pt1979_25893_mw2358_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2358.codfw.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['mw2357.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2359.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051654_pt1979_30504_mw2359_codfw_wmnet.log.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2360.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051659_pt1979_31165_mw2360_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2359.codfw.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['mw2360.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2361.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051723_pt1979_4306_mw2361_codfw_wmnet.log.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2362.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051726_pt1979_4692_mw2362_codfw_wmnet.log.

Change 577308 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[labs/private@master] fake certificates for all remaining new eqiad and codfw appservers

https://gerrit.wikimedia.org/r/577308

Completed auto-reimage of hosts:

['mw2361.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2363.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051747_pt1979_8865_mw2363_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2362.codfw.wmnet']

and were ALL successful.

Change 577308 merged by Dzahn:
[labs/private@master] fake certificates for all remaining new eqiad and codfw appservers

https://gerrit.wikimedia.org/r/577308

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2364.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051754_pt1979_9989_mw2364_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2363.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2365.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051811_pt1979_13802_mw2365_codfw_wmnet.log.

Change 577314 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[labs/private@master] fix fake certificate names for new codfw appservers

https://gerrit.wikimedia.org/r/577314

Completed auto-reimage of hosts:

['mw2364.codfw.wmnet']

and were ALL successful.

Change 577314 merged by Dzahn:
[labs/private@master] fix fake certificate names for new codfw appservers

https://gerrit.wikimedia.org/r/577314

Completed auto-reimage of hosts:

['mw2365.codfw.wmnet']

and were ALL successful.

Of the total of 86 mw servers, 71 are done and 15 are left. The remaining 15 are waiting for space in rack C3 (row C).

Papaul renamed this task from (Need by: TBD) rack/setup/install new codfw mw systems to (Need by: TBD) rack/setup/install 86 new codfw mw systems. (Mar 5 2020, 7:10 PM)
Dzahn changed the task status from Open to Stalled. (Mar 5 2020, 7:23 PM, edited)

Thanks, Papaul, for all the new servers. We'll continue with racking right after eqiad is done. We can do that on a separate ticket (T247021) as well, to keep service implementation out of it.

I made T247018 which now blocks this ticket.

mw2291 through mw2324 are now pooled and status active in netbox (34 servers) https://gerrit.wikimedia.org/r/q/topic:%22appservers-codfw%22+(status:open%20OR%20status:merged)

mw2325 through mw2334 are not pooled but in site.pp and status staged in netbox (10 servers) https://gerrit.wikimedia.org/r/c/operations/puppet/+/577408

mw2335 through mw2349 are not pooled, not in site.pp and status planned in netbox (15 servers) (blocked by T247018)

mw2350 through mw2376 are not pooled, in site.pp and status staged in netbox (27 servers) https://gerrit.wikimedia.org/r/c/operations/puppet/+/577409

total: 86 servers
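The breakdown can be double-checked with a few lines of arithmetic. The sketch below recomputes the size of each hostname range from its endpoints and confirms that the ranges are contiguous and add up to the stated total; the variable names are illustrative only.

```python
# (first, last, state) for each group listed in the status summary.
RANGES = [
    (2291, 2324, "pooled, active in netbox"),
    (2325, 2334, "not pooled, in site.pp, staged"),
    (2335, 2349, "not pooled, not in site.pp, planned"),
    (2350, 2376, "not pooled, in site.pp, staged"),
]

# Inclusive range size is last - first + 1.
counts = [last - first + 1 for first, last, _ in RANGES]
total = sum(counts)

# Each range should start right after the previous one ends.
contiguous = all(
    RANGES[i + 1][0] == RANGES[i][1] + 1 for i in range(len(RANGES) - 1)
)

for (first, last, state), count in zip(RANGES, counts):
    print(f"mw{first}-mw{last}: {count} servers ({state})")
print("total:", total, "contiguous:", contiguous)
```

The per-group counts come out as 34, 10, 15 and 27, which matches the figures given in the comment and sums to 86.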

We have 5 mw servers left to be racked in row c rack c3 since we used 10 servers in T252185

Dzahn changed the task status from Stalled to Open. (May 22 2020, 2:34 PM)

Hi @Papaul

23 servers in rack C3 have been decom'ed: mw2150 through mw2172 (lower part of the rack).

You can:

  • remove these physically from the rack
  • use the space for your planned test of new cabling schema
  • rack 5 new servers in their place

In an order of your choice.

Next week I am planning to remove even more so we might be able to empty out almost the entire C3.

also: T247018#6158017.

Change 599749 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: add new appservers mw2336 through mw2339

https://gerrit.wikimedia.org/r/599749

Change 604339 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] add management and production IPs for mw2335-mw2339

https://gerrit.wikimedia.org/r/604339

@Papaul I uploaded a new change to add mgmt and production IPs for mw2335-mw2339 (C3). Does it look good to you?

Change 604393 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP: Add MAC address for mw233[5-9]

https://gerrit.wikimedia.org/r/604393

Change 604393 merged by Dzahn:
[operations/puppet@production] DHCP: Add MAC address for mw233[5-9]

https://gerrit.wikimedia.org/r/604393

Change 604339 merged by Dzahn:
[operations/dns@master] add management and production IPs for mw2335-mw2339

https://gerrit.wikimedia.org/r/604339

[edit interfaces interface-range vlan-private1-c-codfw]
     member xe-7/0/3 { ... }
+    member ge-3/0/3;
+    member ge-3/0/4;
+    member ge-3/0/5;
+    member ge-3/0/6;
+    member ge-3/0/7;
[edit interfaces interface-range disabled]
-    member ge-3/0/3;
-    member ge-3/0/4;
-    member ge-3/0/5;
-    member ge-3/0/6;
-    member ge-3/0/7;
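The switch change above moves five ports out of the disabled interface-range and into the private VLAN range. A small sketch to confirm that the two sides of such a diff match is shown below. It parses only the diff text as pasted in this task; the function name `members_by_section` is hypothetical and nothing here talks to a real switch.

```python
import re

DIFF = """\
[edit interfaces interface-range vlan-private1-c-codfw]
     member xe-7/0/3 { ... }
+    member ge-3/0/3;
+    member ge-3/0/4;
+    member ge-3/0/5;
+    member ge-3/0/6;
+    member ge-3/0/7;
[edit interfaces interface-range disabled]
-    member ge-3/0/3;
-    member ge-3/0/4;
-    member ge-3/0/5;
-    member ge-3/0/6;
-    member ge-3/0/7;
"""

MEMBER = re.compile(r"([+-])\s+member\s+(\S+);")


def members_by_section(diff: str):
    """Collect added and removed members per interface-range name."""
    added, removed = {}, {}
    section = None
    for line in diff.splitlines():
        if line.startswith("[edit "):
            # The range name is the last word inside the brackets.
            section = line.strip("[]").split()[-1]
            continue
        match = MEMBER.match(line)
        if match and section:
            target = added if match[1] == "+" else removed
            target.setdefault(section, set()).add(match[2])
    return added, removed


added, removed = members_by_section(DIFF)
moved = added["vlan-private1-c-codfw"] == removed["disabled"]
print(sorted(added["vlan-private1-c-codfw"]), moved)
```

Unchanged context lines, such as the xe-7/0/3 member, start with a space and are skipped by the pattern.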

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2335.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202006101504_pt1979_1393_mw2335_codfw_wmnet.log.

Change 599749 merged by Dzahn:
[operations/puppet@production] site: add new appservers mw2335 through mw2339

https://gerrit.wikimedia.org/r/599749

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2336.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202006101519_pt1979_4336_mw2336_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2335.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2337.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202006101529_pt1979_6453_mw2337_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2336.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2338.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202006101545_pt1979_10271_mw2338_codfw_wmnet.log.

Change 604429 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site/conftool: add mw2335-mw2339 as appservers

https://gerrit.wikimedia.org/r/604429

Completed auto-reimage of hosts:

['mw2337.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2339.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202006101556_pt1979_12212_mw2339_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2338.codfw.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['mw2339.codfw.wmnet']

and were ALL successful.

Papaul updated the task description.

@Dzahn the 5 servers in C3 are ready for services

Change 604429 merged by Dzahn:
[operations/puppet@production] site: add mw2335-mw2339 as appservers

https://gerrit.wikimedia.org/r/604429

@Dzahn the 5 servers in C3 are ready for services

They are now in production. (details in T247021)

@Papaul This ticket talks about mw2377 but it seems to me we never had a host mw2377 (only up to mw2376). Can you confirm that?

76 servers are pooled as appservers. 10 have been used for kubernetes. Adds up to 86.