(Need by: TBD) rack/setup/install 86 new codfw mw systems
Closed, Resolved (Public)

Description

Please note these systems were ordered on T231255, and DC-Ops needs feedback from serviceops on the split between the mw, kubernetes, thumbor, and wtp systems ordered on that task.

serviceops: Please update this task to denote how the 106 total systems are to be split between mw, kubernetes, thumbor, and wtp. Once we have that info, 3 additional tasks can be filed for kubernetes, thumbor, and wtp.

Racking Proposal:

Row          A    B    C    D
mw servers   19   25   31   11

Hostnames: mw[2291-2377].codfw.wmnet
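As a quick sanity check on the numbers above, here is a minimal Python sketch that expands the hostname range and totals the per-row split. It assumes the first host is mw2291, that the fleet is the 86 systems named in the task title, and that the per-row counts read 19/25/31/11 (the reading that sums to 86). The helper `expand_hosts` and the dict `per_row` are hypothetical names used only for illustration.

```python
def expand_hosts(prefix: str, first: int, count: int, domain: str) -> list[str]:
    """Return `count` consecutive FQDNs starting at `<prefix><first>.<domain>`."""
    return [f"{prefix}{n}.{domain}" for n in range(first, first + count)]


# Assumption: 86 systems (task title), numbered consecutively from mw2291.
hosts = expand_hosts("mw", 2291, 86, "codfw.wmnet")

# Assumption: per-row split from the racking proposal, read as A=19, B=25, C=31, D=11.
per_row = {"A": 19, "B": 25, "C": 31, "D": 11}

print(hosts[0], hosts[-1], len(hosts))
print(sum(per_row.values()))
```

Under these assumptions the last hostname comes out as mw2376, which matches the later comments in this task noting that no mw2377 exists.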
This checklist should be duplicated for EVERY SINGLE HOST:

  • - receive in system on procurement task T231255
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned):
  • - bios/drac/serial setup/testing :
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox
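The checklist implies a simple status progression in netbox: planned (at racking), staged (after the initial puppet run), then active (set by the service implementer). Here is a minimal sketch of that progression in Python. It does not talk to Netbox at all; `Status`, `NEXT`, and `advance` are hypothetical names, and the strict forward-only ordering is an assumption drawn from the checklist rather than a documented rule.

```python
from enum import Enum


class Status(Enum):
    PLANNED = "planned"
    STAGED = "staged"
    ACTIVE = "active"


# Forward transitions implied by the checklist (assumption: no skipping, no going back).
NEXT = {Status.PLANNED: Status.STAGED, Status.STAGED: Status.ACTIVE}


def advance(current: Status) -> Status:
    """Return the next status, or raise if the host is already at the last one."""
    if current not in NEXT:
        raise ValueError(f"'{current.value}' has no further status in this sketch")
    return NEXT[current]


print(advance(Status.PLANNED).value)
print(advance(Status.STAGED).value)
```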

Details

Project             Branch       Lines +/-   Subject
operations/puppet   production   +1 -1
operations/puppet   production   +5 -1
operations/dns      master       +31 -4
operations/puppet   production   +30 -0
labs/private        master       +0 -0
labs/private        master       +0 -0
operations/puppet   production   +101 -0
operations/dns      master       +96 -1
labs/private        master       +0 -0
operations/puppet   production   +11 -1
operations/puppet   production   +10 -0
operations/puppet   production   +5 -0
operations/puppet   production   +66 -0
operations/dns      master       +66 -0
operations/puppet   production   +13 -5
operations/puppet   production   +7 -0
operations/puppet   production   +7 -2
labs/private        master       +0 -0
operations/puppet   production   +10 -0
operations/puppet   production   +114 -0
operations/dns      master       +114 -1
operations/puppet   production   +25 -0
operations/puppet   production   +125 -0
operations/dns      master       +150 -6

Event Timeline

There are a very large number of changes, so older changes are hidden.
Papaul updated the task description. Mar 4 2020, 11:49 PM

Change 577005 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP: Add MAC address for mw2350 to mw2365, Add those servers too to site.pp

https://gerrit.wikimedia.org/r/577005

Change 577005 merged by Papaul:
[operations/puppet@production] DHCP: Add MAC address for mw2350 to mw2365, Add those servers too to site.pp

https://gerrit.wikimedia.org/r/577005

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2350.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003050155_pt1979_25417_mw2350_codfw_wmnet.log.
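The log file names in these bot comments appear to follow one pattern: a 12-digit timestamp, the user, a numeric field, and the hostname with dots replaced by underscores. A minimal Python sketch for pulling those fields out of a file name follows. The field layout is inferred only from the names visible in this task; the meaning of the numeric field (possibly a process ID) is an assumption, and `parse_log_name` is a hypothetical helper, not part of wmf-auto-reimage.

```python
import re
from datetime import datetime

# Layout inferred from this task: <YYYYmmddHHMM>_<user>_<number>_<host with underscores>.log
LOG_RE = re.compile(r"^(?P<ts>\d{12})_(?P<user>[^_]+)_(?P<num>\d+)_(?P<host>.+)\.log$")


def parse_log_name(name: str) -> dict:
    """Split a reimage log file name into its apparent fields."""
    m = LOG_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized log name: {name!r}")
    return {
        "started": datetime.strptime(m["ts"], "%Y%m%d%H%M"),
        "user": m["user"],
        "num": int(m["num"]),  # assumption: possibly a process ID
        # Assumption: hostnames contain no underscores, so every "_" was a ".".
        "host": m["host"].replace("_", "."),
    }


info = parse_log_name("202003050155_pt1979_25417_mw2350_codfw_wmnet.log")
print(info["started"], info["user"], info["host"])
```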

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2351.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003050205_pt1979_26769_mw2351_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2350.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2352.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003050219_pt1979_31241_mw2352_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2351.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2353.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003050228_pt1979_947_mw2353_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2352.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2354.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003050242_pt1979_4738_mw2354_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2353.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2355.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003050253_pt1979_7508_mw2355_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2354.codfw.wmnet']

and were ALL successful.

Papaul added a comment. Edited Mar 5 2020, 3:14 AM

Servers Rack C6   Ready for service
mw2350            Yes
mw2351            Yes
mw2352            Yes
mw2353            Yes
mw2354            Yes
mw2355            Yes
mw2356            Yes
mw2357            Yes
mw2358            Yes
mw2359            Yes
mw2360            Yes
mw2361            Yes
mw2362            Yes
mw2363            Yes
mw2364            Yes
mw2365            Yes

Completed auto-reimage of hosts:

['mw2355.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2356.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051309_pt1979_19863_mw2356_codfw_wmnet.log.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2357.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051309_pt1979_19973_mw2357_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2356.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2358.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051334_pt1979_25893_mw2358_codfw_wmnet.log.

Papaul claimed this task. Mar 5 2020, 1:55 PM

Completed auto-reimage of hosts:

['mw2358.codfw.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['mw2357.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2359.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051654_pt1979_30504_mw2359_codfw_wmnet.log.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2360.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051659_pt1979_31165_mw2360_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2359.codfw.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['mw2360.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2361.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051723_pt1979_4306_mw2361_codfw_wmnet.log.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2362.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051726_pt1979_4692_mw2362_codfw_wmnet.log.

Change 577308 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[labs/private@master] fake certificates for all remaining new eqiad and codfw appservers

https://gerrit.wikimedia.org/r/577308

Completed auto-reimage of hosts:

['mw2361.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2363.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051747_pt1979_8865_mw2363_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2362.codfw.wmnet']

and were ALL successful.

Change 577308 merged by Dzahn:
[labs/private@master] fake certificates for all remaining new eqiad and codfw appservers

https://gerrit.wikimedia.org/r/577308

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2364.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051754_pt1979_9989_mw2364_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2363.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2365.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003051811_pt1979_13802_mw2365_codfw_wmnet.log.

Change 577314 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[labs/private@master] fix fake certificate names for new codfw appservers

https://gerrit.wikimedia.org/r/577314

Completed auto-reimage of hosts:

['mw2364.codfw.wmnet']

and were ALL successful.

Change 577314 merged by Dzahn:
[labs/private@master] fix fake certificate names for new codfw appservers

https://gerrit.wikimedia.org/r/577314

Completed auto-reimage of hosts:

['mw2365.codfw.wmnet']

and were ALL successful.

Papaul added a comment. Mar 5 2020, 6:50 PM

Of the 86 mw servers in total, 71 are done and 15 are left. The remaining 15 are waiting for space in row C, rack C3.

Papaul renamed this task from (Need by: TBD) rack/setup/install new codfw mw systems to (Need by: TBD) rack/setup/install 86 new codfw mw systems. Mar 5 2020, 7:10 PM
Dzahn changed the task status from Open to Stalled. Edited Mar 5 2020, 7:23 PM

Thanks, Papaul, for all the new servers. We'll continue with racking right after eqiad is done. We can do that on a separate ticket (T247021) as well, to keep service implementation out of it.

I created T247018, which now blocks this ticket.

Papaul moved this task from Racking Tasks to Blocked on the ops-codfw board. Mar 6 2020, 1:00 AM
Dzahn added a comment. Edited Mar 6 2020, 1:51 AM

mw2291 through mw2324 are now pooled and status active in netbox (34 servers) https://gerrit.wikimedia.org/r/q/topic:%22appservers-codfw%22+(status:open%20OR%20status:merged)

mw2325 through mw2334 are not pooled but in site.pp and status staged in netbox (10 servers) https://gerrit.wikimedia.org/r/c/operations/puppet/+/577408

mw2335 through mw2349 are not pooled, not in site.pp and status planned in netbox (15 servers) (blocked by T247018)

mw2350 through mw2376 are not pooled, in site.pp, and status staged in netbox (27 servers) (https://gerrit.wikimedia.org/r/c/operations/puppet/+/577409)

total: 86 servers
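The per-group counts above can be checked mechanically. This is a small Python sketch that recomputes each group's size from its first and last host number and confirms that the ranges are contiguous and total 86. The `groups` list simply restates the ranges from this comment; the labels are shorthand and the variable names are illustrative.

```python
# (label, first host number, last host number), restated from the comment above.
groups = [
    ("pooled, active in netbox", 2291, 2324),
    ("in site.pp, staged in netbox", 2325, 2334),
    ("planned in netbox, blocked by T247018", 2335, 2349),
    ("in site.pp, staged in netbox", 2350, 2376),
]

# Inclusive range size: last - first + 1.
sizes = [last - first + 1 for _, first, last in groups]

# Each group should start right after the previous one ends.
contiguous = all(
    groups[i + 1][1] == groups[i][2] + 1 for i in range(len(groups) - 1)
)

for (label, first, last), size in zip(groups, sizes):
    print(f"mw{first}-mw{last}: {size} servers ({label})")
print("total:", sum(sizes), "contiguous:", contiguous)
```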

We have 5 mw servers left to be racked in row C, rack C3, since we used 10 servers in T252185.

Dzahn changed the task status from Stalled to Open. May 22 2020, 2:34 PM

Hi @Papaul

23 servers in rack C3 have been decommissioned: mw2150 through mw2172 (the lower part of the rack).

You can:

  • remove these physically from the rack
  • use the space for your planned test of new cabling schema
  • rack 5 new servers in their place

In the order of your choice.

Next week I am planning to remove even more, so we might be able to empty out almost the entire C3.

Also see T247018#6158017.

Papaul moved this task from Blocked to Racking Tasks on the ops-codfw board. May 24 2020, 4:14 PM

Change 599749 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: add new appservers mw2336 through mw2339

https://gerrit.wikimedia.org/r/599749

Papaul updated the task description. Jun 9 2020, 6:04 PM
Papaul updated the task description. Jun 10 2020, 12:15 AM

Change 604339 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] add management and production IPs for mw2335-mw2339

https://gerrit.wikimedia.org/r/604339

@Papaul I uploaded a new change to add mgmt and production IPs for mw2335-mw2339 (C3). Does it look good to you?

Change 604393 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP: Add MAC address for mw233[5-9]

https://gerrit.wikimedia.org/r/604393

Change 604393 merged by Dzahn:
[operations/puppet@production] DHCP: Add MAC address for mw233[5-9]

https://gerrit.wikimedia.org/r/604393

Change 604339 merged by Dzahn:
[operations/dns@master] add management and production IPs for mw2335-mw2339

https://gerrit.wikimedia.org/r/604339

[edit interfaces interface-range vlan-private1-c-codfw]
     member xe-7/0/3 { ... }
+    member ge-3/0/3;
+    member ge-3/0/4;
+    member ge-3/0/5;
+    member ge-3/0/6;
+    member ge-3/0/7;
[edit interfaces interface-range disabled]
-    member ge-3/0/3;
-    member ge-3/0/4;
-    member ge-3/0/5;
-    member ge-3/0/6;
-    member ge-3/0/7;
Papaul updated the task description. Jun 10 2020, 2:59 PM

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2335.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202006101504_pt1979_1393_mw2335_codfw_wmnet.log.

Change 599749 merged by Dzahn:
[operations/puppet@production] site: add new appservers mw2335 through mw2339

https://gerrit.wikimedia.org/r/599749

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2336.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202006101519_pt1979_4336_mw2336_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2335.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2337.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202006101529_pt1979_6453_mw2337_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2336.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2338.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202006101545_pt1979_10271_mw2338_codfw_wmnet.log.

Change 604429 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site/conftool: add mw2335-mw2339 as appservers

https://gerrit.wikimedia.org/r/604429

Completed auto-reimage of hosts:

['mw2337.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

mw2339.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202006101556_pt1979_12212_mw2339_codfw_wmnet.log.

Completed auto-reimage of hosts:

['mw2338.codfw.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['mw2339.codfw.wmnet']

and were ALL successful.

Papaul closed this task as Resolved. Jun 10 2020, 4:28 PM
Papaul updated the task description.

@Dzahn the 5 servers in C3 are ready for services

Change 604429 merged by Dzahn:
[operations/puppet@production] site: add mw2335-mw2339 as appservers

https://gerrit.wikimedia.org/r/604429

Dzahn added a comment. Jun 17 2020, 3:24 PM

> @Dzahn the 5 servers in C3 are ready for services

They are now in production. (details in T247021)

Dzahn added a comment. Jun 17 2020, 3:30 PM

@Papaul This ticket talks about mw2377 but it seems to me we never had a host mw2377 (only up to mw2376). Can you confirm that?

Dzahn added a comment. Jun 17 2020, 3:32 PM

76 servers are pooled as appservers. 10 have been used for kubernetes. Adds up to 86.

@Dzahn I confirm we only have hosts up to mw2376.

Dzahn awarded a token. Jun 17 2020, 5:45 PM