codfw1dev: repurpose/rename labtestvirt2003.codfw.wmnet as cloudgw2001-dev.codfw.wmnet
Closed, Resolved (Public)

Description

We used the labtestvirt2003 spare server to develop/test/PoC the cloudgw architecture (see T261724). https://netbox.wikimedia.org/dcim/devices/1774/

We are planning to keep using it beyond the PoC. I think it makes sense to use proper naming, a proper puppet role, proper hiera, etc.

More details to come.

Event Timeline

aborrero triaged this task as Medium priority.
aborrero moved this task from Inbox to Soon! on the cloud-services-team (Kanban) board.
aborrero renamed this task from codfw1dev: repurpose/rename labtestvirt2003.codfw.wmnet as cloudgw200X.codfw.wmnet to codfw1dev: repurpose/rename labtestvirt2003.codfw.wmnet as cloudgw200X-dev.codfw.wmnet. Jan 13 2021, 12:25 PM

hey @RobH or @Papaul I wanted to sync with you on this before moving forward.

I would like to officially rename this server to better reflect its purpose. I can do the reimaging/renaming myself, except updating the physical labels in the DC.

This server is not to be confused with the one being procured on T271590: (Need By: TBD) rack/setup/install cloudgw2002-dev.codfw.wmnet

@aborrero you can rename this one to cloudgw2001, and the new one will be cloudgw2002

ok! thanks

aborrero renamed this task from codfw1dev: repurpose/rename labtestvirt2003.codfw.wmnet as cloudgw200X-dev.codfw.wmnet to codfw1dev: repurpose/rename labtestvirt2003.codfw.wmnet as cloudgw2001-dev.codfw.wmnet. Jan 13 2021, 3:33 PM

Change 656121 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] labtestvirt2003: drop host from production

https://gerrit.wikimedia.org/r/656121

Change 656121 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] labtestvirt2003: drop host from production

https://gerrit.wikimedia.org/r/656121

cookbooks.sre.hosts.decommission executed by aborrero@cumin2001 for hosts: labtestvirt2003.codfw.wmnet

  • labtestvirt2003.codfw.wmnet (FAIL)
    • Downtimed host on Icinga
    • Found physical host
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Host steps raised exception: The request failed with code 405 Method Not Allowed: {'detail': 'Method "PATCH" not allowed.'}

ERROR: some step on some host failed, check the bolded items above

homer diff when removing labtestvirt2003 for posterity:

Configuration diff for asw-b-codfw.mgmt.codfw.wmnet:

[edit interfaces interface-range disabled]
     member ge-1/0/9 { ... }
+    member ge-1/0/11;
+    member ge-1/0/12;
     member ge-1/0/14 { ... }
[edit interfaces interface-range vlan-cloud-hosts1-b-codfw]
-    member ge-1/0/11;
[edit interfaces]
-   ge-1/0/11 {
-       description "labtestvirt2003:eno1 {#}";
-   }
-   ge-1/0/12 {
-       description "labtestvirt2003:eno2 {#}";
-       unit 0 {
-           family ethernet-switching {
-               interface-mode trunk;
-               vlan {
-                   members [ cloud-gw-transport-codfw cloud-instance-transport1-b-codfw ];
-               }
-           }
-       }
-   }
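
For anyone skimming the diff above, a minimal Python sketch of how the `member` changes could be summarised per `[edit ...]` section. This is not part of homer; the function name `summarize_members` and the parsing rules are assumptions made for illustration, and only the first two sections of the diff are reproduced as sample input.

```python
import re
from collections import defaultdict

# Excerpt of the diff quoted above (first two [edit ...] sections only).
DIFF = """\
[edit interfaces interface-range disabled]
     member ge-1/0/9 { ... }
+    member ge-1/0/11;
+    member ge-1/0/12;
     member ge-1/0/14 { ... }
[edit interfaces interface-range vlan-cloud-hosts1-b-codfw]
-    member ge-1/0/11;
"""


def summarize_members(diff_text):
    """Map each [edit ...] section to the members added to or removed from it.

    Hypothetical helper: it only understands '+'/'-' lines of the form
    'member <name>;' and ignores unchanged context lines.
    """
    summary = defaultdict(lambda: {"added": [], "removed": []})
    section = None
    for line in diff_text.splitlines():
        header = re.match(r"\[edit (.+)\]$", line.strip())
        if header:
            section = header.group(1)
            continue
        change = re.match(r"([+-])\s+member (\S+?);$", line)
        if change and section:
            key = "added" if change.group(1) == "+" else "removed"
            summary[section][key].append(change.group(2))
    return dict(summary)


if __name__ == "__main__":
    for section, changes in summarize_members(DIFF).items():
        print(section, changes)
```

Read this way, the diff moves ge-1/0/11 and ge-1/0/12 into the `disabled` interface-range and takes ge-1/0/11 out of `vlan-cloud-hosts1-b-codfw`; the remaining hunks drop the per-port descriptions and the trunk config on ge-1/0/12.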

Change 656148 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudgw2001-dev: introduce server

https://gerrit.wikimedia.org/r/656148

Change 656148 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudgw2001-dev: introduce server

https://gerrit.wikimedia.org/r/656148

Script wmf-auto-reimage was launched by aborrero on cumin2001.codfw.wmnet for hosts:

cloudgw2001-dev.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202101141403_aborrero_7880_cloudgw2001-dev_codfw_wmnet.log.

Mentioned in SAL (#wikimedia-operations) [2021-01-14T14:24:19Z] <arturo> running homer in asw-b-codfw* (T271519)

Mentioned in SAL (#wikimedia-operations) [2021-01-14T14:28:47Z] <arturo> running homer in asw-b-codfw* (T271519)

Completed auto-reimage of hosts:

['cloudgw2001-dev.codfw.wmnet']

Of which those FAILED:

['cloudgw2001-dev.codfw.wmnet']

Script wmf-auto-reimage was launched by aborrero on cumin2001.codfw.wmnet for hosts:

cloudgw2001-dev.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202101141532_aborrero_28477_cloudgw2001-dev_codfw_wmnet.log.

@aborrero physical label updated. Do you still need the other 2 network connections for this server?

Completed auto-reimage of hosts:

['cloudgw2001-dev.codfw.wmnet']

Of which those FAILED:

['cloudgw2001-dev.codfw.wmnet']

Thanks!

Yes, the network connections should stay as they are.
I will finish the reimage tomorrow; the installer complained about the disk, and then puppet did as well.

Script wmf-auto-reimage was launched by aborrero on cumin2001.codfw.wmnet for hosts:

cloudgw2001-dev.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202101151014_aborrero_556_cloudgw2001-dev_codfw_wmnet.log.

Completed auto-reimage of hosts:

['cloudgw2001-dev.codfw.wmnet']

Of which those FAILED:

['cloudgw2001-dev.codfw.wmnet']
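
To line up the three reimage attempts above, here is a small Python sketch that pulls the start time out of each log file name. The naming scheme (12-digit timestamp, user, a numeric field, then the host name with dots turned into underscores) is inferred from the three paths quoted in this task and is an assumption, not documented wmf-auto-reimage behaviour; `parse_log_name` is a hypothetical helper.

```python
import re
from datetime import datetime

# Assumed layout: <YYYYmmddHHMM>_<user>_<number>_<host with underscores>.log
LOG_NAME = re.compile(
    r"(?P<stamp>\d{12})_(?P<user>[^_]+)_(?P<num>\d+)_(?P<host>.+)\.log$"
)

# The three log paths quoted in this task.
LOGS = [
    "/var/log/wmf-auto-reimage/202101141403_aborrero_7880_cloudgw2001-dev_codfw_wmnet.log",
    "/var/log/wmf-auto-reimage/202101141532_aborrero_28477_cloudgw2001-dev_codfw_wmnet.log",
    "/var/log/wmf-auto-reimage/202101151014_aborrero_556_cloudgw2001-dev_codfw_wmnet.log",
]


def parse_log_name(path):
    """Split a log path into its assumed fields; return None if it does not match."""
    name = path.rsplit("/", 1)[-1]
    match = LOG_NAME.match(name)
    if match is None:
        return None
    return {
        "started": datetime.strptime(match["stamp"], "%Y%m%d%H%M"),
        "user": match["user"],
        "num": int(match["num"]),  # meaning of this field is not confirmed here
        "host_slug": match["host"],
    }


if __name__ == "__main__":
    for path in LOGS:
        info = parse_log_name(path)
        print(info["started"].isoformat(), info["user"], info["host_slug"])
```

Under that assumed scheme, the attempts started at 14:03 and 15:32 on 2021-01-14 and at 10:14 on 2021-01-15, matching the order of the comments in this task.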