codfw1dev: repurpose/rename labtestvirt2003.codfw.wmnet as cloudgw2001-dev.codfw.wmnet
Closed, Resolved, Public

Description

We used the spare server labtestvirt2003 to develop, test, and build a PoC of the cloudgw architecture (see T261724). Netbox entry: https://netbox.wikimedia.org/dcim/devices/1774/

We plan to keep using it beyond the PoC, so I think it makes sense to give it a proper name, a proper puppet role, proper hiera, etc.

More details to come.

Event Timeline

aborrero triaged this task as Medium priority. Jan 8 2021, 10:21 AM
aborrero created this task.
aborrero moved this task from Inbox to Soon! on the cloud-services-team (Kanban) board.
aborrero renamed this task from codfw1dev: repurpose/rename labtestvirt2003.codfw.wmnet as cloudgw200X.codfw.wmnet to codfw1dev: repurpose/rename labtestvirt2003.codfw.wmnet as cloudgw200X-dev.codfw.wmnet. Jan 13 2021, 12:25 PM

hey @RobH or @Papaul I wanted to sync with you on this before moving forward.

I would like to officially rename this server to better reflect its purpose. I can do the reimaging/renaming myself, except for updating the physical labels in the DC.

This server is not to be confused with the one being procured on T271590: (Need By: TBD) rack/setup/install cloudgw2002-dev.codfw.wmnet

@aborrero you can rename this one to cloudgw2001 and the new one will be cloudgw2002

ok! thanks

aborrero renamed this task from codfw1dev: repurpose/rename labtestvirt2003.codfw.wmnet as cloudgw200X-dev.codfw.wmnet to codfw1dev: repurpose/rename labtestvirt2003.codfw.wmnet as cloudgw2001-dev.codfw.wmnet. Jan 13 2021, 3:33 PM

Change 656121 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] labtestvirt2003: drop host from production

https://gerrit.wikimedia.org/r/656121

Change 656121 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] labtestvirt2003: drop host from production

https://gerrit.wikimedia.org/r/656121

cookbooks.sre.hosts.decommission executed by aborrero@cumin2001 for hosts: labtestvirt2003.codfw.wmnet

  • labtestvirt2003.codfw.wmnet (FAIL)
    • Downtimed host on Icinga
    • Found physical host
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Host steps raised exception: The request failed with code 405 Method Not Allowed: {'detail': 'Method "PATCH" not allowed.'}

ERROR: some step on some host failed, check the bolded items above

homer diff when removing labtestvirt2003 for posterity:

Configuration diff for asw-b-codfw.mgmt.codfw.wmnet:

[edit interfaces interface-range disabled]
     member ge-1/0/9 { ... }
+    member ge-1/0/11;
+    member ge-1/0/12;
     member ge-1/0/14 { ... }
[edit interfaces interface-range vlan-cloud-hosts1-b-codfw]
-    member ge-1/0/11;
[edit interfaces]
-   ge-1/0/11 {
-       description "labtestvirt2003:eno1 {#}";
-   }
-   ge-1/0/12 {
-       description "labtestvirt2003:eno2 {#}";
-       unit 0 {
-           family ethernet-switching {
-               interface-mode trunk;
-               vlan {
-                   members [ cloud-gw-transport-codfw cloud-instance-transport1-b-codfw ];
-               }
-           }
-       }
-   }
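
For readers skimming the diff above, here is a minimal Python sketch that lists which interface-range members were added or removed per `[edit ...]` section. It is not part of homer; the `member_changes` helper and the `DIFF` sample (an excerpt copied from the diff above) are illustrative assumptions, and the parser only understands the simple `member <port>;` lines seen here.

```python
import re
from collections import defaultdict

# Sample input: an excerpt of the homer diff above (illustrative only).
DIFF = """\
[edit interfaces interface-range disabled]
     member ge-1/0/9 { ... }
+    member ge-1/0/11;
+    member ge-1/0/12;
     member ge-1/0/14 { ... }
[edit interfaces interface-range vlan-cloud-hosts1-b-codfw]
-    member ge-1/0/11;
"""

# Matches only added/removed "member <port>;" lines; context lines are ignored.
MEMBER_RE = re.compile(r"^([+-])\s+member\s+(\S+);$")


def member_changes(diff_text):
    """Map each [edit ...] section to the members added to / removed from it."""
    changes = defaultdict(lambda: {"added": [], "removed": []})
    section = None
    for line in diff_text.splitlines():
        if line.startswith("[edit "):
            # Strip the leading "[edit " and the trailing "]".
            section = line[len("[edit "):-1]
            continue
        match = MEMBER_RE.match(line)
        if match and section is not None:
            key = "added" if match.group(1) == "+" else "removed"
            changes[section][key].append(match.group(2))
    return dict(changes)


changes = member_changes(DIFF)
for section, delta in changes.items():
    print(section, delta)
```

Read this way, the diff moves both former labtestvirt2003 ports (ge-1/0/11 and ge-1/0/12) into the `disabled` interface range and drops ge-1/0/11 from the `vlan-cloud-hosts1-b-codfw` range.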

Change 656148 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudgw2001-dev: introduce server

https://gerrit.wikimedia.org/r/656148

Change 656148 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudgw2001-dev: introduce server

https://gerrit.wikimedia.org/r/656148

Script wmf-auto-reimage was launched by aborrero on cumin2001.codfw.wmnet for hosts:

cloudgw2001-dev.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202101141403_aborrero_7880_cloudgw2001-dev_codfw_wmnet.log.

Mentioned in SAL (#wikimedia-operations) [2021-01-14T14:24:19Z] <arturo> running homer in asw-b-codfw* (T271519)

Mentioned in SAL (#wikimedia-operations) [2021-01-14T14:28:47Z] <arturo> running homer in asw-b-codfw* (T271519)

Completed auto-reimage of hosts:

['cloudgw2001-dev.codfw.wmnet']

Of which those FAILED:

['cloudgw2001-dev.codfw.wmnet']

Script wmf-auto-reimage was launched by aborrero on cumin2001.codfw.wmnet for hosts:

cloudgw2001-dev.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202101141532_aborrero_28477_cloudgw2001-dev_codfw_wmnet.log.

@aborrero physical label updated. Do you still need the other 2 network connections for this server?

Completed auto-reimage of hosts:

['cloudgw2001-dev.codfw.wmnet']

Of which those FAILED:

['cloudgw2001-dev.codfw.wmnet']

Thanks!

Yes, the network connections should stay as they are.
I will finish the reimage tomorrow; the installer complained about the disk, and then puppet did as well.

Script wmf-auto-reimage was launched by aborrero on cumin2001.codfw.wmnet for hosts:

cloudgw2001-dev.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202101151014_aborrero_556_cloudgw2001-dev_codfw_wmnet.log.

Completed auto-reimage of hosts:

['cloudgw2001-dev.codfw.wmnet']

Of which those FAILED:

['cloudgw2001-dev.codfw.wmnet']