
decom phab2001 (service owner)
Closed, Resolved (Public)

Description

As discussed in today's meeting between serviceops-collab and releng, we no longer see an advantage in keeping phab2001 around as a warm standby.

If we had to fail over to codfw, we would use phab2002, so the phab2001 hardware can start being decommissioned.

This is the decom subtask for that; we will need one anyway once the host goes to dcops.

Event Timeline

Dzahn changed the task status from Open to In Progress. Nov 2 2022, 6:22 PM
Dzahn created this task.
Dzahn moved this task from Incoming to Work in Progress on the collaboration-services board.

Change 852261 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator: stop phab2001 from being an rsync client

https://gerrit.wikimedia.org/r/852261

Change 852264 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator: remove phab2001 from the list of phab servers

https://gerrit.wikimedia.org/r/852264

Change 852266 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/dns@master] phabricator: switch phab2001 to phab2002 in commented line

https://gerrit.wikimedia.org/r/852266

Change 852272 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/dns@master] rename varnish service alias for phab2001-aphlict?

https://gerrit.wikimedia.org/r/852272

Change 852261 merged by Dzahn:

[operations/puppet@production] phabricator: stop phab2001 from being an rsync client

https://gerrit.wikimedia.org/r/852261

The git-ssh service has been removed from LVS on the lvs servers and from DNS, but there is still pybal data here, which we should also clean up (somehow):

https://config-master.wikimedia.org/pybal/codfw/git-ssh

Is phab2002.codfw.wmnet a warm standby as well? I think the original idea was to be able to switch Phabricator/CI/Gerrit when doing the data center switchovers (that is the overall task T156937). For Phabricator there are a few related tasks hinting at it:

Change 852266 merged by Dzahn:

[operations/dns@master] phabricator: switch phab2001 to phab2002 in commented line

https://gerrit.wikimedia.org/r/852266

Change 852264 merged by Dzahn:

[operations/puppet@production] phabricator: remove phab2001 from the list of phab servers

https://gerrit.wikimedia.org/r/852264

Change 852272 merged by Dzahn:

[operations/dns@master] delete varnish service alias for phab2001-aphlict

https://gerrit.wikimedia.org/r/852272

Is phab2002.codfw.wmnet a warm standby as well? I think the original idea was to be able to switch Phabricator/CI/Gerrit when doing the data center switchovers (that is the overall task T156937). For Phabricator there are a few related tasks hinting at it:

I think that question is definitely valid but orthogonal to this task. phab2001 is simply going away due to the age of the hardware, and phab2002 is there to replace it.

This decom has no relation to changing the number of phab servers in the same data center.

Change 853051 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site/phabricator: move phab2001 from prod to insetup role

https://gerrit.wikimedia.org/r/853051

LSobanski triaged this task as Medium priority. Nov 4 2022, 3:36 PM

Change 853051 merged by Dzahn:

[operations/puppet@production] site/phabricator: move phab2001 from prod to insetup role

https://gerrit.wikimedia.org/r/853051

Mentioned in SAL (#wikimedia-operations) [2022-11-07T22:51:13Z] <mutante> phab2001 - removing from production puppet role - removes ssh access, ferm rules, exim config and more T322250

Mentioned in SAL (#wikimedia-operations) [2022-11-07T22:53:35Z] <dzahn@cumin2002> START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on phab2001.codfw.wmnet with reason: T322250

Mentioned in SAL (#wikimedia-operations) [2022-11-07T22:53:50Z] <dzahn@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on phab2001.codfw.wmnet with reason: T322250
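
The downtime entries above come in START/END pairs with a regular shape, so they can be pulled apart mechanically when reviewing a decom timeline. The following is a minimal sketch, not an official tool: the regular expression is modeled only on the SAL lines quoted in this task, and the names `SAL_DOWNTIME` and `parse_downtime_entry` are hypothetical. Other cookbooks may log in a different format.

```python
import re
from datetime import timedelta

# Pattern modeled on the sre.hosts.downtime SAL lines quoted in this task.
# It is an assumption that other entries follow the same layout.
SAL_DOWNTIME = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]\s+"                    # [2022-11-07T22:53:35Z]
    r"<(?P<user>[^@>]+)@(?P<cumin_host>[^>]+)>\s+"     # <dzahn@cumin2002>
    r"(?P<phase>START|END \((?P<result>\w+)\))\s+-\s+" # START or END (PASS)
    r"Cookbook\s+(?P<cookbook>\S+)"                    # sre.hosts.downtime
    r"(?:\s+\(exit_code=(?P<exit_code>\d+)\))?"        # only on END lines
    r"\s+for\s+(?P<days>\d+)\s+days?,\s+(?P<hms>\d+:\d{2}:\d{2})"
    r"\s+on\s+(?P<target>\S+)"                         # phab2001.codfw.wmnet
    r"\s+with reason:\s+(?P<reason>.+)$"               # T322250
)


def parse_downtime_entry(line):
    """Return a dict describing one downtime SAL line, or None if it does not match."""
    match = SAL_DOWNTIME.search(line)
    if match is None:
        return None
    hours, minutes, seconds = (int(part) for part in match.group("hms").split(":"))
    exit_code = match.group("exit_code")
    return {
        "timestamp": match.group("timestamp"),
        "user": match.group("user"),
        "cumin_host": match.group("cumin_host"),
        "phase": "START" if match.group("phase") == "START" else "END",
        "result": match.group("result"),  # None on START lines
        "exit_code": int(exit_code) if exit_code is not None else None,
        "cookbook": match.group("cookbook"),
        "target": match.group("target"),
        "duration": timedelta(
            days=int(match.group("days")),
            hours=hours,
            minutes=minutes,
            seconds=seconds,
        ),
        "reason": match.group("reason").strip(),
    }


if __name__ == "__main__":
    example = (
        "Mentioned in SAL (#wikimedia-operations) [2022-11-07T22:53:35Z] "
        "<dzahn@cumin2002> START - Cookbook sre.hosts.downtime for 14 days, "
        "0:00:00 on phab2001.codfw.wmnet with reason: T322250"
    )
    print(parse_downtime_entry(example))
```

Lines that are not downtime entries, such as the free-form `<mutante>` messages, simply return `None`, so the function can be mapped over a whole timeline and the results filtered.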

Change 855688 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator: rm hierdata/hosts/phab2001.yaml

https://gerrit.wikimedia.org/r/855688

Mentioned in SAL (#wikimedia-operations) [2022-11-10T18:58:22Z] <mutante> phabricator - running decom cookbook on phab2001 - T322250

cookbooks.sre.hosts.decommission executed by dzahn@cumin2002 for hosts: phab2001.codfw.wmnet

  • phab2001.codfw.wmnet (WARN)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Management interface not found on Icinga, unable to downtime it
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

Change 855688 merged by Dzahn:

[operations/puppet@production] phabricator: rm hierdata/hosts/phab2001.yaml

https://gerrit.wikimedia.org/r/855688

Change 855697 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site: remove phab2001

https://gerrit.wikimedia.org/r/855697

Change 855697 merged by Dzahn:

[operations/puppet@production] site: remove phab2001

https://gerrit.wikimedia.org/r/855697

Our part is done. The host is gone from prod and the repos.

This will continue on T322880 for dcops (the server lifecycle demands a ticket created from the template).

Mentioned in SAL (#wikimedia-operations) [2022-11-10T19:54:44Z] <mutante> netbox - deleting special case phab2001-vcs.codfw.wmnet IPv4 (10.192.32.149) and IPv6 (2620:0:860:103:10:192:32:149) - T296022 - T322250

Dzahn renamed this task from decom phab2001 to decom phab2001 (service owner). Nov 17 2022, 8:19 PM

Change 858412 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/dns@master] update SPF record for phabricator.wikimedia.org, phab2001->phab2002

https://gerrit.wikimedia.org/r/858412

Change 858412 merged by Dzahn:

[operations/dns@master] update SPF record for phabricator.wikimedia.org, phab2001->phab2002

https://gerrit.wikimedia.org/r/858412

Mentioned in SAL (#wikimedia-operations) [2022-11-22T00:31:05Z] <dzahn@cumin2002> START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on phab1001.eqiad.wmnet with reason: T322250

Mentioned in SAL (#wikimedia-operations) [2022-11-22T00:31:21Z] <dzahn@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on phab1001.eqiad.wmnet with reason: T322250

Mentioned in SAL (#wikimedia-operations) [2022-11-22T01:13:36Z] <dzahn@cumin2002> START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on phab1004.eqiad.wmnet with reason: T322250

Mentioned in SAL (#wikimedia-operations) [2022-11-22T01:13:51Z] <dzahn@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on phab1004.eqiad.wmnet with reason: T322250

Mentioned in SAL (#wikimedia-operations) [2022-11-22T01:56:16Z] <dzahn@cumin2002> START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on phab1004.eqiad.wmnet with reason: T322250

Mentioned in SAL (#wikimedia-operations) [2022-11-22T01:56:21Z] <dzahn@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on phab1004.eqiad.wmnet with reason: T322250

Mentioned in SAL (#wikimedia-operations) [2022-11-28T22:00:40Z] <dzahn@cumin2002> START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on phab1001.eqiad.wmnet with reason: T322250

Mentioned in SAL (#wikimedia-operations) [2022-11-28T22:00:56Z] <dzahn@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on phab1001.eqiad.wmnet with reason: T322250