
Decommission an-airflow1003 (legacy platform_eng instance)
Closed, Resolved, Public

Description

From T312858#8159458

Event Timeline

Should we announce the decommissioning on the analytics-announce list?

In this case it's probably not necessary. You just need to coordinate with the potential users of the platform eng Airflow instance, i.e. everybody in the analytics-platform-eng-admins group.

xcollazo changed the task status from Open to In Progress. Aug 22 2022, 4:19 PM

We are moving forward with decommissioning this legacy an-airflow1003 instance. The only production job that ran on the legacy instance, image-suggestions, has been migrated to a new instance (an-airflow1004) and has been running smoothly for a couple of weeks now.

CCing all direct members of the analytics-platform-eng-admins group for awareness: @lbowmaker, @Cparle, @mfossati, @fkaelin.

xcollazo changed the task status from In Progress to Open. Oct 11 2022, 4:23 PM

@xcollazo - How's this task looking now? Have all of the required DAGs been moved to the new instance? It would be nice to clean up the stray configs for this legacy instance, whenever it's convenient.

@xcollazo - How's this task looking now? Have all of the required DAGs been moved to the new instance? It would be nice to clean up the stray configs for this legacy instance, whenever it's convenient.

Yes, no one is using this anymore, and all relevant code is migrated. We can safely nuke this!

I'm going to start work on this task to decommission an-airflow1003. The main reason for getting it done now is that it is an outlier still running the older version, and some of the CVEs from T336244 apply to it.

Change 943611 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Remove an-airflow1003 and its role from puppet

https://gerrit.wikimedia.org/r/943611

BTullis triaged this task as Medium priority. Jul 31 2023, 4:17 PM

Change 943611 merged by Btullis:

[operations/puppet@production] Remove an-airflow1003 and its role from puppet

https://gerrit.wikimedia.org/r/943611

cookbooks.sre.hosts.decommission executed by btullis@cumin1001 for hosts: an-airflow1003.eqiad.wmnet

  • an-airflow1003.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster eqiad to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster eqiad to Netbox
BTullis updated the task description.
BTullis moved this task from In Progress to Done on the Data-Platform-SRE board.