Service implementation for wdqs20[09,10,11,12]
Closed, Resolved | Public | 3 Estimated Story Points

Description

Procurement by dc-ops done in T291982; racking by dc-ops done in T294297

AC

Note

Event Timeline

Change 862369 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] wdqs: Bring wdqs20[09,10,11,12] online

https://gerrit.wikimedia.org/r/862369

Change 862369 merged by Bking:

[operations/puppet@production] wdqs: Bring wdqs20[09,10,11,12] online

https://gerrit.wikimedia.org/r/862369

Mentioned in SAL (#wikimedia-operations) [2022-12-07T22:41:40Z] <ryankemper> T301167 Downtimed wdqs20[09-12] for 7 days

Waiting for data reload to complete on wdqs2009 (or another host) before we finish data transfers and pool these hosts

Mentioned in SAL (#wikimedia-operations) [2022-12-14T23:28:29Z] <ryankemper> T301167 wdqs2011/2012 were not visible in pybal (oversight from when I added the other hosts with conftool last week). Fixed that, so now all of the new hosts are showing up properly.

One tip to keep people on call (like me) from worrying about services that are still being implemented is to add the hiera key profile::monitoring::notifications_enabled: false. It is not promoted much because most people handle stateless services that are easy and fast to set up. If you, like me, handle stateful ones that can take a long time to load their data and be fully set up, this flag is useful until they are fully productionized. I hope you find this tip useful. Thank you, and sorry for the previous ping!
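
For illustration, the flag could be set per host along these lines. This is only a sketch: the key is the one named in the comment above, while the file path and host name are assumptions about how the hiera data is laid out, not something stated in this task.

```yaml
# hieradata/hosts/wdqs2009.yaml  (path and host name are assumptions for illustration)
# Key taken from the comment above: disables monitoring notifications
# for this host while it is still loading data and not yet in service.
profile::monitoring::notifications_enabled: false
```

The key would need to be removed (or set back to true) once the host is fully productionized, so that notifications resume.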

Change 881000 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] wdqs: disable notifs on not-yet-in-service hosts

https://gerrit.wikimedia.org/r/881000

@jcrespo Thanks for the suggestion, I wasn't aware of that flag! We'll definitely make use of that (https://gerrit.wikimedia.org/r/c/operations/puppet/+/881000)

Change 881000 merged by Ryan Kemper:

[operations/puppet@production] wdqs: disable notifs on not-yet-in-service hosts

https://gerrit.wikimedia.org/r/881000

Change 888707 had a related patch set uploaded (by Bking; author: Bking):

[operations/cookbooks@master] wdqs data reload: clear out unused option

https://gerrit.wikimedia.org/r/888707

Change 888707 merged by Bking:

[operations/cookbooks@master] wdqs data reload: clear out unused option

https://gerrit.wikimedia.org/r/888707

Change 891626 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] [WIP] Revert "wdqs: disable notifs on not-yet-in-service hosts"

https://gerrit.wikimedia.org/r/891626

Arbitrarily chosen wdqs2012 is happy as seen by our query test suite:

image.png (191×1 px, 53 KB)

All that should be left is decommissioning wdqs200[1-3].

Actually, this only tested the wikidata journal. We still need to transfer the categories journal over before these hosts are fully happy.

Change 891626 merged by Ryan Kemper:

[operations/puppet@production] Revert "wdqs: disable notifs on not-yet-in-service hosts"

https://gerrit.wikimedia.org/r/891626

Mentioned in SAL (#wikimedia-operations) [2023-02-28T19:21:16Z] <ryankemper> [WDQS] (The following was ~20 hours ago, forgot to press enter) T301167 Transferred /srv/wdqs/categories.jnl from wdqs2001 (in-service host) to wdqs20[09-12] (new hosts being brought into service)

Mentioned in SAL (#wikimedia-operations) [2023-02-28T19:21:51Z] <ryankemper> [WDQS] (Current time) T301167 Re-enabled icinga notifications for wdqs20[09-12]

Change 893825 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] wdqs: decom wdqs200[1-3]

https://gerrit.wikimedia.org/r/893825

Change 893825 merged by Ryan Kemper:

[operations/puppet@production] wdqs: decom wdqs200[1-3]

https://gerrit.wikimedia.org/r/893825

Decom ticket for dc-ops: T331074

cookbooks.sre.hosts.decommission executed by ryankemper@cumin2002 for hosts: wdqs[2001-2003].codfw.wmnet

  • wdqs2001.codfw.wmnet (WARN)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Management interface not found on Icinga, unable to downtime it
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • wdqs2002.codfw.wmnet (WARN)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Management interface not found on Icinga, unable to downtime it
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • wdqs2003.codfw.wmnet (WARN)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Management interface not found on Icinga, unable to downtime it
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB