
Implement depool (source only) and keep-downtime options on data-transfer cookbook
Closed, Resolved · Public · 3 Estimated Story Points

Description

As we bring more hosts online in T332314, we're inadvertently triggering a lot of alerts. We're also getting some unnecessary cookbook failures.

Creating this ticket to:

  • Add a "keep downtime" option to prevent alerts from triggering on hosts that aren't yet ready for production. With this option the cookbook doesn't remove downtimes; it just relies on the downtime duration to expire them.
  • Add a "depool source only" option so that only the source host is depooled and repooled. The current options force us to either not depool (so we have to remember to do that manually), or depool and get a failure because the "pool" option is not yet available on the new hosts.
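
For illustration, here is a rough sketch of how the two options could be exposed and interpreted. This is not the merged patch: the flag names `--lvs-strategy` and `--keep-downtime`, the strategy values other than "both", the helper functions, and the example hostnames are all assumptions made for this sketch.

```python
import argparse

# Hypothetical strategy names. Only "both" is mentioned in this ticket;
# "neither" and "source_only" are assumed here for illustration.
LVS_STRATEGIES = ("neither", "source_only", "both")


def build_parser() -> argparse.ArgumentParser:
    """Build an argument parser with the two options proposed above."""
    parser = argparse.ArgumentParser(description="Sketch of data-transfer options")
    parser.add_argument("--source", required=True, help="host to copy data from")
    parser.add_argument("--dest", required=True, help="host to copy data to")
    parser.add_argument(
        "--lvs-strategy",
        choices=LVS_STRATEGIES,
        default="both",
        help="which hosts to depool before the transfer and repool afterwards",
    )
    parser.add_argument(
        "--keep-downtime",
        action="store_true",
        help="leave downtimes in place and let them expire on their own",
    )
    return parser


def hosts_to_depool(strategy: str, source: str, dest: str) -> list[str]:
    """Return the hosts that should be depooled (and later repooled)."""
    if strategy == "both":
        return [source, dest]
    if strategy == "source_only":
        return [source]
    return []  # "neither": the operator handles pooling manually


def should_remove_downtime(keep_downtime: bool) -> bool:
    """Downtimes are removed at the end of the run unless asked to keep them."""
    return not keep_downtime


if __name__ == "__main__":
    # Placeholder hostnames, not real hosts.
    args = build_parser().parse_args(
        ["--source", "src.example", "--dest", "new.example",
         "--lvs-strategy", "source_only", "--keep-downtime"]
    )
    print(hosts_to_depool(args.lvs_strategy, args.source, args.dest))
    print(should_remove_downtime(args.keep_downtime))
```

With "source_only", the new host is never passed to any pool/depool step, which is what avoids the failure described above; with `--keep-downtime`, the cleanup step is skipped and the downtime simply runs out.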

Event Timeline

bking triaged this task as High priority.

Change 934595 had a related patch set uploaded (by Bking; author: Bking):

[operations/cookbooks@master] wdqs.data-transfer: reformat using black

https://gerrit.wikimedia.org/r/934595

Change 934602 had a related patch set uploaded (by Bking; author: Bking):

[operations/cookbooks@master] wdqs.data-transfer: Add more pool options

https://gerrit.wikimedia.org/r/934602

Change 934595 abandoned by Bking:

[operations/cookbooks@master] wdqs.data-transfer: reformat using black

Reason:

We still need to do this, but will hold off until we finish making the functional changes described in T340793.

https://gerrit.wikimedia.org/r/934595

bking set the point value for this task to 3. (Jul 10 2023, 3:32 PM)

Change 934602 merged by jenkins-bot:

[operations/cookbooks@master] wdqs.data-transfer: Add more pool options

https://gerrit.wikimedia.org/r/934602

Based on recent cookbook runs, it appears that the "lvs_strategy=both" option is not working. Leaving this as a note to myself to look at it Monday.

Change 937535 had a related patch set uploaded (by Bking; author: Bking):

[operations/cookbooks@master] wdqs.data-transfer: Keep downtime

https://gerrit.wikimedia.org/r/937535

RKemper moved this task from In Progress to Done on the Data-Platform-SRE board.

Change 937535 merged by jenkins-bot:

[operations/cookbooks@master] wdqs.data-transfer: Keep downtime

https://gerrit.wikimedia.org/r/937535

RKemper subscribed.

Removed subtask because I think the scap ticket is not directly related to this one.