
Improve how we run WMCS cookbooks
Closed, Resolved, Public

Description

WMCS cookbooks are currently run by the WMCS team from their own laptops. We would like to run them from a Cumin server in a similar way to production cookbooks.

This is a tracking task of the agreed work between the Cloud-Services team and the Infrastructure-Foundations one.

I've grouped the work based on affected areas. We can open subtasks as needed.

✅ Improve the laptop local environment

T319426 Add SSH SOCKS5 proxy support (blocks removing duplication) (1 day)
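
For illustration, one common way to get a SOCKS5 proxy from a laptop is an OpenSSH dynamic port forward (ssh -D). The following is a minimal Python sketch, assuming that approach. It only builds the command line. The helper name socks5_tunnel_cmd and the bastion hostname are hypothetical and are not taken from the T319426 patch.

```python
import shlex
from typing import List, Optional


def socks5_tunnel_cmd(bastion: str, local_port: int = 1080, user: Optional[str] = None) -> List[str]:
    """Build an OpenSSH command line that exposes a SOCKS5 proxy on localhost.

    Hypothetical helper for illustration only.
    -N: do not run a remote command, only forward traffic.
    -D: open a dynamic (SOCKS) forward bound to the given local address and port.
    """
    if not 1 <= local_port <= 65535:
        raise ValueError(f"invalid port: {local_port}")
    target = f"{user}@{bastion}" if user else bastion
    return ["ssh", "-N", "-D", f"localhost:{local_port}", target]


if __name__ == "__main__":
    # Hypothetical bastion name; replace it with a real one.
    print(shlex.join(socks5_tunnel_cmd("bastion.example.org", 1080)))
```

While that ssh process is running, any SOCKS5-capable client on the laptop can be pointed at localhost:1080.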

✅ New WMCS cookbooks repository

T319436 Import the code into the new repo, splitting libs from cookbooks (½ day)

✅ Set up the production infrastructure

T323516, T323518 Create the cloud cumin Ganeti hosts: 2 VMs, one per DC (½ day)

  • Set up a dedicated SSH config for the double jump into Cloud (½ day) (patch)
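
A "double jump" means reaching the target host through two intermediate hosts. As a minimal sketch, assuming plain OpenSSH ProxyJump (which accepts a comma-separated list of hops, visited in order), the Python helper below renders such an ssh_config stanza. The hostnames, user, key path and the helper name render_proxyjump_stanza are hypothetical. The puppetized cloudcumin configuration is not reproduced here and may use a different mechanism.

```python
from typing import Sequence


def render_proxyjump_stanza(host_pattern: str, jumps: Sequence[str], user: str, identity_file: str) -> str:
    """Render an ssh_config stanza reaching host_pattern via a chain of jump hosts.

    Hypothetical helper for illustration. OpenSSH visits the ProxyJump hops
    in the order given, so with two hops this is a "double jump".
    """
    if not jumps:
        raise ValueError("at least one jump host is required")
    lines = [
        f"Host {host_pattern}",
        f"    User {user}",
        f"    IdentityFile {identity_file}",
        f"    ProxyJump {','.join(jumps)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All names below are placeholders, not the real cloudcumin setup.
    print(render_proxyjump_stanza(
        "*.cloud.example.org",
        ["jump.example.org", "bastion-restricted.example.org"],
        "cumin",
        "/etc/ssh/example_key",
    ), end="")
```

User and IdentityFile in this stanza apply to the target hosts. Each jump host picks up its settings from its own matching Host block. Keep the jump hosts outside host_pattern, otherwise ssh would try to reach them through themselves.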

✅ Spicerack improvements
  • Add module injection support (2 days) (patch)
  • Add support for registering accessors (2 days) (patch)
  • Add sudo everywhere (1w, might be less) [not needed for the current setup]
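
The accessor-registration idea can be illustrated with a generic Python pattern: a registry that maps names to factory callables and exposes them as attributes. This is a hypothetical sketch of the pattern only, not Spicerack's actual API. The class name AccessorRegistry, its methods and the example accessor are made up for illustration.

```python
from typing import Any, Callable, Dict


class AccessorRegistry:
    """Hypothetical registry that exposes registered factories as attributes.

    Illustrates the general pattern only; this is not Spicerack's API.
    """

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, factory: Callable[..., Any]) -> None:
        """Register factory under name, refusing invalid or clashing names."""
        if not name.isidentifier():
            raise ValueError(f"invalid accessor name: {name!r}")
        if hasattr(type(self), name) or name in self._factories:
            raise ValueError(f"accessor {name!r} is already defined")
        self._factories[name] = factory

    def __getattr__(self, name: str) -> Callable[..., Any]:
        # Python calls __getattr__ only when normal lookup fails, so this
        # resolves registered names without shadowing real methods.
        factories = self.__dict__.get("_factories", {})
        if name in factories:
            return factories[name]
        raise AttributeError(name)


if __name__ == "__main__":
    registry = AccessorRegistry()
    # Hypothetical accessor: a cookbook would call registry.openstack("admin").
    registry.register("openstack", lambda project: f"client for project {project}")
    print(registry.openstack("admin"))  # client for project admin
```

Rejecting names that clash with existing methods keeps an extension from silently overriding core behaviour.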

T325168 Load cookbooks from multiple directories (2 days)
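
As a sketch of what loading cookbooks from multiple directories involves, assuming a simple first-match-wins lookup, the helper below resolves a dotted cookbook name against an ordered list of directories. The function name find_cookbook, the precedence rule and the module/package layout are assumptions for illustration, not necessarily how Spicerack implements T325168.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_cookbook(dotted_name: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first file implementing dotted_name across search_dirs, in order.

    A name like "wmcs.openstack.foo" maps either to wmcs/openstack/foo.py (a module)
    or to wmcs/openstack/foo/__init__.py (a package). Earlier directories win.
    """
    parts = dotted_name.split(".")
    if not all(part.isidentifier() for part in parts):
        raise ValueError(f"invalid cookbook name: {dotted_name!r}")
    relative = Path(*parts)
    for base in search_dirs:
        for candidate in (base / relative.with_suffix(".py"), base / relative / "__init__.py"):
            if candidate.is_file():
                return candidate
    return None
```

For example, find_cookbook("wmcs.openstack.foo", [Path("/srv/wmcs-cookbooks"), Path("/srv/cookbooks")]) (hypothetical paths) would prefer a match in the first directory. Any first-match scheme has to settle how name clashes between directories are handled; here the earlier directory simply shadows the later one.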

✅ Set up the Cloud infrastructure

T323483 Define which SSH key the new cloudcumin hosts use to SSH to the target hosts
T323484 Fine-tune the sshd config of the restricted bastion for better performance (½ day)

✅ Misc

T325756 Allow WMCS cookbooks running on cloudcuminXXXX to write to the SAL (1-2w)
T325754 Update Spicerack documentation

Postponed to later

We could not complete the following tasks as part of this epic. They can be postponed because they do not block the main goal of running WMCS cookbooks from the new cloudcumin hosts.

T319438 Remove code duplication (alertmanager) (1w)
T319450 Move the libs to Spicerack modules (2w; this might be quite easy, depending on the module solution)
T325067 Decide sudoers rules for users without global root
T325758 Spicerack: Add CI step to test with wmcs cookbooks (1w)
T322511 [spicerack][alertmanager] support silencing alerts without instance label

Details

Repo                           Branch      Lines +/-
operations/puppet              production  +1 -1
operations/puppet              production  +0 -0
operations/puppet              production  +1 -1
operations/puppet              production  +4 -0
operations/puppet              production  +7 -1
operations/puppet              production  +2 -0
operations/puppet              production  +35 -4
operations/puppet              production  +5 -0
operations/puppet              production  +4 -1
operations/puppet              production  +1 -5
operations/puppet              production  +30 -4
operations/homer/public        master      +1 -1
operations/puppet              production  +6 -0
operations/puppet              production  +5 -1
operations/puppet              production  +82 -15
operations/puppet              production  +125 -18
operations/puppet              production  +24 -0
operations/puppet              production  +1 -0
operations/puppet              production  +5 -0
operations/puppet              production  +2 -0
operations/software/spicerack  master      +204 -12
operations/puppet              production  +20 -0
operations/puppet              production  +48 -3
operations/cookbooks           master      +1 -1


Event Timeline

fnegri changed the task status from Open to In Progress. Dec 14 2022, 3:04 PM
fnegri updated the task description.

Change 867551 merged by Volans:

[operations/puppet@production] cumin::cloud_master: introduce new profile

https://gerrit.wikimedia.org/r/867551

Change 868372 had a related patch set uploaded (by Jbond; author: John Bond):

[operations/puppet@production] P:installserver::proxy: add ability to proxy ssh ports

https://gerrit.wikimedia.org/r/868372

Change 867169 merged by Volans:

[operations/puppet@production] base::cloud_production: introduce new profile

https://gerrit.wikimedia.org/r/867169

Change 868632 had a related patch set uploaded (by Volans; author: Volans):

[operations/puppet@production] cloudcumin: use the puppetdb microservice

https://gerrit.wikimedia.org/r/868632

Change 868636 had a related patch set uploaded (by Volans; author: Volans):

[operations/puppet@production] cloudcumin: actually allow ssh from the masters

https://gerrit.wikimedia.org/r/868636

Change 868632 merged by Volans:

[operations/puppet@production] cloudcumin: use the puppetdb microservice

https://gerrit.wikimedia.org/r/868632

Change 868636 merged by Volans:

[operations/puppet@production] cloudcumin: actually allow ssh from the masters

https://gerrit.wikimedia.org/r/868636

Change 868646 had a related patch set uploaded (by Volans; author: Volans):

[operations/homer/public@master] cr-labs: allow SSH from the cloudcumin_group

https://gerrit.wikimedia.org/r/868646

Change 868646 merged by jenkins-bot:

[operations/homer/public@master] cr-labs: allow SSH from the cloudcumin_group

https://gerrit.wikimedia.org/r/868646

Change 868372 merged by Jbond:

[operations/puppet@production] P:installserver::proxy: add ability to proxy ssh ports

https://gerrit.wikimedia.org/r/868372

Change 868673 had a related patch set uploaded (by Volans; author: Volans):

[operations/puppet@production] cloudcumin: use the webproxy to connect to Cloud

https://gerrit.wikimedia.org/r/868673

Change 868673 merged by Volans:

[operations/puppet@production] cloudcumin: use the webproxy to connect to Cloud

https://gerrit.wikimedia.org/r/868673

Change 868720 had a related patch set uploaded (by Volans; author: Volans):

[operations/puppet@production] cloudcumin: improve ssh config

https://gerrit.wikimedia.org/r/868720

Change 868720 merged by Volans:

[operations/puppet@production] cloudcumin: improve ssh config

https://gerrit.wikimedia.org/r/868720

Mentioned in SAL (#wikimedia-cloud) [2022-12-16T19:36:40Z] <volans> restarted sshd twice on bastion-restricted-eqiad1-02 to debug SSH connections for T319401

Change 869173 had a related patch set uploaded (by Volans; author: Volans):

[operations/puppet@production] cumin::cloud_master: add openstack dependencies

https://gerrit.wikimedia.org/r/869173

Change 869173 merged by Volans:

[operations/puppet@production] cumin::cloud_master: add openstack dependencies

https://gerrit.wikimedia.org/r/869173

Change 869212 had a related patch set uploaded (by Volans; author: Volans):

[operations/puppet@production] cumin::cloud_master: configure openstack backend

https://gerrit.wikimedia.org/r/869212

Change 869212 merged by Volans:

[operations/puppet@production] cumin::cloud_master: configure openstack backend

https://gerrit.wikimedia.org/r/869212

Change 869245 had a related patch set uploaded (by Volans; author: Volans):

[operations/puppet@production] cumin:cloud_master: fix ssh_config for bastions

https://gerrit.wikimedia.org/r/869245

Change 869245 merged by Volans:

[operations/puppet@production] cumin:cloud_master: fix ssh_config for bastions

https://gerrit.wikimedia.org/r/869245

Change 869268 had a related patch set uploaded (by Volans; author: Volans):

[operations/puppet@production] cloud cumin: fix authorized keys for cumin

https://gerrit.wikimedia.org/r/869268

Change 869268 merged by Volans:

[operations/puppet@production] cloud cumin: fix authorized keys for cumin

https://gerrit.wikimedia.org/r/869268

Change 869278 had a related patch set uploaded (by Volans; author: Volans):

[operations/puppet@production] cloud: authorize cumin from the bastion

https://gerrit.wikimedia.org/r/869278

Change 869278 merged by Volans:

[operations/puppet@production] cloud: authorize cumin from the bastion

https://gerrit.wikimedia.org/r/869278

Change 869779 had a related patch set uploaded (by Volans; author: Volans):

[operations/puppet@production] cloudcumin: add FQDN of the eqiad1 bastion

https://gerrit.wikimedia.org/r/869779

Change 869779 merged by Volans:

[operations/puppet@production] cloudcumin: add FQDN of the eqiad1 bastion

https://gerrit.wikimedia.org/r/869779

Change 869782 had a related patch set uploaded (by Volans; author: Volans):

[operations/puppet@production] cloudcumin: fix hieradata for codfw1dev bastion

https://gerrit.wikimedia.org/r/869782

Change 869782 merged by Volans:

[operations/puppet@production] cloudcumin: fix hieradata for codfw1dev bastion

https://gerrit.wikimedia.org/r/869782

Change 869816 had a related patch set uploaded (by Volans; author: Volans):

[operations/puppet@production] cloud cumin: fix ssh config for codf1dev bastion

https://gerrit.wikimedia.org/r/869816

Change 869167 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/cookbooks@wmcs] Add moved to wmcs-cookbooks message.

https://gerrit.wikimedia.org/r/869167

Change 869816 merged by Volans:

[operations/puppet@production] cloud cumin: fix ssh config for codf1dev bastion

https://gerrit.wikimedia.org/r/869816

fnegri removed a subtask: Restricted Task. Jan 13 2023, 6:39 PM

As Q2 is now over, I suggest that we consider this first iteration complete as soon as we can successfully run WMCS cookbooks from the new cloudcumin hosts. Everything else can be tracked in separate tasks/epics.

I went for a WP:BOLD approach and modified the task description, moving all subtasks that are not strictly blockers to a new section "Postponed to later". Let me know if you disagree and feel free to suggest alternative plans. :)

Aklapper renamed this task from "WMCS Cookbook Automation Q2 tracking task" to "WMCS Cookbook Automation FY2022-23 Q2 tracking task". Jan 13 2023, 8:17 PM
fnegri raised the priority of this task from Medium to High. Apr 12 2023, 2:50 PM
fnegri moved this task from FY2022/2023-Q3 to FY2022/2023-Q4 on the cloud-services-team board.
fnegri renamed this task from "WMCS Cookbook Automation FY2022-23 Q2 tracking task" to "Improve how we run WMCS cookbooks". Jul 10 2023, 5:31 PM
fnegri updated the task description.
fnegri updated the task description.
fnegri added a project: Epic.
fnegri updated the task description.