
Request sudo access for Jclark-ctr
Closed, ResolvedPublic

Description

Opening this task to address access requests so @Jclark-ctr can execute all the necessary cookbooks to perform his day-to-day tasks.

context:
IF has historically been working on reducing the number of people with global root-level access (see [T244840] and [T289779]). Additional considerations regarding security controls for SRE edge cases are discussed in [T299989].

However, the need for John to be able to perform the necessary actions in his day-to-day duties does not change, and there are a couple of options:

  1. grant global root access
  2. propose a list of sudo commands to grant access on the cumin hosts until the longer-term solution of a Kerberos non-root cumin configuration is generally available and ready.

After discussing with the team, the latter option feels like the more sensible approach. Below is a placeholder (for now) for the list of cookbooks and additional commands that need to be accessible via sudo.

Commands required to run with escalated privileges (sudo):

Cumin hosts:

  • cookbook sre.hosts.provision
  • cookbook sre.hosts.reimage
  • cookbook sre.dns.netbox
  • homer

Puppetmasters:

  • puppet-merge

apt hosts:

  • run-puppet-agent

Once this list is complete, we can proceed with granting the necessary access.

Event Timeline

@Volans, please correct or amend this task if I missed anything.

@wiki_willy we will need your approval as @Jclark-ctr 's manager

@MoritzMuehlenhoff: adding you for awareness and feedback.

@Papaul and @Jclark-ctr when possible please provide a list of the cookbooks and anything else we need to consider for this request.

Thanks all!

> @MoritzMuehlenhoff: adding you for awareness and feedback.

Yes, it sounds good to me; I already spoke about it with Riccardo earlier today. Note that we already have an existing group "datacenter-ops", so if we bind the new permissions to that group, the other group member (Willy) would also get those permissions (which seems fine to me).

@lmata please see below for the requested list:

on cumin:

  • sudo cookbook sre.hosts.provision
  • sudo cookbook sre.hosts.reimage
  • sudo cookbook sre.dns.netbox
  • homer

on puppetmaster1001:

  • sudo puppet-merge

on apt.wikimedia.org:

  • sudo puppet agent

Thanks @Papaul. Access for John Clark to run these commands is all approved on my end as well. Thanks, Willy

> on apt.wikimedia.org
>   • sudo puppet agent

This should be replaced by run-puppet-agent; the puppet agent should never be run directly.

I've updated the task description according to T306654#7873125.
As for the puppet-merge on the puppetmasters, does the datacenter-ops have +2 on the operations/puppet repository on Gerrit?

Volans renamed this task from "WIP: request sudo access for Jclark-ctr" to "Request sudo access for Jclark-ctr". Apr 22 2022, 9:18 AM
Volans triaged this task as Medium priority.

Thank you all for your feedback and input!

> As for the puppet-merge on the puppetmasters, does the datacenter-ops have +2 on the operations/puppet repository on Gerrit?

To be explicit: +2 on Gerrit combined with sudo puppet-merge allows one to promote oneself to global root, which seems undesirable. What exactly is puppet-merge access required for? Perhaps we can work on migrating this functionality elsewhere?

Note that it's possible to do damage with Homer and Netbox write access, so it needs to be treated carefully.

That said I'm fine with John having homer access.

On the other hand, if we can wait, want to be on the safe side, and don't need the full range of Homer features: I'm working on adding network device support to cookbooks, which should be done by the end of the quarter.

> As for the puppet-merge on the puppetmasters, does the datacenter-ops have +2 on the operations/puppet repository on Gerrit?

Not directly through the datacenter-ops group, but you get it from the LDAP ops group, and John is in that group. So that should work.

https://gerrit.wikimedia.org/r/admin/repos/operations/puppet,access

> we already have an existing group "datacenter-ops" ...

Yes, please use that group. Back when we created it, the idea was already to add, step by step, the sudo privileges needed for datacenter-ops and eventually move all users into it. So this would be as planned.

> Not directly because of the datacenter-ops group but you get it from the LDAP ops group and John is in that group. So that should work

For me, access to puppet-merge and +2 on the puppet repo needs more information about what tasks need to be performed. As mentioned, the combination of these two privileges gives one the same amount of control as global root.

Regarding membership in the ops group: what is it required for? It should be fairly simple to add datacenter-ops as an allowed group to all the SSO services, similar to what we have done for sre-admins.

@jbond has good points here, I think. It should be clarified what membership in "ops" is for. My first guesses at the most important parts would be access to the Icinga web UI ('wmf' would do that too, though) and the +2 on Gerrit (which was irrelevant without shell access). If making a new LDAP group is fairly simple, that seems like a good idea, actually.

There is already a group named sre-admins (used for SREs without root) that gives the same SSO access to web services as the ops group, but doesn't have +2 on the operations repos. We could either use this group or copy it. To avoid confusion, I would vote to create a new group but use sre-admins as a reference when assigning permissions in the puppet repo.

However, we also need to understand the workflow that requires puppet-merge and ensure that something is in place to enable it.

> the workflow that requires puppet-merge

@Jclark-ctr Correct me if I'm wrong, but this is mostly about install_server / DHCP changes: adding MAC addresses of new hardware, setting the OS to install, and then installing the OS with a cookbook. Is that about right?

Yeah, that looks right. I just need it for setting up servers.

We discussed this in yesterday's SRE IF meeting: let's start by adding sudo permissions for the three cookbooks listed; Homer will be implicitly started by these cookbooks. +2 on Puppet is root-equivalent, and there should be very few cases left where it's needed for the server racking workflow (e.g. for extending the partman globbing if there's a new server naming scheme). Once those remaining cases are identified, they can also trickle into future automation work (e.g. the partman config could become a drop-down menu in Netbox at some point).
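As a rough illustration of what "sudo permissions for the three cookbooks" could look like, the sketch below builds one passwordless sudoers-style rule per cookbook, each restricted to a single cookbook name followed by arbitrary arguments. The `ALL = NOPASSWD:` layout, the `/usr/bin/secure-cookbook` wrapper path and the trailing `*` are assumptions for illustration only; the real rules live in modules/admin/data/data.yaml in operations/puppet and may differ.

```python
# Hypothetical sketch: build sudoers-style privilege lines that let a group
# run only specific cookbooks through a wrapper binary.
# The wrapper path and rule layout are ASSUMPTIONS, not the real config.

WRAPPER = "/usr/bin/secure-cookbook"  # assumed install path

# The three cookbooks agreed on in the discussion.
ALLOWED_COOKBOOKS = [
    "sre.hosts.provision",
    "sre.hosts.reimage",
    "sre.dns.netbox",
]


def sudo_privileges(cookbooks: list[str], wrapper: str = WRAPPER) -> list[str]:
    """Return one passwordless rule per cookbook.

    The trailing '*' permits any arguments *after* the cookbook name;
    nothing else may appear before the cookbook name.
    """
    return [f"ALL = NOPASSWD: {wrapper} {name} *" for name in cookbooks]


if __name__ == "__main__":
    for line in sudo_privileges(ALLOWED_COOKBOOKS):
        print(line)
```

Scoping each rule to one cookbook name keeps the grant narrow, in line with the goal of avoiding root-equivalent access.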

Change 809338 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] admin: allow sudo for jclark-ctr for cookbooks

https://gerrit.wikimedia.org/r/809338

Change 809338 merged by Ssingh:

[operations/puppet@production] admin: allow sudo for jclark-ctr for cookbooks

https://gerrit.wikimedia.org/r/809338

@Jclark-ctr have you had a chance to test your newly granted sudo permissions?

Dzahn changed the task status from Open to In Progress. Jul 29 2022, 5:37 PM
Dzahn reassigned this task from Volans to Jclark-ctr.
Dzahn added a subscriber: Volans.
BCornwall subscribed.

@Jclark-ctr Since there's been no activity on this ticket for some time I'm going to go ahead and close it. Please feel free to re-open if this issue has not been resolved.

sudo cookbook -d sre.dns.netbox
This command asks me to enter a password and doesn't work.

@Jclark-ctr you need to use the secure-cookbook binary instead of the cookbook one. See also the related patch above for how that's configured: https://gerrit.wikimedia.org/r/c/operations/puppet/+/809338/2/modules/admin/data/data.yaml

@Volans there still seems to be an issue:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/wmflib/config.py", line 33, in load_yaml_config
    with open(config_file, 'r', encoding='utf-8') as fh:
PermissionError: [Errno 13] Permission denied: '/etc/spicerack/config.yaml'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/bin/cookbook", line 33, in <module>
    sys.exit(load_entry_point('wikimedia-spicerack==3.2.1', 'console_scripts', 'cookbook')())
  File "/usr/lib/python3/dist-packages/spicerack/_cookbook.py", line 399, in main
    config = load_yaml_config(args.config_file)
  File "/usr/lib/python3/dist-packages/wmflib/config.py", line 39, in load_yaml_config
    raise WmflibError(repr(e)) from e
wmflib.exceptions.WmflibError: PermissionError(13, 'Permission denied')
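The traceback has two parts because the library wraps the low-level error: the PermissionError raised by open() on /etc/spicerack/config.yaml, which the unprivileged user cannot read, becomes the direct cause of a WmflibError. Below is a minimal sketch of that wrapping pattern. It uses a hypothetical `ConfigError` class and JSON instead of YAML so it needs only the standard library, and it is not the actual wmflib code.

```python
import json


class ConfigError(Exception):
    """Hypothetical stand-in for a library-specific error type."""


def load_config(path: str) -> dict:
    """Load a JSON config file, wrapping any OS-level error.

    'raise ... from e' sets __cause__, which is what makes Python print
    "The above exception was the direct cause of the following exception".
    """
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    except OSError as e:  # PermissionError and FileNotFoundError are OSError subclasses
        raise ConfigError(repr(e)) from e
```

Run as a user without read access to the file, this fails the same way, which is why the cookbook has to be started through sudo.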

@Jclark-ctr just to avoid misunderstanding, did you run it with sudo?

sudo secure-cookbook -d sre.dns.netbox "noop"

That was without sudo.
With sudo, it still asks for a password.

Ahhh I think I know what happened here, it's the dry-run option. Try to run it for real:

sudo secure-cookbook sre.dns.netbox "noop"

@jbond although secure-cookbook is designed to allow passing the global parameters other than --config, the sudo rules are not, AFAICT. What do you suggest we do here?
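Why `-d` can break the match: sudo compares the command line against each rule's argument pattern, with shell-style wildcards. The sketch below approximates that check with Python's `fnmatch` (an approximation of sudoers matching, not sudo itself) against hypothetical rules of the form `secure-cookbook <cookbook> *`. The exact rule text in data.yaml and the wrapper path are assumptions.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: each allows one cookbook followed by any arguments.
RULES = [
    "/usr/bin/secure-cookbook sre.hosts.provision *",
    "/usr/bin/secure-cookbook sre.hosts.reimage *",
    "/usr/bin/secure-cookbook sre.dns.netbox *",
]


def is_allowed(command: str, rules: list[str] = RULES) -> bool:
    """Return True if the full command line matches any rule.

    Rough approximation of sudoers matching: '*' matches any run of
    characters, including spaces, so only what comes *after* the
    cookbook name is free-form. '.' is literal in fnmatch patterns.
    """
    return any(fnmatchcase(command, rule) for rule in rules)


if __name__ == "__main__":
    for cmd in (
        "/usr/bin/secure-cookbook sre.dns.netbox noop",
        "/usr/bin/secure-cookbook -d sre.dns.netbox noop",
    ):
        print(cmd, "->", is_allowed(cmd))
```

Under these assumed rules, only the first command matches. The `-d` form places a global flag before the cookbook name, so no rule applies and sudo falls back to asking for a password.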

jclark@cumin1001:~$ sudo secure-cookbook sre.dns.netbox "noop"

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

    #1) Respect the privacy of others.
    #2) Think before you type.
    #3) With great power comes great responsibility.

[sudo] password for jclark:

Change 831987 had a related patch set uploaded (by Volans; author: Volans):

[operations/puppet@production] admin: fix sudo permission for datacenter-ops

https://gerrit.wikimedia.org/r/831987

@Jclark-ctr whoops, that's not what's supposed to happen. On second review, I think the original patch has an error. I've just sent a new patch to fix it, but I'd like @jbond to have a look at it tomorrow. Sorry for the trouble.

Change 831987 merged by Volans:

[operations/puppet@production] admin: fix sudo permission for datacenter-ops

https://gerrit.wikimedia.org/r/831987

@Jclark-ctr patch merged, could you please retry the sudo secure-cookbook sre.dns.netbox "noop" one?

Thanks @Volans

jclark@cumin1001:~$ sudo secure-cookbook sre.dns.netbox "noop"
START - Cookbook sre.dns.netbox
Generating the DNS records from Netbox data. It will take a couple of minutes.
----- OUTPUT of 'cd /tmp && runus...cumin1001: noop"' -----
2022-09-14 16:17:27,493 [INFO] Gathering devices, interfaces, addresses and prefixes from Netbox
PASS |                                          |   0% (0/1) [00:00<?, ?hosts/s]

Thanks for verifying! It looks like this ticket can be closed.