Page MenuHomePhabricator

allow mw-deployers to disable puppet on mwdebug hosts
Closed, ResolvedPublic

Description

It has come up repeatedly that certain users, deployers with shell access but without global root, are missing the ability to temporarily disable puppet for testing purposes.

They are limited by puppet reverting their local changes they try to make to test something.

For SRE it is considered ok to temp. disable puppet for such a purpose as long as:

  • it doesn't stay disabled for too long (we have Icinga alerting for this after a while, becomes a real issue after a week when it drops out of puppet DB, monitoring needs to be visible)
  • certain hosts are made for testing and others are not. it's generally expected to use a "debug", "canary" "dev" or "test" host instead of the real deal where it's possible

The purpose of this task is to discuss this a bit but also treat it like a regular access request ticket.

So I am suggesting we pick a host or small group of hosts like mwdebug* where we allow this (not globally and specifically not bastion*.bastions should not be used for this) for a certain group like deployers, or a new group of 'puppet-disablers' if that seems more reasoanable.

Then we'd add a line to their existing sudo privileges to let them run "disable-puppet/enable-puppet <reason>" either directly or with a wrapper around it. (the one used by cumin?).

Finally would be nice to then have some more visibility in monitoring for "puppet has been disabled for too long" for just these hosts.
And / or something in big red letters in the MOTD and on IRC so people don't leave it disabled for too long and are aware when others currently have it disabled.

Event Timeline

jcrespo added subscribers: joanna_borun, jcrespo.

This seems to me like a reasonable requests, although as you point out, the details of how to exactly implement it to make sure everyone uses it responsibly may require feedback from SRE. A bad implementation (where puppet is left disabled for an extended amount of time unatteded) could have security implications.

As per procedure, this should be discussed in the Infrastructure-Foundations next meeting (when ready to be discussed, of course) and assigned to @joanna_borun. (CCed only for now). Because the topic is related to application servers, people from service ops should be probably present to provide feedback during the specific discussion.

But I guess some days for pre-discussion on ticket first would be advised as a RFC (both from SREs and deployers).

let them run "puppet disable/enable" either directly or with a wrapper around it. (the one used by cumin?).

Nobody should run those commands! The related wrappers should always be used instead, because only these wrappers do the right thing in all situations:

  • disable-puppet: disable puppet forcing a message, appending the $USER to the message automatically and returns only if an already in-flight puppet run has completed. It does not override an existing message if Puppet is already disabled.
  • enable-puppet: enable Puppet only if it was disabled with a specific message, appending the $USER to the message automatically.
  • run-puppet-agent: execute a puppet agent run reliably, waiting for any preceding puppet runs to complete before starting. This should allow to ensure a coordinated puppet run after a change is effective everywhere.

If this request is granted the --force parameter shouldn't probably be allowed.

P.S. Adding @lmata in place of Joanna.

IO that seems like a reasonable request, but we should discuss this in the next IF SRE meeting

My only worry is that such temporary resets will be forgotten and then might have inconsistent state across mwdebug servers (if Puppet ran on some and not on others). I'd expect that all cases on mwdebug* servers are fairly time-limited, so we could maybe limit these to an hour (and then have a timer re-enable Puppet if not lifted by the user).

Volans triaged this task as Medium priority.Apr 21 2022, 9:26 AM

This was discussed in the Infrastructure Foundations team meeting and was found to be a okay (to grant the permissions to disable/enable Puppet and force a run via the wrappers). The only concern was that this might lead to situations where enabling Puppet gets forgotten, which could be mitigated by a systemd timer which e.g. re-enables Puppet if it was disabled for longer than 24 hours.

Since this task was rather about asking for input, I'm removing the SRE-Access-Requests tag for now. If any team/individual wants to request this change to be implemented, please re-add it along with the specific group to be enabled etc.

ok, thank you IF team! assigning back to me for the moment to follow-up. Yes, there was a specific person. I will readd this with a specific group after discussion.

Alright, finally getting back to this.

So the request is that the group "deployment", which is already on the canary_appserver role on mwdebug hosts, additionally gets a sudo privs line that let's them enable and disable puppet.

Though this means deployers could also disable puppet on all other hosts where deployers have access and it would not be strictly limited to mwdebug hosts.

So maybe it should be a new group but that copies the members from deployment.

Uploading a patch for that.

Change 879147 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] admin/canary_appserver: add group of users allowed to disable puppet

https://gerrit.wikimedia.org/r/879147

@daniel This was for you, remember that?

Dzahn renamed this task from allow certain users to disable puppet on mwdebug hosts to allow mw-deployers to disable puppet on mwdebug hosts.Wed, Jan 11, 10:26 PM
Dzahn added a subscriber: Muehlenhoff.

This was discussed in the Infrastructure Foundations team meeting and was found to be a okay ...
... If any team/individual wants to request this change to be implemented, please re-add it along with the specific group to be enabled etc.

Alright @Muehlenhoff It's finally back now with a specific suggestion.

https://gerrit.wikimedia.org/r/c/operations/puppet/+/879147

Dzahn removed Dzahn as the assignee of this task.Wed, Jan 11, 10:27 PM
Dzahn added a project: SRE-Access-Requests.

Change 879147 merged by Dzahn:

[operations/puppet@production] admin/canary_appserver: add group of users allowed to disable puppet

https://gerrit.wikimedia.org/r/879147

@daniel Here you go, you (and other deployers) should now be able to disable (and enable) puppet on mediawiki canary appservers.

This means the mwdebug* servers and the few selected mw* servers that are designated canaries.

It works via being member of the new group "mwdebuggers".

[mw2271:~] $ id daniel
uid=545(daniel) gid=500(wikidev) groups=500(wikidev),705(deployment),839(mwdebuggers)
[mw2271:~] $ sudo cat /etc/sudoers.d/mwdebuggers 
# This file is managed by Puppet!

%mwdebuggers ALL = (ALL) NOPASSWD: /usr/local/sbin/disable-puppet*, /usr/local/sbin/enable-puppet*

The commands to use are sudo disable-puppet 'message' and sudo enable-puppet 'message'.

Note the message text can be any string but you need to use the same one you used to disable it to enable it again.

Please don't forget to re-enable puppet after making tests.

Feel free to share the news with other debuggers. Hope this is helpful.. per our chat a couple months ago. Cheers

Dzahn claimed this task.

Change 883562 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] openldap: Add mwdebuggers to cross-validate-accounts

https://gerrit.wikimedia.org/r/883562

FYI this created warnings in cross-validate-accounts, CR incoming.

FYI this created warnings in cross-validate-accounts, CR incoming.

I made https://gerrit.wikimedia.org/r/c/operations/puppet/+/883500 earlier, review welcome :-)

Change 883562 abandoned by Clément Goubert:

[operations/puppet@production] openldap: Add mwdebuggers to cross-validate-accounts

Reason:

Duplicate 883500

https://gerrit.wikimedia.org/r/883562

FYI this created warnings in cross-validate-accounts, CR incoming.

I made https://gerrit.wikimedia.org/r/c/operations/puppet/+/883500 earlier, review welcome :-)

Reviewed ;)