Details
Project | Branch | Lines +/- | Subject
---|---|---|---
operations/puppet | production | +9 -2 | expanding sc-admins rights and members
Event Timeline
As a personal note: Marko was granted the right to run/disable/manage puppet because he is performing non-emergency coverage and regularly doing deployments. We don't really need "emergency coverage", and I have seen repeated abuses of the ability to disable puppet in production in the past.
If the focus is to evenly distribute the load of managing deployments between different people in the team, we can discuss this, but emergency coverage is not really the reason Marko was given the right to manage puppet in the first place.
I am currently the default go-to guy when it comes to SC* services. This is becoming a bottleneck and, more generally, is not a sustainable solution. With the creation of the sc-admins group, I think it makes sense to expand it to:
- allow all users in that group to manage services on the hosts (basically sudo service *)
- add the services team's members to it
This would allow other members of the team not only to participate in emergency situations in order to fix breakages, but also to share in the day-to-day duties that come with it (configs, cleanups, investigations, etc.).
An important thing to note here is that currently @Eevans and @Pchelolo don't have any kind of access to SC*, which I don't think is a good state to be in.
@mobrovac I agree in principle; also I guess the "puppet disabling" will not be needed anymore once we move every service fully to scap3?
Having a lot of people able to disable puppet for long stretches while doing testing is what slightly worries me, out of sour experiences we've had in quite a few cases.
If by fully you mean that config deploys are also done via Scap3 then, yes, I don't see how or why Puppet would be needed in this context.
> Having a lot of people able to disable puppet for long stretches while doing testing is what slightly worries me, out of sour experiences we've had in quite a few cases.
I can relate to your concern, but I think we should put this into perspective: it is unlikely that it will be feasible to have puppet disabled for extended periods of time on SC*, simply because of the number of services running there. I think we can all agree that disabling Puppet for up to a couple of hours, i.e. while working on them, in order to slowly get out potentially harmful changes and/or test parameter changes that are known only to affect production (examples of which might be proxy config, rate limiting, etc.) is OK, but that disabling Puppet and leaving it disabled because "it works like this now, so let's leave it" is not.
Change 290491 had a related patch set uploaded (by RobH):
expanding sc-admins rights and members
I got the service names for inclusion from @mobrovac (as it seems Gabriel is out today, and there isn't a need to stall this just for his input when Marko knows what was needed!). Marko also confirmed that no services team members were missing from the list.
As this was already approved in the meeting, and I have both my review and @Joe's review on the patch, I've merged it live.
Please note that while this is live, it is up to the services team to coordinate service issues and administration with operations. (Basically, everyone who just got rights should continue to do what @mobrovac is already doing with these sc-admins rights: work with ops.)
Thanks!