We're moving toward a process where new SREs don't get global root immediately. That's an important element of being able to confidently hire people who are earlier in their careers, but it makes it hard for new SREs to learn hands-on in production, because currently you need root to do most SRE work.
As a training tool, we would benefit from being able to pair a new SRE with an experienced buddy:
- Both SREs SSH to the same host.
- The new SRE (who can't use sudo directly) runs a sudo-like wrapper command, passing it the command they want to run as root.
- The experienced SRE approves the command, triggering it to actually run as root (or, if it's wrong, they decline and the new person fixes it and tries again).
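To make the request/approve/decline flow above concrete, here is a minimal sketch in Python. It is not how sudo_pair works and not a proposed implementation: `PairedRequest`, `State`, and `run_command` are hypothetical names invented for illustration, the rule that the requester can't approve their own command is an assumption, and the default runner does not elevate privileges at all.

```python
import shlex
import subprocess
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class State(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DECLINED = "declined"


def run_command(argv: List[str]) -> int:
    """Hypothetical default runner: execute argv and return its exit code.

    A real tool would run this as root; this sketch deliberately does not.
    """
    return subprocess.run(argv, check=False).returncode


@dataclass
class PairedRequest:
    """One command proposed by the new SRE, waiting on their buddy."""

    requester: str
    command: str
    runner: Callable[[List[str]], int] = run_command
    state: State = State.PENDING
    exit_code: Optional[int] = None

    def _check_pending(self) -> None:
        # A request can be decided exactly once; a fix means a new request.
        if self.state is not State.PENDING:
            raise RuntimeError(f"request already {self.state.value}")

    def approve(self, approver: str) -> int:
        """Buddy approves: only now does the command actually run."""
        # Assumption: self-approval is rejected as a guardrail.
        if approver == self.requester:
            raise PermissionError("requester cannot approve their own command")
        self._check_pending()
        self.state = State.APPROVED
        self.exit_code = self.runner(shlex.split(self.command))
        return self.exit_code

    def decline(self, approver: str) -> None:
        """Buddy declines: nothing runs, and the requester tries again."""
        self._check_pending()
        self.state = State.DECLINED
```

The point of the sketch is only that nothing executes until a second person has seen the exact command line; a corrected command goes through as a fresh request rather than an edit to the old one.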
The advantage is that the new SRE is much more actively involved than they would be just watching a screenshare, but their mistakes still can't be quite as impactful as they would be with unsupervised root.
Note that defending against intentional attacks by the new SRE is out of scope. They could probably trick their buddy into approving a command that does something unexpected, but that doesn't mean this system is defective: we'll only give this access to people we trust with it, as we do now with root. The goal is to provide a guardrail against mistakes, not malfeasance.
Originally I thought we would need to build this tool ourselves, but @CDanis found sudo_pair, which was open-sourced by Square and is listed as a sudo plugin. I haven't dug into the implementation at all, but the description looks promising, as does the demo, in which all the output is shown to the buddy, along with a killswitch -- so even something like "sudo bash" is still supervised.
(For clarity: It sounds like in Square's use of sudo_pair, they require pairing for all SREs except in emergencies. I don't propose we do that here -- I only want it as a training tool.)
Can Infrastructure Foundations evaluate whether and how we can run this in prod? And if we can't, can we investigate building a solution of our own? I'm happy to consult and would love to be involved, but I don't have the domain expertise to configure this safely on my own.