Access group for Gitlab contractors
Closed, Resolved (Public)

Description

We have engaged an outside company for our Gitlab implementation. The contractors will need full root access to a small number of new Ganeti servers. The contractors will work outside of the operations/puppet repo for the next 6 months, while we are hiring new SREs to take over and rework the implementation as an immediate follow-up step.

Please create an access group for these contractors.

SRE Sponsor: Wolfgang Kandek
Value: faster access to a Gitlab production integrated setup, which allows for storage of PII and integrated testing
Risks: non-WMF employees with access to the production network, non-reproducible setup

Event Timeline

Ack, this needs approval/discussion in the next SRE meeting since it would create a new access group.

Change 664902 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] admin: create new group for gitlab-roots

https://gerrit.wikimedia.org/r/664902

As @MoritzMuehlenhoff says, this will need approval in the meeting, but let me add my arguments against it:

  • There is no precedent for this. We've never had anything unpuppetized (and when I say puppetized, I mean the operations/puppet.git repo, not generic use of the puppet software) in production where a number of non-WMF individuals had full root access.
  • There are many good reasons for the above. It's a problematic path, as it makes it impossible to reason about, reproduce, replicate and work with the resulting infrastructure. The resulting infrastructure is a unique occurrence that cannot be reproduced/replicated, even by the original contributor, in any reasonable amount of time and effort. That will hamper any efforts at upgrading, providing High Availability and maintaining it.
  • There will be no monitoring. As our monitoring/alerting is completely defined in operations/puppet, not using it would mean no monitoring. Creating a dedicated monitoring infrastructure just for this is probably way too much, and probably redundant, work.
  • There will be no backups. Our backup infrastructure is done via operations/puppet, so not using it means no backups. Again, creating a dedicated backup infrastructure just for this is probably way too much, and probably redundant, work.
  • It's not security monitored by our tooling and security teams. That is, our puppetization sets up everything needed to know what software is installed on our systems and what software needs to be updated. Not having that means no security updates.
  • There is no way to test changes to the environment. Granted, puppet testing is a very difficult thing, but there is one huge advantage: we can always roll back a puppet patch and revert quickly, which emulates a form of testing.
  • Having PII in such an environment (as the task says) means a high probability of that PII leaking. Let me elaborate: the bullet points above explain why. With no security updates, no monitoring, no way to test changes, and no way for anyone other than the original contributor(s) to review what is going on, the risk of that happening increases exponentially.
  • As such, the setup WILL not be integrated into production, but will be a unique occurrence on the side, with access to production however, essentially endangering it instead of integrating with it.
  • The risks are mentioned, but I fail to see a stated plan to address/mitigate them in the task description.

The contractors will work outside of the operations/puppet repo for the next 6 months, while we are hiring new SREs to take over and rework the implementation

So we are hiring people to do something, only to later hire someone else to do it properly? Why don't we hire people to do it right in the first place (even if contractors)? My previous Google Summer of Code mentee (an intern with barely any work experience) had, within weeks, an almost full understanding of our infrastructure code and was able to easily build on top of a less than ideal codebase (despite not having any special grants). If "working without affecting regular Puppet workflow/full root" is an issue, why not work on a separate branch/repo?

I don't know if this has been considered, but signing L3 is a requirement for access to production, and that contract explicitly states that doing non-puppetized work outside of your home directory is forbidden:

Anything that changes the state of a server outside of your home directory should be done by puppet. This means if a file is moved, touched, or updated that is not in your own user directory, it should be done via puppet manifests, not manual commands. This allows it to be peer reviewed.

Meaning such work would actively breach their contract.

Of course, I'm not a lawyer and I'm really sorry if I'm missing something obvious here.

Change 664902 merged by Dzahn:
[operations/puppet@production] admin: create new group for gitlab-roots

https://gerrit.wikimedia.org/r/664902

Hello, I would like a response to my concerns above. This is nothing against GitLab. The reason I'm asking is that I have been working on upgrading to mailman3 (T52864: Upgrade GNU Mailman from 2.1 to Mailman3). Before even asking for a test VM in production, I have been puppetizing it for months now (T256536: Puppetize mailman3) and even went through the whole process of setting up a standalone puppetmaster. It's quite a challenge to find SREs willing to review my puppet patches, and I've reached the point of privately pinging at least five different people for review. It doesn't feel good to see the whole process (which is there for a good reason) being bypassed.

The group has been created. Depending on how you want to define this ticket, it is resolved now.

Just that the group is empty. But to add people to it you'll need access requests with real identities and keys.

I am closing this as resolved simply because the group exists now and is applied on the VMs requested in T274459.

I can understand Ladsgroup's frustration about bypassing the process but it's outside of my powers.

The issue with the L3 incompatibility will have to be handled when individuals get added to the group.

debt edited projects, added GitLab; removed GitLab (Initialization).