
Security Readiness Reviews of Trusted GitLab Runners
Closed, Resolved, Public

Description

Basic Information Section

This task tracks a Security Review of a subset of GitLab CI workers. These CI workers run inside WMF infrastructure and are called "Trusted Runners".

The Security Review will happen on the test instance running in WMCS: https://gitlab.devtools.wmcloud.org. The test instance has a mock-up of shared and trusted Runners, similar to the production instance. However, all Runners available to the test instance also run in WMCS.

The project /repos/runner-test-project should be used for testing and evaluating the Runner security and configuration.

Brief description

GitLab Runners provide CI capabilities: arbitrary code can be submitted as a CI job, and this code is executed on GitLab Runners. These jobs have multiple tiers of trust and security. Some jobs only do linting or testing, whereas other jobs produce builds and artifacts that run in production. This concept is mirrored in multiple tiers of Runners.

A set of Shared Runners was created for less critical CI jobs. These Shared Runners are not part of this Security Review.

More critical CI jobs will run on the Trusted Runners. To increase the trust and reliability of these jobs, two GitLab Runners were created inside WMF infrastructure. This also means that, if compromised, the Trusted Runners could reach a wide range of WMF production infrastructure. Access to these Runners was therefore restricted and additional security measures were implemented. Most of these measures are described in the security-related documentation of Trusted Runners.

Do you have a project/product/program plan or documentation?

GitLab Runner overview documentation
Trusted Runner documentation
Security related documentation of Trusted Runner
Task for setting up the Trusted Runners: T295481

The documentation will change over the next few weeks as additional features are implemented.

Primary Contacts

What Security Team services do you anticipate needing?

Security Readiness Reviews

What is the 'go live' date for deployment of this project?

2-3 months

Privacy Information Section

Will any sensitive data be collected, stored or exposed?

Certain CI jobs will need credentials to access infrastructure, so sensitive data such as the following can be expected:

  • tokens/certificates to access Kubernetes
  • keys/tokens to access other WMF machines (like apt repo, helm chart museum, docker registry)
  • tokens to access other infrastructure (WMCS, public clouds)
  • passwords for technical users/logins
  • keys to sign packages

Technical Information Section

Do related discussions exist in Phab, on wiki, or in an RFC?

Task for setting up the Trusted Runners: T295481

Technology Stack

The current Runner cluster consists of two WMCS VMs:

gitlab-runner-1002.devtools.eqiad1.wikimedia.cloud (shared, non-trusted Runner)
gitlab-runner-1003.devtools.eqiad1.wikimedia.cloud (trusted Runner)

Dedicated bare metal hosts are expected to be deployed in Q4 FY2021/22 (pending hardware delivery).

Trusted Runners are configured with Puppet code (role(gitlab_runner)).

The Runners use the gitlab-runner executable, which is written in Go. The Runners have a Docker environment and execute each CI job inside a separate Docker container. The Runners also have a Prometheus metrics exporter enabled.

Security Readiness Review Section

  • Below is only relevant if this Project has reached maturity and requires a Readiness review.
  • You can fill this in later if you are still in the Preview or other early phases :)

Code

Puppet configuration:
GitLab Runner modules: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/gitlab_runner/
GitLab Runner profiles: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/manifests/gitlab/runner.pp
GitLab Runner role: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/role/manifests/gitlab_runner.pp
GitLab Runner hiera data: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/hieradata/role/common/gitlab_runner.yaml

Dependencies:
https://gitlab.devtools.wmcloud.org

Post-deployment

  • Name of team responsible for project support post-deployment and primary contact(s).

ServiceOps

Working test environment

There is a GitLab test instance at https://gitlab.devtools.wmcloud.org/explore; however, this instance has no Runners. It would be possible to create additional test Runners, but the setup would be a bit different because the test instance runs in WMCS/VPS.

It would be possible to reserve one Trusted Runner for the Security team during the review. It would also be possible to assign this Runner to the GitLab replica if certain tests could compromise the production instance. The replica is available at https://gitlab-replica.wikimedia.org/ and is restored every 24 hours.

Event Timeline

sbassett subscribed.

Thanks for submitting this, @Jelto. I've moved it over to our appsec reviews queue for triage. This might end up being a decline though. Our team isn't really resourced to perform reviews of massive enterprise applications like Gitlab, or even large, critical features of them. We also typically do not perform appsec review of puppet/config code as the subject matter expertise for much of that lives within SRE. However, this might be a good candidate for an external/vendor review.


@sbassett If we were to go down that path, do we have a standard process and/or recommended vendors?


Hey @LSobanski - I'm not certain there is a specific process for this, other than setting up some time to discuss with the Security-Team. We certainly have a few vendors that our team has worked with in the past, and who are already vetted in Coupa, which we could potentially recommend for a review like this. Funding is a different matter though, as our team obviously cannot fund every request like this that we receive. I think it would also make sense to determine what a review like this needs to accomplish since, again, the application security team does not typically review puppet/config code or massive, enterprise, third-party applications like Gitlab.

Thanks for the extra details. To summarize, we are waiting for a result of the appsec triage and we'll take appropriate next steps based on that.


Sorry for any confusion, but I don't think we'll be triaging this in any way, at least not for this quarter. I think we're happy to have a consultation though to determine some potential paths forward for this kind of review.

Hey @Jelto @LSobanski - I believe this review is to be scheduled with an external vendor at some point within the near future. @Mstyles should have more information for you about this engagement soon. Thanks.

We're planning to get this project reviewed by a vendor and will comment here when we have clearer dates for that.

The security firm is aware of this assessment and we are in the scheduling stages. Assessment kickoff should be happening in the next couple of weeks.

I added a project /repos/runners-test-project for further testing of the Runner security review. This project should be used for tests and evaluation of the implementation. runner-test-project is inside the repos group. As in production, all projects in that group have a shared Group Runner available. Furthermore, a Trusted Runner was added to the project as well. In summary, the test instance has two Runners available:

gitlab-runner-1002.devtools.eqiad1.wikimedia.cloud (shared, non-trusted Runner)
gitlab-runner-1003.devtools.eqiad1.wikimedia.cloud (like a trusted Runner, but also in WMCS)

I promoted one of the new accounts, Runner1kmhcorp, to Maintainer of the repos group. This account is therefore allowed to review and merge changes and to run code on the Trusted Runners. Runner2kmhcorp is considered untrusted and is not allowed to execute CI jobs on the Trusted Runner, only on the Shared Runner.
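For reviewers, the access rule for the two test accounts can be summarized as a small policy model. This is a hypothetical Python sketch, not how GitLab enforces the restriction: the account names and Runner hostnames come from this task, while the `REPOS_GROUP_MAINTAINERS` set and the `allowed_runners`/`can_run` helpers are illustrative assumptions. The actual enforcement inside GitLab is not modeled here.

```python
from enum import Enum


class Runner(Enum):
    """The two Runners available to the test project on the test instance."""
    SHARED = "gitlab-runner-1002.devtools.eqiad1.wikimedia.cloud"   # shared, non-trusted
    TRUSTED = "gitlab-runner-1003.devtools.eqiad1.wikimedia.cloud"  # Trusted Runner mock-up


# Hypothetical: Maintainers of the repos group are treated as trusted accounts.
REPOS_GROUP_MAINTAINERS = {"Runner1kmhcorp"}


def allowed_runners(account: str) -> frozenset:
    """Return the Runners on which `account` may execute CI jobs (policy model only)."""
    if account in REPOS_GROUP_MAINTAINERS:
        # Trusted accounts may use both the Shared and the Trusted Runner.
        return frozenset({Runner.SHARED, Runner.TRUSTED})
    # Every other account is limited to the shared, non-trusted Runner.
    return frozenset({Runner.SHARED})


def can_run(account: str, runner: Runner) -> bool:
    """True if a CI job from `account` may be scheduled on `runner`."""
    return runner in allowed_runners(account)


if __name__ == "__main__":
    for account in ("Runner1kmhcorp", "Runner2kmhcorp"):
        names = sorted(r.name for r in allowed_runners(account))
        print(account, names)
```

Keeping the rule as an allow-list (everything not explicitly trusted falls back to the Shared Runner) mirrors the intent described above: untrusted accounts never reach the Runner that can talk to production infrastructure.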

Trusted Runners are currently blocked on all instances by networking issues (T311241). I can take a look at that next week if it's still blocking.

The Trusted Runner mock-up on the test instance is still blocked by T311241. Jobs are failing with "Could not resolve host".

I added a patch which hopefully fixes this issue and unblocks the Trusted Runner mockup.

still blocked by T311241, hoping to have an update this week


As mentioned in T311241#8097262, this should be unblocked now. A workaround for the DNS issues affecting GitLab Runners on the test instance was found.

Update: pentest complete, to review with team this week (9/6/2022)

sbassett changed the task status from Open to In Progress. Sep 6 2022, 4:56 PM

Only one issue in the report needs to be fixed. The informational findings are all things they tried but could NOT do, which is good :)

Planning to talk about this and a fix during our summit.

sbassett triaged this task as Medium priority. Sep 6 2022, 6:46 PM
sbassett removed a project: RFS.
Mstyles moved this task from In Progress to Our Part Is Done on the secscrum board.

The summit was held and the follow-up task has been created, so I'm marking this as resolved. We plan to do further testing in the future once Docker-in-Docker (or another container solution) has been implemented; that will go in a separate ticket.

The summit is happening in about 2 weeks.

Jelto changed the status of subtask Restricted Task from Open to Stalled. Nov 4 2022, 1:37 PM
Jelto closed subtask Restricted Task as Resolved. Nov 17 2022, 3:17 PM

Do we want to keep this task protected for any reason? The only reasons I can think of would be if there were any serious security issues mentioned here which were not ultimately mitigated or WP:BEANS, though the latter is not a compelling argument IMO.


I don't have concerns removing the protection.

sbassett changed the edit policy from "Custom Policy" to "All Users". Nov 18 2022, 2:57 PM
sbassett changed the visibility from "Custom Policy" to "Public (No Login Required)".