Trusted gitlab runner containers need access to staging k8s cluster
Closed, Resolved, Public

Description

CI jobs running on trusted runners need to be able to connect to the kubestagemaster cluster in order to perform test helm deployments (in the ci namespace only) of mediawiki services before they are promoted to production. See https://integration.wikimedia.org/ci/job/mathoid-pipeline-rehearse/38/console for an example of how the existing Jenkins CI performs this operation.

https://gitlab.wikimedia.org/repos/releng/mathoid/-/jobs/42458 shows that a CI job running on gitlab-runner1003.eqiad.wmnet (a trusted runner) cannot make a TCP connection to kubestagemaster.svc.eqiad.wmnet:6443. However, the same curl command run directly on the runner host does work.
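A quick way to reproduce this check from inside a job container, without depending on curl being installed in the image, is a plain TCP connect with a timeout. The following is a minimal Python sketch; the function name can_connect and the 5-second default timeout are illustrative choices, not part of the original job.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections and
        # timeouts, all of which are subclasses of OSError.
        return False
```

Called from inside a job container as can_connect("kubestagemaster.svc.eqiad.wmnet", 6443), it would be expected to return False while container egress to that service is blocked, and True once it is allowed.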

Event Timeline

dancy renamed this task from Trusted gitlab runners need access to kubestagemaster k8s cluster to Trusted gitlab runner containers need access to kubestagemaster k8s cluster. Dec 16 2022, 7:38 PM
dancy updated the task description.

Change 868737 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/puppet@production] profile::gitlab::runner::allowed_services: Add kubestagemaster

https://gerrit.wikimedia.org/r/868737

Jelto added subscribers: JMeybohm, akosiaris, Joe, Jelto.

Thanks for opening the task!

As far as I know, direct access from CI workers to the Kubernetes API is a new feature and is not available in Jenkins and Gerrit. All Kubernetes deployments happen on the deployment host at the moment. So I have some questions:

Is the access read-only (like generating helm diffs) or do you also want to deploy new versions to the staging wikikube cluster?
Is access limited to certain namespaces? (like mathoid)
Is the access needed only for staging?
How do we manage access and kubeconfig files? Most probably we don't want to copy and paste kubeconfig files from the deployment host to GitLab CI? (@serviceops)
Is it reasonable to move kubeconfig credentials for staging wikikube to Trusted Runners/GitLab CI pipelines? (@serviceops)

I'm looping in Serviceops folks/team to get some feedback on that too.

taavi renamed this task from Trusted gitlab runner containers need access to kubestagemaster k8s cluster to Trusted gitlab runner containers need access to staging k8s cluster. Dec 19 2022, 11:10 AM
taavi added a project: Kubernetes.

> As far as I know direct access from CI workers to the Kubernetes API is a new feature and not available in Jenkins and Gerrit.

Here's an example from Jenkins:
https://integration.wikimedia.org/ci/job/mathoid-pipeline-rehearse/38/console
I updated the task description with this info too.

> So I have some questions about

> Is the access read-only (like generating helm diffs) or do you also want to deploy new versions to the staging wikikube cluster?

We need to deploy new versions to the staging wikikube cluster.

> Is access limited to certain namespaces? (like mathoid)

Only the ci namespace. I updated the task description.

> Is the access needed only for staging?

Yes.
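Since the access needed is write access restricted to the ci namespace on staging, a job script could refuse to build a deploy command for any other namespace. The following is a minimal sketch assuming the job shells out to helm; the release name, chart path and kubeconfig path are hypothetical placeholders, and the actual enforcement would still have to come from RBAC on the cluster side.

```python
from typing import List

# The only namespace CI is allowed to deploy to, per this task.
ALLOWED_NAMESPACE = "ci"


def helm_test_deploy_cmd(release: str, chart: str, kubeconfig: str,
                         namespace: str = ALLOWED_NAMESPACE) -> List[str]:
    """Build (but do not run) a helm command for a test deployment."""
    if namespace != ALLOWED_NAMESPACE:
        raise ValueError(
            f"refusing to deploy to namespace {namespace!r}; "
            f"only {ALLOWED_NAMESPACE!r} is allowed")
    return [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace,
        "--kubeconfig", kubeconfig,
        "--wait",  # block until the release's resources report ready
    ]
```

The returned list can be passed to subprocess.run(..., check=True). Building an argument list rather than a shell string avoids quoting problems with release or chart names.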

We already have that functionality for the deployment pipeline on Gerrit, so it should be available for GitLab too. As for the kubeconfig question, it is reasonable to have it populated on Trusted Runners (that's what we currently do anyway with the contint* hosts).
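For reference, a kubeconfig for the ci namespace has a small, well-defined shape. The sketch below assembles one as JSON (kubectl and helm accept it, since JSON is valid YAML), assuming bearer-token authentication for the jenkins user. The token auth method, the cluster name "staging" and the context naming are assumptions. The namespace field in a context only sets the default namespace; it does not restrict access, which still depends on RBAC.

```python
import json


def build_ci_kubeconfig(server: str, ca_data_b64: str, token: str,
                        user: str = "jenkins", namespace: str = "ci") -> str:
    """Return a kubeconfig document (as JSON text) with a single context."""
    cluster_name = "staging"  # hypothetical cluster name
    context_name = f"{user}@{cluster_name}"
    config = {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{
            "name": cluster_name,
            "cluster": {
                "server": server,
                # Base64-encoded CA bundle used to verify the API server.
                "certificate-authority-data": ca_data_b64,
            },
        }],
        "users": [{"name": user, "user": {"token": token}}],
        "contexts": [{
            "name": context_name,
            # Default namespace for commands run in this context.
            "context": {"cluster": cluster_name, "user": user,
                        "namespace": namespace},
        }],
        "current-context": context_name,
    }
    return json.dumps(config, indent=2)
```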

Ah, sorry for the confusion, I missed that! It sounds good to recreate all of the functionality in the ci namespace. I was thinking about continuous deployment replacing the deploy host. After the code freeze we can continue with https://gerrit.wikimedia.org/r/q/868737.

@dancy What is your plan for providing the kubeconfig? Is mathoid the only project using the deployment pipeline? If not, is it reasonable to add this kubeconfig to every GitLab project as a protected CI variable?

> Ah sorry for the confusion, I missed that!

No worries!

> I was thinking about continuous deployment replacing the deploy host.

That's something that, mostly due to our very open and transparent nature, is pretty difficult to achieve. It's not inconceivable, but it would require investing in a lot more tooling as well as heavy cultural changes. All of this is to say that continuous deployment, if it happens, isn't going to happen suddenly, but rather as the result of a long process.

> @dancy What is your plan of providing the kubeconfig? Is mathoid the only project using the deployment pipeline? If not, is it reasonable to add this kubeconfig to every GitLab project as a protected CI variable?

There are several other projects like mathoid that follow the same pattern. It was my expectation that we'd add a GitLab protected variable to an appropriate group to hold the secret for accessing the ci namespace as the jenkins user.
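If the secret lives in a GitLab CI variable, the job has to turn it into a file that helm can read. GitLab offers two variable types: for a "File" variable, the environment variable holds the path of a temporary file containing the value; for a regular variable, it holds the value itself. A minimal sketch that handles both, assuming a hypothetical variable named STAGING_KUBECONFIG:

```python
import os
import tempfile


def kubeconfig_path_from_env(var: str = "STAGING_KUBECONFIG") -> str:
    """Return a filesystem path to the kubeconfig stored in a CI variable.

    STAGING_KUBECONFIG is a hypothetical variable name.
    """
    value = os.environ.get(var)
    if not value:
        # Protected variables are only exposed to pipelines on protected refs,
        # so a missing value often means the job ran on an unprotected branch.
        raise RuntimeError(f"{var} is not set")
    if os.path.isfile(value):
        # "File"-type variable: GitLab already wrote the contents to this path.
        return value
    # Regular variable: write the contents to a private temp file.
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(prefix="kubeconfig-")
    with os.fdopen(fd, "w") as fh:
        fh.write(value)
    return path
```

The returned path can then be passed to helm via --kubeconfig or exported as KUBECONFIG.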

Jelto triaged this task as Medium priority. Dec 22 2022, 3:10 PM
Jelto moved this task from Incoming to Work in Progress on the collaboration-services board.

Change 868737 merged by Dzahn:

[operations/puppet@production] profile::gitlab::runner::allowed_services: Add kubestagemaster

https://gerrit.wikimedia.org/r/868737

Mentioned in SAL (#wikimedia-operations) [2023-01-03T21:53:04Z] <mutante> gitlab-runner* - allowing kubestagemaster.svc.eqiad.wmnet to connect to port 6443, run puppet via cumin, deploy gerrit:868737 - T325385

Mentioned in SAL (#wikimedia-operations) [2023-01-03T21:55:27Z] <mutante> gitlab-runner* - correction: allowing connections TO kubestagemaster.svc.eqiad.wmnet port 6443 FROM trusted runners, of course - T325385

I verified today that trusted runners can now complete a network connection to kubestagemaster.svc.eqiad.wmnet:6443 so that part of things is unblocked. Thanks everyone for your help!

> @dancy What is your plan of providing the kubeconfig? Is mathoid the only project using the deployment pipeline? If not, is it reasonable to add this kubeconfig to every GitLab project as a protected CI variable?

> There are several other projects like mathoid that follow the same pattern. It was my expectation that we'd add a GitLab protected variable to an appropriate group to hold the secret for accessing the ci namespace as the jenkins user.

I'd like a blessing from your team on my plan to use a GitLab protected variable to hold the secret. It will be stored in the repos/releng group for starters.

> I verified today that trusted runners can now complete a network connection to kubestagemaster.svc.eqiad.wmnet:6443 so that part of things is unblocked. Thanks everyone for your help!

> I'd like a blessing from your team on my plan to use a GitLab protected variable to hold the secret. It will be stored in the repos/releng group for starters.

Happy to hear that!

My plan when designing the Trusted Runners was also to use GitLab protected CI variables for that. I was thinking about adding these variables at the project level. But if you somehow curate the projects added to repos/releng, I don't have concerns here. And as far as I can tell, repos/releng is not open for everyone to create new projects?

OK. While I'm developing I'll use a project variable.
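The project-versus-group choice interacts with two GitLab rules: a project variable overrides a group variable with the same key, and protected variables are only passed to pipelines running on protected branches or tags. The following is a simplified model of just those two rules (it ignores instance-level variables, nested subgroups and pipeline-level overrides); CIVariable and resolve_variables are illustrative names, not a GitLab API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class CIVariable:
    key: str
    value: str
    protected: bool = False


def resolve_variables(group_vars: Iterable[CIVariable],
                      project_vars: Iterable[CIVariable],
                      ref_is_protected: bool) -> Dict[str, str]:
    """Simplified model of which variables a job would see."""
    resolved: Dict[str, str] = {}
    # Apply group variables first so project variables with the same key win.
    for var in [*group_vars, *project_vars]:
        if var.protected and not ref_is_protected:
            # Protected variables are withheld from unprotected branches/tags.
            continue
        resolved[var.key] = var.value
    return resolved
```

Under this model, moving the secret from a project variable to a repos/releng group variable later would not change what protected pipelines see, as long as no project defines the same key.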

(Orthogonal but worth a discussion in our next meeting.)

Perhaps this is a good time to start looking at something like HashiCorp Vault, as GitLab can use it for secret management using JWT (it was the canonical use case for their implementation of JWT). Integration with variables is a premium feature of GitLab. However, there's no reason we shouldn't be able to populate certain sensitive secrets and policies from operations/puppet into Vault ourselves and then make them available to runners for querying using JWT.
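To make the Vault idea concrete: with Vault's JWT auth method, a job exchanges a GitLab-issued JWT for a Vault token and then reads the secret with that token. The sketch below uses Vault's HTTP API (POST /v1/auth/<mount>/login with a role and the JWT, then a KV version 2 read at /v1/<mount>/data/<path>). The Vault address, role name, mount names and secret path are hypothetical, and how the job obtains its JWT (for example through GitLab's id_tokens keyword) would depend on the GitLab version and configuration.

```python
import json
import urllib.request
from typing import Any, Dict


def vault_jwt_login_request(vault_addr: str, role: str, jwt: str,
                            mount: str = "jwt") -> urllib.request.Request:
    """Build the JWT auth login request (exchanges a CI JWT for a Vault token)."""
    body = json.dumps({"role": role, "jwt": jwt}).encode()
    return urllib.request.Request(
        f"{vault_addr}/v1/auth/{mount}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def vault_kv2_read_request(vault_addr: str, token: str, mount: str,
                           path: str) -> urllib.request.Request:
    """Build a KV v2 read request authenticated with a Vault token."""
    return urllib.request.Request(
        f"{vault_addr}/v1/{mount}/data/{path}",
        headers={"X-Vault-Token": token},
        method="GET",
    )


def fetch_secret(vault_addr: str, role: str, jwt: str, mount: str,
                 path: str) -> Dict[str, Any]:
    """Log in with the JWT, then return the key/value data stored at `path`."""
    with urllib.request.urlopen(vault_jwt_login_request(vault_addr, role, jwt)) as resp:
        token = json.load(resp)["auth"]["client_token"]
    with urllib.request.urlopen(vault_kv2_read_request(vault_addr, token, mount, path)) as resp:
        # KV v2 nests the stored secret under data.data.
        return json.load(resp)["data"]["data"]
```

Keeping request construction separate from sending makes the URL and payload easy to inspect without a live Vault; scoping which secrets a given role can read would be done with Vault policies bound to that role.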

LSobanski assigned this task to Dzahn.
LSobanski subscribed.

Resolving, as it seems the original request was addressed. If there's follow-up discussion, let's create a separate task for that.