
Allow a shared, protected runner for the data-engineering group in GitLab
Closed, Declined · Public

Description

We would like to be able to create a new GitLab runner that would only be available for the Data-Engineering team's projects: https://gitlab.wikimedia.org/data-engineering

I understand that at the moment it would take someone with Owner-level access to the group to access the runner settings and therefore register a new runner.
https://gitlab.wikimedia.org/groups/data-engineering/-/group_members
Would you be happy to grant me (as an SRE within the team) ownership of that group, so that I could perform this operation as a self-service task?

It might be helpful if the runner could use the docker executor, but if that isn't feasible then we would also be happy to use a runner with the shell executor.

The first project for which we would like to use this runner is data-engineering/airflow-dags.

This runner was first discussed here: T286958#7450771

Some of the tasks that this runner should be capable of achieving are:

  • accessing our current Airflow instances, which are in the Analytics VLAN.
  • accessing HDFS, which requires the use of Kerberos
  • uploading artefacts (JARs and Python wheels) to Archiva

Therefore we would like to site a runner within the analytics VLAN, provide it with Kerberos keytabs as required, and begin to use this for both CI and automated code deployments.

There is a possibility of creating a 'test-analytics' runner first, which only has access to the test-hadoop cluster, before promoting this configuration to production.

Many thanks.

Event Timeline

mforns triaged this task as Medium priority. Nov 5 2021, 9:48 PM

Currently there are only Runners in WMCS. We first have to adapt this setup so that there is also a set of Runners in a trusted environment outside of WMCS. Task T295481 tracks the creation of such Runners. Once secure Runners are in place, we can use them as the basis for additional, special-purpose Runners (like this one for data-engineering).

So this task is somewhat blocked until we have created such secure Runners. However, some things could happen in parallel, such as working out the networking requirements with the networking folks.

> It might be helpful if the runner could use the docker executor, but if that isn't feasible then we would also be happy to use a runner with the shell executor.

From a security perspective, shell executors should not be used, as they offer no real isolation between jobs, or between jobs and the Runner itself. So we should use at least the Docker executor (see).

Thanks for the reply @Jelto,

Perhaps I or one of my team could help with setting up this particular secure runner, given that it is quite a specialized use case and that it won't be used for anything to do with MediaWiki deployments or anything like that.

I note @brennen's comment here: T292094#7493614

> For very specialized runners, I do suspect a bring-your-own approach is best.

So I'm more than happy to work with you on this specialized requirement, if that's feasible.

> From a security perspective, shell executors should not be used, as they offer no real isolation between jobs, or between jobs and the Runner itself. So we should use at least the Docker executor (see).

The docker executor is fine with us too; it's by far the most convenient and flexible solution. Equally, we'd be happy to manage the security issues around the shell executor, given that we would only be using it for trusted builds.

What's the best thing that I can do to help keep this from being blocked for too long? Should I write a document with a proposal for a data engineering runner setup that your team can review?

Pausing this task, since we are not currently working on it. I think that it would still be useful to have a meeting with the release engineering team to discuss whether or not this is a desired path forward.

Declining this task as we have no time to work on it at the moment.

BTullis removed a project: Data-Engineering-Kanban.
BTullis added a subscriber: Antoine_Quhen.

Reopening the task since it is still something that would be of value to the data-engineering team.

To illustrate, we currently have a requirement to build an Airflow .deb package (T317210), and @Antoine_Quhen has developed a GitLab CI pipeline that is defined in the same repo as our Airflow DAGs.

Unfortunately, we cannot yet make use of this pipeline because we can't execute docker within the context of the shared or trusted runners.

As a workaround, we have copied all of the steps from the Dockerfile into the .gitlab-ci.yml file in order to create a second pipeline.

So we were wondering whether it would be feasible for us to bring our own runner and register it with either the data-engineering group or the data-engineering/airflow-dags project, in order to allow us to use this feature.

Could we please discuss some of the deployment scenarios that might permit this? I have a few questions to start with. Perhaps @Jelto would be well placed to advise us?

  • Is it right to say that we still can't use privileged mode for gitlab-runners in production, for security reasons?
  • Is the use of podman an option for us?
  • What about kaniko or buildah? Has any research been carried out into the use of these tools?

We're happy to deploy a machine to WMCS if that's the only way. However, it might limit some of the things we could use this runner for in future if it can't be made to run in the production realm, so I'd be keen to explore all options.

> Would you be happy to grant me (as an SRE within the team) ownership of that group, so that I could perform this operation as a self-service task?

Confirming here that you're an owner of the data-engineering group in GitLab.

Thanks for that @thcipriani - Yes, I have the necessary rights now.
When I originally wrote this ticket, the data-engineering group hadn't yet been moved under the /repos top-level group, and I didn't have the necessary privileges at the time. I'll update the description to reflect the changes since then.

Just passing by to +1 this idea.

@BTullis mentioned above the airflow-dags project, but it would also immediately benefit the conda-analytics project, as we are effectively doing the same workaround of copying the Dockerfile into gitlab CI steps. See T321736 for details.

This is no longer necessary, thanks to the work undertaken by the Release-Engineering-Team on GitLab: it is now possible to gain access to trusted runners on a per-project basis.