We would like to use the GitLab Trusted Runners for building spark artifacts please.
At the moment we have a build pipeline for spark in the production-images repository. It currently builds and publishes three docker images:
- A spark-build image, which contains the build state
- A spark image, which contains a binary spark distribution that is created from spark-build
- A spark-operator image, which contains the same as the spark image, with an added Go binary to
However, we need to expand on the capabilities of this pipeline by doing the following:
- Building for multiple minor versions of spark (currently, 3.1 to 3.4)
- Adding a Debian packaging component to help deploy the different versions of the spark-3.x-yarn-shuffle jar file to our production hosts
I have started work on migrating the spark build pipeline to GitLab-CI in order to achieve these goals.
https://gitlab.wikimedia.org/repos/data-engineering/spark/
However, I won't be able to publish the images until I can use a trusted runner.
In addition, I am currently hitting a 20-minute execution timeout on the WMCS-based runners, since each spark build takes a long time to run.
I would therefore like to be able to access the trusted runners so that we can get past both of these blockers.