== Problem ==
As of this writing, one of the main reasons Toolforge tool developers keep using GridEngine instead of Kubernetes is that our current Kubernetes setup does not support mixing runtime environments. A tool that uses both Java and Python can only run on the grid. In Kubernetes we provide a fixed list of container images, each with a single runtime environment (for example Python, Node.js, PHP, or Java).
In the past, it was decided that a buildpack-based approach was the right solution to this problem. However, that project is technically complex and requires a non-trivial amount of engineering work. As a result, it is not ready yet and is not expected to be available until at least **TODO: when?**.
There is, however, another potential approach to unblock this situation in the short term: enable Bring Your Own Container (BYOC) while the buildpacks project is being completed. This means allowing Toolforge developers to create Kubernetes workloads using container images that they build themselves.
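To make the idea concrete, here is a minimal sketch of what a BYOC workload could look like, expressed as a Python function that builds a plain Kubernetes Deployment manifest. Everything specific in it is an assumption made for illustration: the helper `byoc_deployment`, the tool name `mytool`, the `tool-<name>` namespace pattern, and the image reference `registry.example.org/mytool/java-python:1.0` are hypothetical and do not describe an agreed Toolforge interface.

```python
import json


def byoc_deployment(tool: str, image: str) -> dict:
    """Build a minimal Kubernetes Deployment manifest running a user-built image.

    Hypothetical helper: the namespace pattern and label scheme are assumptions.
    """
    labels = {"app": tool}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": tool, "namespace": f"tool-{tool}", "labels": labels},
        "spec": {
            "replicas": 1,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # With BYOC the image is chosen by the tool developer,
                    # so it can bundle several runtimes (e.g. Java and Python).
                    "containers": [{"name": tool, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = byoc_deployment("mytool", "registry.example.org/mytool/java-python:1.0")
    print(json.dumps(manifest, indent=2))
```

Under the current setup, a manifest like this would be rejected unless the image came from the Tools Docker registry.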
=== Constraints and risks ===
The fact that we disallow BYOC is mostly documented in a single place: this [[https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes#Docker_Images | wikitech page]], which reads:
```
We restrict only running images from the Tools Docker registry, which is available publicly (and inside tools) at docker-registry.tools.wmflabs.org. This is for the following purposes:
1. Making it easy to enforce our Open Source Code only guideline
2. Make it easy to do security updates when necessary (just rebuild all the containers & redeploy)
3. Faster deploys, since this is in the same network (vs dockerhub, which is retreived over the internet)
4. Access control is provided totally by us, less dependent on dockerhub
5. Provide required LDAP configuration, so tools running inside the container are properly integrated in the Toolforge environment
This is enforced with a K8S Admission Controller, called RegistryEnforcer. It enforces that all containers come from docker-registry.tools.wmflabs.org, including the Pause container.
```
Any decision on this topic should take those five points into account.
In particular, one could argue that:
1. We do not actively scan the software inside containers. Claiming that our users comply with the open-source-code-only policy because we control the base container image is a bit naive.
2. In practice, we don't do active security maintenance on our container images today.
3. Deployment speed is a good point, but it is mostly relevant for tools that redeploy constantly. If we detected that this was a problem, we could open our existing Docker registry so that tool users can cache their images there.
4. It is not clear what access control means in this point, or what specific needs we have.
5. The LDAP configuration is important, so if we enabled any form of BYOC, clear instructions should be provided for our users to build their container images on top of a base layer of our own. Otherwise their tools may not work as expected.
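For reference, the enforcement rule quoted above (every container image must come from docker-registry.tools.wmflabs.org) can be illustrated with a minimal sketch. This is not the actual RegistryEnforcer code; the function names are hypothetical, the image names are made up, and the host-detection rule follows the usual Docker convention for image references as an assumption.

```python
ALLOWED_REGISTRY = "docker-registry.tools.wmflabs.org"


def image_registry(image: str) -> str:
    """Return the registry host of an image reference.

    Assumption: as in Docker's convention, the first path component is a
    registry host only if it contains "." or ":" or equals "localhost".
    Anything else (e.g. "python:3.11") is treated as a Docker Hub image.
    """
    first, sep, _rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def disallowed_images(pod_spec: dict) -> list:
    """List the images in a pod spec that do not come from the allowed registry."""
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    return [
        c["image"] for c in containers
        if image_registry(c["image"]) != ALLOWED_REGISTRY
    ]


if __name__ == "__main__":
    pod_spec = {
        "containers": [
            {"name": "web", "image": "docker-registry.tools.wmflabs.org/example-image:latest"},
            {"name": "sidecar", "image": "python:3.11"},
        ]
    }
    print(disallowed_images(pod_spec))
```

Comparing the parsed registry host, rather than checking a string prefix, avoids accepting look-alike hosts such as `docker-registry.tools.wmflabs.org.example.com`. Enabling BYOC would amount to relaxing or removing a check of this kind.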
== Decision record ==
TBD.
`https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/TBD`
== Options ==
=== Option 1 ===
Enable BYOC. This enables a new workflow/use case in Toolforge.
The simplest implementation of this option consists of:
* disabling our custom Kubernetes registry admission controller.
* creating some docs for our users on how to effectively benefit from the new feature.
* communicating the change to our users.
What to do with BYOC if and when buildpacks are ready to go is left for a future decision process. In particular, enabling BYOC **does not** prevent the buildpack project from being completed/implemented.
Pros:
* Less dependency on the grid.
* Less dependency on NFS (users could ship their code inside the container instead of storing it on NFS).
* Easy to implement.
Cons:
* Enabling a new feature may mean supporting it forever. If that is a concern, see option 2.
...
=== Option 2 ===
Enable BYOC on a ***temporary basis***. This enables a new workflow/use case in Toolforge, but only until the buildpacks project is completed, with the sole purpose of helping people migrate their tools away from GridEngine into Kubernetes.
The simplest implementation of this option consists of:
* disabling our custom Kubernetes registry admission controller.
* creating some docs for our users on how to effectively benefit from the new feature.
* clearly communicating with our users, with a focus on the temporary nature of the new feature.
Pros:
* Less dependency on the grid.
* Easy to implement.
Cons:
* Given its temporary nature, users may choose not to adopt the solution.
...
=== Option 3 ===
Leave BYOC disabled (discard this request). Hope that the buildpack project completes soon.
Pros:
* TBD
Cons:
* TBD
...