Feature summary (what you would like to be able to do and where):
From within a Kubernetes pod, interacting with the internal APIs (e.g. the jobs API) via the API gateway, using the tool credentials.
The equivalent from within the tool account on a bastion host would be something like:

```python
from toolforge_weld.api_client import ToolforgeClient
from toolforge_weld.config import load_config
from toolforge_weld.kubernetes_config import Kubeconfig


def main():
    config = load_config("test-job")
    client = ToolforgeClient(
        server=config.api_gateway.url,
        kubeconfig=Kubeconfig.load(),
        user_agent="Example Job",
    )
    print(client.get("/jobs/v1/tool/cluebotng-trainer/jobs/"))


if __name__ == '__main__':
    main()
```

This, however, does not work nicely from within a pod:
(1)
Context: within a job, no NFS mount.
Fails: `Kubeconfig.load()` due to there being no kubeconfig file.
Fails: `Kubeconfig.from_container_service_account()` due to an invalid SSL certificate, and a 403 when using the service account.
(2)
Context: within a build service run, with NFS mount.
Fails: `Kubeconfig.load()` due to the kubeconfig file not being found (`HOME` is `/workspace`; `TOOL_DATA_DIR` is not checked for the file).
Additionally, there is no environment variable holding the tool name/user name/namespace, which makes using the functions/API clunky (we can parse the name out of `$TOOL_DATA_DIR`, but should that be set when there is no NFS mount?).
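For context, the "parse the name out of `$TOOL_DATA_DIR`" workaround might look like the following sketch; the helper name is illustrative, not an existing API:

```python
import os


def tool_name_from_data_dir(data_dir):
    """Best-effort: derive the tool name from a TOOL_DATA_DIR-style
    path such as /data/project/cluebotng-trainer; returns None when
    the variable is unset or empty."""
    if not data_dir:
        return None
    # normpath drops any trailing slash, so basename returns the last component
    return os.path.basename(os.path.normpath(data_dir))


tool_name = tool_name_from_data_dir(os.environ.get("TOOL_DATA_DIR"))
```

This only works when the variable is actually present, which is exactly the gap this request is about.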
Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):
I have a job (https://github.com/cluebotng/trainer) which runs through a sequence of steps for a number of jobs; each "step" gets a clean environment (with artefacts from the previous "step").
You can imagine this is similar to what something like Airflow would provide with a DAG-based workflow model:
```
cbng-trainer run-edit-sets
`-> creates coord-xxx job via jobs api
    `-> creates xx-download job
    `-> creates xx-train job
    `-> creates xx-trial job
`-> creates coord-xxx job via jobs api
    `-> creates xx-download job
    `-> creates xx-train job
    `-> creates xx-trial job
```

(The coordination jobs are created via the jobs API and the steps directly in Kubernetes; however, the goal is to do everything via the jobs API, in order to be able to use e.g. logs.)
The results of each "step" are stored under https://cluebotng-trainer.toolforge.org (providing a very basic "object store") for later usage.
The API calls are constructed using toolforge_weld along the following lines:

```python
def _client_config():
    config = load_config("cluebotng-trainer")
    return ToolforgeClient(
        server=f"{config.api_gateway.url}",
        kubeconfig=Kubeconfig.load(),
        user_agent="ClueBot NG Trainer",
    )
```

To get a (minimal) kubeconfig (for `Kubeconfig.load()`) without relying on NFS, we have a wrapper script that writes out the file from envvars.
Relevant code snippet:

```shell
if [ ! -f "$HOME/.kube/config" ];
then
    mkdir -p /workspace/.kube
    echo "$K8S_CLIENT_CRT" > /workspace/.kube/client.crt
    echo "$K8S_CLIENT_KEY" > /workspace/.kube/client.key
    cat > /workspace/.kube/config <<EOF
apiVersion: v1
clusters:
  - cluster:
      insecure-skip-tls-verify: true
      server: ${K8S_SERVER}
    name: toolforge
contexts:
  - context:
      cluster: toolforge
      namespace: tool-cluebotng-trainer
      user: tf-cluebotng-trainer
    name: toolforge
current-context: toolforge
kind: Config
users:
  - name: tf-cluebotng-trainer
    user:
      client-certificate: /workspace/.kube/client.crt
      client-key: /workspace/.kube/client.key
EOF
    export KUBECONFIG="/workspace/.kube/config"
fi
```

This is a bit clunky, as it requires the client cert/key to be loaded into envvars manually, and will break when the credentials are re-created.
Thus this request falls into 2 parts:
(1)
In addition to https://wikitech.wikimedia.org/wiki/Help:Toolforge/Envvars#Globally_set_environment_variables, can an environment variable be set for the tool name (suitable for constructing the username/namespace, or for querying tool data, e.g. via the jobs API)?
(2)
Expose credentials that can be used to access the internal APIs (e.g. jobs). This could be done either via the service account, or by also loading the same client cert/key that is written to NFS into envvars (similar to the database credentials).
"Normally" this would be done via the service account or ephemeral credentials, but given that the home directory is already exposed via NFS and the database credentials via envvars, re-using the tool credentials seems acceptable.
On the flip side, allowing the service account would save the effort of having to maintain the envvar entries.
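To illustrate the envvar variant of part (2): a minimal sketch of building the kubeconfig in Python instead of shell, assuming hypothetical platform-provided variables `K8S_SERVER`, `K8S_CLIENT_CRT`, `K8S_CLIENT_KEY` and `TOOL_NAME` (none of which exist today). It emits the kubeconfig as JSON, which most YAML-based kubeconfig loaders also accept, since JSON is a subset of YAML:

```python
import json
import os


def build_kubeconfig(server, namespace, user, crt_path, key_path):
    """Build a minimal kubeconfig structure, mirroring the shell wrapper above."""
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{"name": "toolforge",
                      "cluster": {"server": server,
                                  "insecure-skip-tls-verify": True}}],
        "contexts": [{"name": "toolforge",
                      "context": {"cluster": "toolforge",
                                  "namespace": namespace,
                                  "user": user}}],
        "current-context": "toolforge",
        "users": [{"name": user,
                   "user": {"client-certificate": crt_path,
                            "client-key": key_path}}],
    }


def write_kubeconfig_from_env(dest="/workspace/.kube"):
    """Hypothetical: assumes K8S_SERVER, K8S_CLIENT_CRT, K8S_CLIENT_KEY
    and TOOL_NAME were injected by the platform (they are not today)."""
    os.makedirs(dest, exist_ok=True)
    crt, key = f"{dest}/client.crt", f"{dest}/client.key"
    with open(crt, "w") as f:
        f.write(os.environ["K8S_CLIENT_CRT"])
    with open(key, "w") as f:
        f.write(os.environ["K8S_CLIENT_KEY"])
    tool = os.environ["TOOL_NAME"]
    cfg = build_kubeconfig(os.environ["K8S_SERVER"],
                           f"tool-{tool}", f"tf-{tool}", crt, key)
    path = f"{dest}/config"
    with open(path, "w") as f:
        json.dump(cfg, f)
    os.environ["KUBECONFIG"] = path  # picked up by Kubeconfig.load()
```

This removes the shell heredoc, but still shares the main drawback: the cert/key must be mirrored into envvars and kept in sync with the files on NFS.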
Conceptually, something similar to this should work within a clean (Python + toolforge_weld) pod:

```python
import os

from toolforge_weld.api_client import ToolforgeClient
from toolforge_weld.config import load_config
from toolforge_weld.kubernetes_config import Kubeconfig

tool_name = os.environ.get('TOOL_NAME')
config = load_config(tool_name)
client = ToolforgeClient(
    server=f"{config.api_gateway.url}",
    kubeconfig=Kubeconfig.from_container_service_account(namespace=f'tool-{tool_name}'),
    user_agent="ClueBot NG Trainer",
)
print(client.get(f"/jobs/v1/tool/{tool_name}/jobs/"))
```

(Today this fails the SSL certificate check, and returns a 403 with SSL verification turned off.)
Benefits (why should this be implemented?):
This would make scheduling of pods via the "supported method" significantly easier for users.
It would enable more flexible usage of toolforge jobs, supporting more diverse workloads.
"Native" support reduces the overhead and complexity of needing to manage duplicating credentials on the maintainers end.