This could be:
- A new database (this would need some kind of store, or could live in Trove)
- A custom resource in k8s (this would use k8s/etcd as the database)
- This should be read-only for users, as we want to only modify it through the API (that way we don't need admission controllers or controllers at all)
More questions and likely answers (feel free to edit the section below if you have other opinions or ideas)
- which database to use?
- k8s etcd (by defining a custom resource), for example (modified from https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/):

      apiVersion: apiextensions.k8s.io/v1
      kind: CustomResourceDefinition
      metadata:
        # name must match the spec fields below, and be in the form: <plural>.<group>
        name: toolforge-scheduled-jobs.jobs-api.toolforge.org
      spec:
        # group name to use for REST API: /apis/<group>/<version>
        group: jobs-api.toolforge.org
        # list of versions supported by this CustomResourceDefinition
        versions:
          - name: v1
            # Each version can be enabled/disabled by Served flag.
            served: true
            # One and only one version must be marked as the storage version.
            storage: true
            schema:
              openAPIV3Schema:
                type: object
                properties:
                  spec:
                    type: object
                    properties:
                      cmd:
                        type: string
                      cpu:
                        type: string
                      # ...
        # either Namespaced or Cluster
        scope: Namespaced
        names:
          # plural name to be used in the URL: /apis/<group>/<version>/<plural>
          plural: toolforge-scheduled-jobs
          # singular name to be used as an alias on the CLI and for display
          singular: toolforge-scheduled-job
          # kind is normally the CamelCased singular type. Your resource manifests use this.
          kind: ToolforgeScheduledJob
          # shortNames allow shorter string to match your resource on the CLI
          shortNames:
            - tsj

- in what namespace should the database be?
- Each tool's namespace
For example, inside the tool-tf-test namespace there would be a set of ToolforgeScheduledJob resources, each one defining a single scheduled job.
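
To make the per-namespace layout concrete, here is a minimal Python sketch that builds the manifest for one such resource. It only assembles a dictionary and does not talk to a cluster. The group, version and kind come from the custom resource definition above; the `tool-<tool-name>` namespace pattern is inferred from the `tool-tf-test` example, and the job name, command and CPU value are made up for illustration.

```python
import json

# Taken from the CustomResourceDefinition above.
GROUP = "jobs-api.toolforge.org"
VERSION = "v1"
KIND = "ToolforgeScheduledJob"


def build_scheduled_job(tool_name: str, job_name: str, cmd: str, cpu: str) -> dict:
    """Return a ToolforgeScheduledJob manifest living in the tool's own namespace."""
    return {
        "apiVersion": f"{GROUP}/{VERSION}",
        "kind": KIND,
        "metadata": {
            "name": job_name,
            # Assumption: namespaces follow the tool-<tool-name> pattern.
            "namespace": f"tool-{tool_name}",
        },
        # Only the two fields spelled out in the schema above; the rest is elided there.
        "spec": {"cmd": cmd, "cpu": cpu},
    }


if __name__ == "__main__":
    # Hypothetical job, for illustration only.
    manifest = build_scheduled_job("tf-test", "daily-cleanup", "./cleanup.sh", "500m")
    print(json.dumps(manifest, indent=2))
```

The resulting dictionary could be serialized to YAML or handed to whichever k8s client the API uses; that choice is left open here.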
- what are we putting in this database?
- All the information needed to rebuild the user's jobs (that is, what we currently keep in labels plus anything else needed to start the job). It does not include the job's status, for example whether it is running or stopped.
- (component config) - dc: @Raymond_Ndibe what do you mean by this?
- possible paths? -- dc: @Raymond_Ndibe what do you mean by paths? The paths on the API side don't change.
- /toolforge/<tool-name>/jobs-api/<job-name>/name {'unique-id' 'created-by' 'version' 'type' 'name' 'imagename' 'cmd' 'emails' 'retry' 'mount' 'continuous' 'filelog'} (or some variations of this)
- (/toolforge/<tool-name>/components/config {<component config goes here>})
- what do we do in case of path migration?
- if we decide to change the path from /toolforge/<tool-name>/jobs-api/<job-name>/name to say /toolforge/<tool-name>/component/<component-name>/jobs-api/<job-name>/name, we'll need to manually write a script that will migrate the paths and data
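
A rough idea of what such a migration script could look like, assuming the store can be read as a plain mapping from path to data. The old and new path shapes are the ones given above; `migrate_store` and the `component_for` callback (which decides the component a job belongs to) are hypothetical names, since how jobs map to components is not decided here.

```python
import re
from typing import Callable, Dict

# Old layout: /toolforge/<tool-name>/jobs-api/<job-name>/name
OLD_PATH = re.compile(r"^/toolforge/(?P<tool>[^/]+)/jobs-api/(?P<job>[^/]+)/name$")


def migrate_path(old_path: str, component: str) -> str:
    """Rewrite one old-style job path into the component-scoped layout."""
    match = OLD_PATH.match(old_path)
    if match is None:
        raise ValueError(f"not an old-style job path: {old_path}")
    return (
        f"/toolforge/{match['tool']}/component/{component}"
        f"/jobs-api/{match['job']}/name"
    )


def migrate_store(
    store: Dict[str, dict],
    component_for: Callable[[str, str], str],
) -> Dict[str, dict]:
    """Return a new mapping with job paths migrated and all other entries untouched."""
    migrated = {}
    for path, data in store.items():
        match = OLD_PATH.match(path)
        if match is None:
            # Not a job entry (for example the component config); keep as-is.
            migrated[path] = data
            continue
        component = component_for(match["tool"], match["job"])
        migrated[migrate_path(path, component)] = data
    return migrated
```

Building a new mapping rather than renaming keys in place keeps the old data available until the migrated copy has been checked.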
- how do we ensure that the database entries are in sync with the kubernetes objects they represent?
- we don't. If someone manually changes the underlying kubernetes objects of a job, something will probably go wrong. For this reason the database should only be editable through the APIs and should otherwise be read-only (haven't thought about how to enforce that yet). If I am not mistaken, this is also more or less the current situation.
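
If the custom resource option is chosen, one possible way to enforce the read-only part is plain k8s RBAC: grant tool accounts only the `get`, `list` and `watch` verbs on the new resource. The sketch below builds such a Role as a Python dictionary. This is an assumption about how it could be done, not a decided design; the Role name is hypothetical, a matching RoleBinding would still be needed, and the API's own account would need write access granted separately.

```python
GROUP = "jobs-api.toolforge.org"        # from the CRD above
PLURAL = "toolforge-scheduled-jobs"     # from the CRD above
READ_ONLY_VERBS = ["get", "list", "watch"]


def build_read_only_role(namespace: str) -> dict:
    """Return a namespaced Role allowing only read access to scheduled job resources."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {
            # Hypothetical name, for illustration only.
            "name": "toolforge-scheduled-jobs-read-only",
            "namespace": namespace,
        },
        "rules": [
            {
                "apiGroups": [GROUP],
                "resources": [PLURAL],
                "verbs": READ_ONLY_VERBS,
            }
        ],
    }


if __name__ == "__main__":
    role = build_read_only_role("tool-tf-test")
    print(role["rules"][0]["verbs"])
```

This only restricts edits to the stored definitions; it does not stop someone from changing the underlying kubernetes objects directly.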