
[jobs-api] expose jobs-api continuous jobs to the internet via `toolname.toolforge.org`, just like webservice
Open, In Progress, Medium, Public

Description

The idea is to make sure that one can manually set up a web service that is exposed to the internet via the toolname.toolforge.org domain, using toolforge jobs.
This is in preparation for moving toolforge webservice into toolforge jobs.
(I have a feeling we already have a task for this, but in case we don't yet, this is me creating it.)

Things that need to be done
  1. Implement this feature in jobs-api
    • Configure ingress, services, ensure nothing conflicts with the webservice
    • What should the domain name be? For now, maybe toolname.toolforge.org (like webservice). In the future, jobname.toolname.toolforge.org.
      • <tool>.toolforge.org
    • We may have to make some validation changes to webservice itself, to verify that the domain name toolname.toolforge.org is not in use by jobs-api (and vice versa)
  2. Implement the feature in jobs-cli
    • What's a suitable argument name (--expose, --expose-default)?
  3. Testing (especially ensuring this doesn't affect how the webservice works)
  4. Cloud announce/toolforge changelog

To my knowledge this is likely all that is needed to get this to work. @dcaro and I already had a brief discussion about this and didn't immediately see many other things apart from the above (thanks to the awesome work already done by so many others to handle the domain name, SSL and everything else needed).

You are welcome to edit this and add anything else that I missed. Please do 🙏, I don't know a lot about how many things outside toolforge work. If there is any resource you think will help, please drop it here.

Related Objects

Status        Subtype   Assigned          Task
Resolved                LucasWerkmeister
Resolved                matmarex
Resolved                Legoktm
Resolved                Legoktm
In Progress             dcaro
Resolved                dcaro
In Progress             komla
Resolved                dcaro
Resolved                dcaro
Open                    dcaro
Open                    None
In Progress   Feature   Raymond_Ndibe
In Progress             Raymond_Ndibe
Invalid                 Raymond_Ndibe

Event Timeline

Raymond_Ndibe renamed this task from [jobs-api] enable ingress for jobs-api continuous jobs to [jobs-api] expose jobs-api continuous jobs to the internet via `toolname.toolforge.org`, just like webservice.
Raymond_Ndibe changed the task status from Open to In Progress. Apr 25 2025, 3:23 AM

@taavi @bd808 @aborrero @dcaro
Mentioning the people I know who know a lot about how much of our infra is configured. You are welcome to drop your opinion about this.

There could be significant architectural challenges depending on the implementation.

My suggestion would be, that for continuous jobs that want HTTP, then they should be using the normal ingress setup.

That is, imagine a user is running this:
toolforge jobs run my-custom-web --continuous [..] --expose http
I would just translate to the equivalent of:
toolforge webservice start

And with "translate" I mean, generate the exact same kubernetes resources as in the webservice case:

  • an ingress entry
  • a service resource
  • a deployment resource

This should be feasible.
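
As a rough sketch of that translation, the three resources could be built as plain manifests like this. The function name, the `tool-<name>` namespace, the label key and the image are assumptions for illustration only, not what webservice or jobs-api actually generate:

```python
def webservice_like_resources(tool: str, job: str, image: str, port: int = 8000) -> list:
    """Build Deployment, Service and Ingress manifests for one exposed job.

    Hypothetical helper: namespace, labels and field choices are assumptions.
    """
    namespace = f"tool-{tool}"  # assumed namespace naming
    labels = {"app.kubernetes.io/name": job}
    metadata = {"name": job, "namespace": namespace, "labels": labels}

    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": dict(metadata),
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": job, "image": image, "ports": [{"containerPort": port}]}
                    ]
                },
            },
        },
    }
    # The Service selects the Deployment's pods and forwards to the job's port.
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": dict(metadata),
        "spec": {
            "selector": labels,
            "ports": [{"port": port, "targetPort": port, "protocol": "TCP"}],
        },
    }
    # The Ingress routes <tool>.toolforge.org to the Service.
    ingress = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": dict(metadata),
        "spec": {
            "rules": [
                {
                    "host": f"{tool}.toolforge.org",
                    "http": {
                        "paths": [
                            {
                                "path": "/",
                                "pathType": "Prefix",
                                "backend": {"service": {"name": job, "port": {"number": port}}},
                            }
                        ]
                    },
                }
            ]
        },
    }
    return [deployment, service, ingress]
```

Calling `webservice_like_resources("mytool", "my-custom-web", "example/image:latest")` would give the same trio of objects regardless of which CLI the user typed, which is the point of the suggestion.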

Now, if we wanted to support a job exposing arbitrary TCP/UDP ports over the internet, that would be a bigger fish.
We would need to figure out a number of things:

  • somehow the external HAProxy would need to know which node ports are open on the worker nodes (all of them? all of them except HTTP?)
  • we would need an additional config flag for ingress-nginx, like --tcp-services-configmap (see https://kubernetes.github.io/ingress-nginx/user-guide/cli-arguments/#command-line-arguments)
  • when the toolforge jobs CLI is called, we would need to create:
    • the deployment for the continuous job itself
    • a nodePort service for this new TCP/UDP port, so kubernetes worker nodes listen on this new port
    • update the ingress-nginx configmap to introduce the new port
  • Most importantly: What happens if two different tools need to listen on the same port?

The system won't have any way to tell that mytool.toolforge.org:123456 must be routed to a different pod than mytool2.toolforge.org:123456 (note same port, different FQDN), because for arbitrary TCP/UDP connections there is no such thing as a Host: mytool.toolforge.org header that can be used for routing like we do for HTTP.
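
To make the conflict concrete, here is a small sketch. It assumes the ingress-nginx TCP services ConfigMap layout, where each key is an external port and each value is `namespace/service:port`; the helper name, tool names and port number are made up for illustration:

```python
def add_tcp_service(tcp_services: dict, port: int, namespace: str, service: str, service_port: int) -> None:
    """Register an external TCP port in a tcp-services style mapping.

    The only lookup key is the port, so two tools cannot share one.
    """
    key = str(port)
    if key in tcp_services:
        raise ValueError(
            f"port {port} is already routed to {tcp_services[key]}; "
            "plain TCP has no Host header to tell the tools apart"
        )
    tcp_services[key] = f"{namespace}/{service}:{service_port}"


# HTTP routing can key on (host, port), so both tools fit on port 443.
http_routes = {
    ("mytool.toolforge.org", 443): "tool-mytool/web:8000",
    ("mytool2.toolforge.org", 443): "tool-mytool2/web:8000",
}

# TCP routing keys on the port alone.
tcp_services = {}
add_tcp_service(tcp_services, 5432, "tool-mytool", "db", 5432)
try:
    add_tcp_service(tcp_services, 5432, "tool-mytool2", "db", 5432)
    conflict = False
except ValueError:
    conflict = True
```

The second registration fails, so the first tool to claim a port would effectively own it for the whole cluster.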

I don't feel there is a clear and significant request from the community to implement this, so I would just not put this in the roadmap for now.

Agreed that we can scope this to HTTPS only for now; arbitrary TCP traffic is a very different problem. Also, for now this can be scoped to the $TOOL.toolforge.org domain only. Doing anything else has a bunch of additional complexity for dealing with certs and validation, and I think there's a neat way to do this in a forwards-compatible way.

One thing that I'd like to make possible is routing specific subpaths to different deployments. One possibility for that would be to:

  • Add a flag (say --https-path) to specify the path prefix to route to this service. The current behaviour where we route everything to a single service then translates to --https-path=/.
  • When creating a job that sets --https-path, generate an ingress object for the tool domain (name the object $TOOLNAME.toolforge.org for example) with a {pathType: Prefix, path: $HTTPS_PATH} route pointing towards the created Service.
  • If the ingress already exists, add the route to the existing Ingress object. If a route with the exact same path already exists, throw an error.
  • Similarly, when deleting a job, delete the route from the ingress object. If no routes remain, drop the Ingress object entirely.
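
The add/remove logic in that list could look roughly like the following. This is only a sketch on plain dicts: `add_route` and `remove_route` are hypothetical names, and a real implementation would read and write the Ingress through the Kubernetes API:

```python
from typing import Optional


def add_route(ingress: Optional[dict], host: str, path: str, service: str, port: int = 8000) -> dict:
    """Add a path-prefix route, creating the Ingress if it does not exist yet."""
    if ingress is None:
        ingress = {
            "apiVersion": "networking.k8s.io/v1",
            "kind": "Ingress",
            "metadata": {"name": host},  # e.g. $TOOLNAME.toolforge.org
            "spec": {"rules": [{"host": host, "http": {"paths": []}}]},
        }
    paths = ingress["spec"]["rules"][0]["http"]["paths"]
    if any(p["path"] == path for p in paths):
        raise ValueError(f"a route for path {path!r} already exists on {host}")
    paths.append(
        {
            "path": path,
            "pathType": "Prefix",
            "backend": {"service": {"name": service, "port": {"number": port}}},
        }
    )
    return ingress


def remove_route(ingress: dict, path: str) -> Optional[dict]:
    """Remove a route; return None when the Ingress should be deleted entirely."""
    rule = ingress["spec"]["rules"][0]
    remaining = [p for p in rule["http"]["paths"] if p["path"] != path]
    if not remaining:
        return None
    rule["http"]["paths"] = remaining
    return ingress
```

For example, a tool could create one job with `--https-path=/` and another with `--https-path=/api`; both would end up as entries in the same Ingress object, and deleting both jobs would remove the object.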

For the transition we can add code in both webservice and the ingress-specific jobs-api code to error out when an ingress object with a different app.kubernetes.io/managed-by annotation exists. Then the transition from old to new becomes "delete webservice-managed thing, re-create jobs-api managed thing", and we can utilize the backend mechanism in webservice to do that.
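
A minimal sketch of that guard, assuming the manager names are something like "webservice" and "jobs-api" (both hypothetical values). It reads the marker from annotations as described above and also falls back to labels, and it treats an Ingress with no marker as a conflict, which is a design assumption rather than anything decided here:

```python
from typing import Optional

MANAGED_BY = "app.kubernetes.io/managed-by"


class ManagedByConflict(Exception):
    """Raised when an existing Ingress belongs to a different manager."""


def check_ingress_owner(existing: Optional[dict], manager: str) -> None:
    """Error out if an Ingress exists and is not managed by `manager`."""
    if existing is None:
        return  # nothing exists yet, safe to create
    metadata = existing.get("metadata") or {}
    annotations = metadata.get("annotations") or {}
    labels = metadata.get("labels") or {}
    owner = annotations.get(MANAGED_BY) or labels.get(MANAGED_BY)
    if owner != manager:
        # Assumption: an unmarked Ingress (owner is None) is also refused.
        raise ManagedByConflict(
            f"ingress is managed by {owner!r}, refusing to modify it as {manager!r}"
        )
```

Both webservice and jobs-api would call the same check with their own manager name before touching the Ingress.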

And if/when we want to add support for different domains, at the jobs-api layer all we need to do is add a separate --https-host parameter that controls which different Ingress object is being modified for that specific job. A tool can then easily have different jobs using different hostnames and such.

Thoughts?

In the future jobname.toolname.toolforge.org.

I am having some trouble coming up with a good use case for this. I have been an advocate for narrow tools rather than expansive suites in the Toolforge space. I advocate for this largely because keeping a tool with a single focus working is assumed to be easier than keeping a large suite healthy over the longer term. When a developer loses interest in a narrow tool it should be easier to find people both willing and capable of taking it over. When a feature request comes in for a narrow tool it should be easier to decide if it is in scope or out of scope for the tool. The idea of N tools sharing the same tool account namespace but presenting as separate hosts seems to run counter to that narrow tool advocacy.

what's a suitable argument name (--expose, --expose-default ?)

What about --web or --webservice? Keep that as meaning "connect port 8000 of the primary container in this Pod with a Service and Ingress for $TOOL.toolforge.org" which is basically what webservice does today.

One thing that I'd like to make possible is routing specific subpaths to different deployments.

Can you help me think about this by providing a concrete use case? From a general purpose platform point of view I can describe the utility of an aggregating front router, but I'm not readily able to translate that into seeing a solid use case for that complexity in tools of the scope that I have worked with. The closest thing I can think of is providing an anonymizing proxy service in front of a 3rd party website which I helped one or two tools do with hand-built Ingress objects in the past.

Can you help me think about this by providing a concrete use case?

One use-case I see is when you want to have a webservice + a backend API, and serve the backend api from /api, this would avoid having to do it in the webservice code and move it to the ingress. Though I think it might not be a good practice as now the developer can't really reproduce locally without running their own ingress, or changing the webservice code to do the proxying that they were trying to avoid.

Generally speaking, I would start small, keeping the "narrow" principle @bd808 mentioned.

First, just implement the minimum to replace webservice, that is:

  • HTTPS only (I think that it was the idea already, so +1 on that)
  • Making the 8000/continuous job port the default (as in, use 8000 as the default port)
  • I like the more concise --webservice as a bundle of the others, like --expose --port=8000, maybe add it as an "extra" flag so users have an easy transition
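
To illustrate the "bundle" idea, here is a small argparse sketch. The flag names come from this discussion, but none of this is the real jobs-cli parser; it only shows how --webservice could expand to --expose with port 8000 as the default:

```python
import argparse

DEFAULT_PORT = 8000


def parse_run_args(argv: list) -> argparse.Namespace:
    """Parse a hypothetical `toolforge jobs run` command line."""
    parser = argparse.ArgumentParser(prog="toolforge jobs run")
    parser.add_argument("name")
    parser.add_argument("--continuous", action="store_true")
    parser.add_argument("--expose", action="store_true", help="expose the job over HTTPS")
    parser.add_argument("--port", type=int, default=None, help="port the job listens on")
    parser.add_argument(
        "--webservice",
        action="store_true",
        help=f"shorthand for --expose --port={DEFAULT_PORT}",
    )
    args = parser.parse_args(argv)

    if args.webservice:
        # --webservice is only a bundle of the other flags.
        args.expose = True
    if args.expose and args.port is None:
        # 8000 is the default port for exposed jobs.
        args.port = DEFAULT_PORT
    return args
```

With this shape, `--webservice` and `--expose` give the same result, and `--expose --port=9000` stays available for jobs that listen elsewhere.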

Then for the subpath and subdomain features, either wait to see if users request them (if you have use cases @taavi please share!), or try to gather some input on whether users think they would be useful. Just make sure the current implementation does not make them very hard to add later, and wait until there's actual demand to build them.

Now, if we wanted to support a job exposing arbitrary TCP/UDP ports over the internet, that would be a bigger fish.

Yeah, the idea was to do in jobs-api what webservice currently does, so unless webservice is doing anything other than handling HTTPS traffic, we'll stick to just HTTPS and won't have to deal with all that additional complexity.

I don't feel there is a clear and significant request from the community to implement this, so I would just not put this in the roadmap for now.

This is more about slowly moving webservices things into jobs-api one at a time (in contrast to moving everything all at once), than about community requests.

One thing that I'd like to make possible is routing specific subpaths to different deployments. One possibility for that would be to:

This approach is the alternative to having something like jobname.toolname.toolforge.org. The bigger question is, do we want to support more than one domain name for a tool (a clear departure from what we currently have in webservice)?
If so what will the limits be?
If not why?

In the future jobname.toolname.toolforge.org.

I am having some trouble coming up with a good use case for this. I have been an advocate for narrow tools rather than expansive suites in the Toolforge space. I advocate for this largely because keeping a tool with a single focus working is assumed to be easier than keeping a large suite healthy over the longer term. When a developer loses interest in a narrow tool it should be easier to find people both willing and capable of taking it over. When a feature request comes in for a narrow tool it should be easier to decide if it is in scope or out of scope for the tool. The idea of N tools sharing the same tool account namespace but presenting as separate hosts seems to run counter to that narrow tool advocacy.

Thanks Bryan

Can you help me think about this by providing a concrete use case?

One use-case I see is when you want to have a webservice + a backend API, and serve the backend api from /api, this would avoid having to do it in the webservice code and move it to the ingress.

Yeah, that's the main thing in my mind.

This approach is the alternative to having something like jobname.toolname.toolforge.org. The bigger question is, do we want to support more than one domain name for a tool (a clear departure from what we currently have in webservice)?
If not why?

In addition to what Bryan already said in T388092#10769733, this would also add major operational complexity for provisioning public TLS certificates. In general no CA will issue double-wildcard certificates (e.g. *.*.toolforge.org), so supporting such a use case would essentially require us to provision a separate certificate for each tool using this feature, which does not seem reasonable for this initial migration.
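
The certificate constraint can be shown with a small matcher. It assumes the usual rule that a wildcard stands in for exactly one leftmost DNS label, and that Toolforge serves a `*.toolforge.org` certificate; the function name and hostnames are illustrative only:

```python
def wildcard_covers(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name such as *.toolforge.org covers hostname.

    The wildcard is only accepted as the whole leftmost label and matches
    exactly one label, never several.
    """
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        return bool(host_labels[0]) and pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


covered_tool = wildcard_covers("*.toolforge.org", "mytool.toolforge.org")
covered_subdomain = wildcard_covers("*.toolforge.org", "myjob.mytool.toolforge.org")
covered_dashed = wildcard_covers("*.toolforge.org", "myjob-mytool.toolforge.org")
```

Under these assumptions the per-job subdomain is not covered by the existing wildcard, while a single-label name such as myjob-mytool.toolforge.org is.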

In general no CA will issue double-wildcard certificates (e.g. *.*.toolforge.org), so supporting such a use case would essentially require us to provision a separate certificate for each tool using this feature, which does not seem reasonable for this initial migration.

Yep, that's a big change. It could be changed to <job>-toolname.toolforge.org or something like it.

At that point you might as well have multiple tools..

At that point you might as well have multiple tools..

yep