Problem
DNS names for Toolforge infrastructure services are spread across inconsistent DNS zones (examples: mail.tools.wmflabs.org., k8s.tools.eqiad1.wikimedia.cloud., jobs.svc.tools.eqiad1.wikimedia.cloud., login.toolforge.org., tools-redis., tools-prometheus.wmflabs.org.) and conflict with names of actual tools hosted on tool-name.toolforge.org (example: login.toolforge.org.).
We've mostly settled on using svc.$PROJECT.$DEPLOYMENT.wikimedia.cloud for services that are not exposed to the public internet, but the question of public names remains. New public services are rarer, and the only recent example (Harbor) went with a Cloud VPS web proxy. However, that does not work for some use cases.
So in short, this decision request aims to decide which name to use for services that:
- May be user-facing (seen by tool maintainers, and possibly end users)
- Might not be able to use the shared Cloud VPS web proxy
  - For example due to custom TLS termination needs (example: API authentication)
  - Or due to specific load balancing / traffic needs (this is a risk for Harbor and static/object storage services)
  - Or maybe they're not using HTTP at all! (example: SSH access)
- May be hosted in Kubernetes or on VMs, but are not running via tool accounts
  - For tool accounts *.toolforge.org will remain the only supported scheme for now.
Apart from tools-*.wmcloud.org, none of the options listed below really has any technical advantage or disadvantage: the services we're talking about here can't easily re-use any of our existing infrastructure, and they're all controlled in Designate in the tools project, so we can issue acme-chief certs for them.
An example task that is affected by this decision would be T332476: Toolforge: expose API gateway to the internet.
Also, just to make it clear: I'm not proposing an immediate renaming of all existing services. This is to make it clear what to use for new services, and existing services can stay as is. (However, if an existing service is getting a major update that would require changes to the network ingress setup or naming anyway, the people doing that work should consider using the latest scheme. An example of this was the latest Redis migration, which moved it under the internal svc.tools.eqiad1.wikimedia.cloud domain.)
For additional context on the different domains we have, see: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/DNS
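As an illustration of the internal scheme mentioned above (svc.$PROJECT.$DEPLOYMENT.wikimedia.cloud), here is a minimal Python sketch that assembles such a name from its parts. The function name and its validation rules are hypothetical and are not part of any existing Toolforge tooling; only the label order is taken from the examples in this task.

```python
def internal_service_fqdn(service: str, project: str, deployment: str) -> str:
    """Build <service>.svc.<project>.<deployment>.wikimedia.cloud.

    Hypothetical helper: only the label order comes from this task;
    the validation below is an assumption made for the sketch.
    """
    labels = [service, "svc", project, deployment, "wikimedia", "cloud"]
    for label in labels:
        # Each part must be a single, non-empty DNS label.
        if not label or "." in label:
            raise ValueError(f"invalid DNS label: {label!r}")
    # Trailing dot marks the name as fully qualified, as in the examples.
    return ".".join(labels) + "."


print(internal_service_fqdn("jobs", "tools", "eqiad1"))
```

With the arguments shown, this reproduces the existing jobs.svc.tools.eqiad1.wikimedia.cloud. name listed in the examples.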
Constraints and risks
- Inconsistency in infrastructure
- Confusion between infrastructure and user-created tools, also maybe a phishing risk
- Use of deprecated terms like 'labs'
Decision record
coming soon
Options
Note that toolsbeta already exists. We already use *.beta.toolforge.org. for toolsbeta webservices, so if we end up choosing something under toolforge.org, the author assumes we will replicate the same scheme under beta.toolforge.org for toolsbeta use.
A. Use something like *.internal.toolforge.org
This is the author's preferred solution. It makes it clear that the service is part of Toolforge (since it's on toolforge.org), and it's clear that it's an admin-managed service and not a user-created tool (since it does not share the same namespace).
Alternative but very similar options:
- *.admin.toolforge.org
- *.infra.toolforge.org
- *.mgmt.toolforge.org
- *.svc.toolforge.org
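To illustrate how option A separates the two namespaces, here is a rough Python sketch that classifies a hostname as a tool or an infrastructure service. It assumes the reserved label is `internal` (any of the alternatives above would work the same way) and that tools always occupy exactly one label directly under toolforge.org; the function and the category names are hypothetical, and the *.beta.toolforge.org case is deliberately not handled.

```python
BASE_DOMAIN = "toolforge.org"
RESERVED_LABEL = "internal"  # assumption: option A as written; could be admin, infra, ...


def classify(hostname: str) -> str:
    """Return 'tool', 'infrastructure', 'reserved' or 'other' (hypothetical categories)."""
    name = hostname.rstrip(".").lower()
    suffix = "." + BASE_DOMAIN
    if not name.endswith(suffix):
        return "other"
    labels = name[: -len(suffix)].split(".")
    if not all(labels):
        return "other"  # malformed name with an empty label
    if labels == [RESERVED_LABEL]:
        # The bare reserved name would otherwise look like a tool called 'internal'.
        return "reserved"
    if len(labels) == 1:
        return "tool"
    if labels[-1] == RESERVED_LABEL:
        return "infrastructure"
    return "other"


for host in ("login.toolforge.org.", "api.internal.toolforge.org.", "internal.toolforge.org."):
    print(host, "->", classify(host))
```

One consequence this sketch makes visible: under option A the chosen label itself (here `internal`) would presumably need to be unavailable as a tool name, otherwise the two namespaces overlap again.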
B. Use *.tools.wmcloud.org
This also follows the standard Cloud VPS model, but it means that possibly user-facing addresses will be inconsistent with toolforge.org ('tools' vs 'toolforge', and a different domain).
C. Use *.toolforge.org
This is a simple option but means that the same naming scheme is used for both infrastructure and tools.
D. Use tools-*.wmcloud.org
This is an option, but mostly for services that will use the shared Cloud VPS web proxy. For other use cases *.tools.wmcloud.org is vastly superior, as that domain is already delegated to the tools project.