Separate Gerrit https and ssh/git hostnames
Closed, Declined (Public)

Description

For the upcoming migration of Gerrit behind the CDN (T365259), the suggested approach is to first separate the hostnames for HTTPS traffic and SSH/Git traffic (see T365259#10820392). The proposal is to use gerrit.wikimedia.org for HTTPS and a new hostname like gerrit-git.wikimedia.org (tbd) for SSH/Git.

Implementing this change would require a lot of refactoring across various tools and automation in CI, Puppet, and local repository git configurations. It might be possible to drop the 29418 port and use the default port 22 for gerrit-git.wikimedia.org, which would keep the user experience at least consistently awkward. However, this would make the grace period more complicated because the service would have to listen on both ports.

A grace period could be established where both endpoints remain functional to allow users sufficient time to switch to the new hostname. After this period, SSH support on the old hostname could be removed. Testing could begin with the replica to catch any unexpected behavior before proceeding to the production host.
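
As an illustration of the grace period described above, here is a minimal sketch of a check that both endpoints still accept TCP connections. It is not part of the proposal: the helper names `is_listening` and `grace_period_status` are hypothetical, and it only confirms that a port accepts connections, not that Gerrit's SSH daemon answers correctly.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def grace_period_status(endpoints):
    """Map each "host:port" to whether it currently accepts connections."""
    return {f"{host}:{port}": is_listening(host, port) for host, port in endpoints}
```

During a grace period one might pass both the old endpoint (`gerrit.wikimedia.org`, port 29418) and whichever new hostname is chosen (still marked tbd here) and expect both to report `True`.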

The rough outline could look like this:

  • Discuss with RelEng if a change of hostname is reasonable (for volunteers, staff and tooling)
  • Add new hostnames
    • gerrit-git.wikimedia.org gerrit-ssh.wikimedia.org (tbd)
    • gerrit-replica-git.wikimedia.org gerrit-replica-ssh.wikimedia.org (tbd)
  • Enable SSH/GIT on replica
  • Enable SSH/GIT on production host
  • Inform users about upcoming change and announce date for action required
    • write tutorial for users how to change git config
    • update all tooling to use new hostname
  • Remove support for SSH/GIT on old hostname on replica
  • Remove support for SSH/GIT on old hostname on production host
  • Repeat for GitLab :)
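
For the "how to change git config" step in the outline above, a rough sketch of the URL rewrite that tooling or a migration script might perform. The new hostname is an assumption (the task still lists it as tbd), and `migrate_remote_url` is a hypothetical helper, not an existing tool.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "gerrit.wikimedia.org"
# Assumption: the task lists the new name as "tbd"; this is one candidate.
NEW_SSH_HOST = "gerrit-ssh.wikimedia.org"


def migrate_remote_url(url: str) -> str:
    """Swap the hostname of ssh:// remotes; leave all other URLs untouched."""
    parts = urlsplit(url)
    if parts.scheme != "ssh" or parts.hostname != OLD_HOST:
        return url  # HTTPS remotes and unrelated hosts stay as they are
    userinfo = f"{parts.username}@" if parts.username else ""
    port = f":{parts.port}" if parts.port else ""
    netloc = f"{userinfo}{NEW_SSH_HOST}{port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

The result could then be fed to `git remote set-url origin <new-url>` in each local clone. The port is preserved as-is, since whether to move from 29418 to 22 is a separate open question in the description.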

Details

Other Assignee
LSobanski
Event Timeline

a new hostname like gerrit-git.wikimedia.org (tbd) for SSH/Git.

Naming is always tricky, but I wonder if putting ssh in the hostname would be helpful for everyone? gerrit-ssh.wikimedia.org or git-ssh.wikimedia.org or something similar. My thinking here is that it would help to remind everyone why there are two different names for largely the same service.

Implementing this change would require a lot of refactoring across various tools and automation in CI, Puppet, and local repository git configurations.

Split horizon DNS might be able to mitigate some of these issues if we could continue to present gerrit's ssh interface inside the production and Cloud VPS networks as ssh://gerrit.wikimedia.org:29418/. This sort of fix may just extend the period of confusion for using the service however. Documentation of how to use ~/.gitconfig insteadOf rules to rewrite connections at runtime might be a more helpful meta solution.
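
To make the insteadOf idea concrete, here is a small sketch that mimics how git applies `url.<base>.insteadOf` rules: a URL that starts with a configured prefix has that prefix replaced by `<base>`, and the longest matching prefix wins. The hostname `gerrit-ssh.wikimedia.org` is an assumption taken from the naming discussion, and `apply_instead_of` is a hypothetical helper for illustration only; git performs this rewrite itself at runtime.

```python
# What the corresponding ~/.gitconfig section could look like (hostname assumed):
GITCONFIG_SNIPPET = """\
[url "ssh://gerrit-ssh.wikimedia.org:29418/"]
    insteadOf = ssh://gerrit.wikimedia.org:29418/
"""


def apply_instead_of(url: str, rules: dict[str, list[str]]) -> str:
    """Rewrite url the way git's insteadOf does: longest matching prefix wins.

    rules maps each replacement base to the list of prefixes it replaces.
    """
    best_base, best_prefix = None, ""
    for base, prefixes in rules.items():
        for prefix in prefixes:
            if url.startswith(prefix) and len(prefix) > len(best_prefix):
                best_base, best_prefix = base, prefix
    if best_base is None:
        return url  # no rule applies
    return best_base + url[len(best_prefix):]
```

Because matching is by plain prefix, a remote that embeds a username (`ssh://user@gerrit.wikimedia.org:29418/...`) would need its own rule, which is worth calling out in any user-facing documentation.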

I like gerrit-ssh.wikimedia.org as it indicates that it should be used with SSH access. git-ssh.wikimedia.org feels a bit too generic, especially since we'll need a similar structure for GitLab. I've updated the names in the task description.

FYI Phabricator Diffusion had used git-ssh.wikimedia.org until T296022 which indeed felt too generic. gerrit-ssh IMO makes sense.

Naming is always tricky, but I wonder if putting ssh in the hostname would be helpful for everyone? gerrit-ssh.wikimedia.org or git-ssh.wikimedia.org or something similar.

I would like to add here that git-ssh.wikimedia.org has already existed in the past, when it pointed to Phabricator and we still used that for repos.

Whether this is a reason to use it again or to NOT use it again.. not so sure.

edit: I wrote this without seeing the latest comments from a few days ago. Seems like everyone agrees that we do NOT like "git-ssh". So gerrit-ssh it is.

Change #1148438 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/dns@master] wikimedia.org: add gerrit-ssh, gerrit-replica-ssh records

https://gerrit.wikimedia.org/r/1148438

I'd really love to keep the same url to access gerrit regardless of protocol. That's the expected workflow for all code forges.

I'm sure separate URLs are no one's first choice. Are there no other options? I read T365259#10820392, but my mental model of Liberica is poor/non-existent. Is there any possibility of packet filtering in front of Liberica that would avoid ongoing burdensome config changes?

Splitting traffic will mean acute, short-term pain (there are 2500 files in codesearch that mention gerrit.wikimedia.org and >500 page hits on mediawiki.org that mention gerrit.wikimedia.org—stuff is going to break). And I grok there's ongoing maintenance burden if we add gerrit's ssh port to Liberica. But there will be ongoing maintenance of a different sort from splitting: after every developer updates their local repos, every newcomer will be confused thereafter.

I know I'll be answering questions about this, so I need help to understand the options.

You're absolutely right that using a single hostname for both HTTPS and SSH is the expected and most user-friendly approach. That said, doing so within our current architecture introduces some significant challenges:

  • Security Exposure: If we keep gerrit.wikimedia.org for both HTTPS and SSH, then:
    • Port 29418 (for Gerrit's SSH) would be open on the same VIP as the main Wikimedia sites (including Wikipedia).
    • For GitLab, which uses the standard port 22, this becomes even riskier—exposing that port on our main frontend would invite large-scale automated attacks, probing, and abuse, particularly from bots scanning the internet.
  • Routing Complexity:
    • Gerrit and GitLab are only deployed in the core DCs (eqiad and codfw).
    • With Liberica we could technically route traffic from edge PoPs to backend services in the core DCs:
      • However, this approach (handling SSH through our CDN PoPs) is not currently a supported configuration. We'd be treading new ground with potential edge cases, reliability concerns, and debugging complexity.
      • Alternatively, deploying and operating SSH-aware L7 proxies in each PoP just to split SSH traffic is a non-trivial operational burden, and it would increase our footprint in places where we deliberately keep things minimal.

I'm sure I'm missing something here.. but my understanding is that both LVS/PyBal and Liberica support routing different ports on the same VIP to different backends.. so is there a reason the following doesn't seem to be considered here at all?

  • Allocate a new Gerrit-specific service VIP in the eqiad/codfw public load balancer ranges (note this will replace the current Gerrit VIPs from the per-row ranges, instead of being a fully new public IP allocation)
  • Add that service VIP to the cp nodes in core datacenters, bind the haproxy tls terminator on it, and route 443/tcp there on the load balancer config
  • Add the service VIP to the Gerrit backend node, and bind the SSH service to it and route it there
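
A toy model of the port-based split suggested in the list above, purely to illustrate the idea of one VIP with per-port backends. The address (from the 198.51.100.0/24 documentation range) and the backend labels are made up, and this does not reflect how LVS/PyBal or Liberica are actually configured.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    vip: str
    port: int
    backend: str


# Hypothetical values: one service VIP, two ports, two different backends.
ROUTES = [
    Route("198.51.100.10", 443, "cp-haproxy-tls-terminator"),
    Route("198.51.100.10", 29418, "gerrit-backend-ssh"),
]


def pick_backend(vip: str, port: int, routes=ROUTES) -> Optional[str]:
    """Return the backend label for a (VIP, port) pair, or None if unrouted."""
    for route in routes:
        if route.vip == vip and route.port == port:
            return route.backend
    return None
```

The point of the sketch is only that the routing key is the (VIP, port) pair rather than the VIP alone, so HTTPS and SSH can share a hostname while landing on different machines.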

Technically, that’s feasible but it introduces a one-off edge case in our CDN architecture that we currently don’t support for several reasons:

  • Shared service IP per DC: All services behind the CDN share a single public IP per DC (text-lb.$site.wikimedia.org). Introducing a service-specific VIP breaks that model, making edge operations more complex.
  • Gerrit’s active/passive model: Gerrit is only active in one DC at a time (eqiad or codfw). If gerrit is reachable in both DCs, we’d need special-case load balancing or application-level awareness to avoid sending users to a passive backend.
  • Operational burden during DC maintenance or failover: Our current model allows seamless DC depooling for the CDN. Introducing a one-off VIP tied to a single DC (to avoid breaking Gerrit’s active/passive model) would mean coordinating failovers and DNS/IP changes every time we depool a DC, something we don’t do for any other service behind the CDN.

Splitting traffic will mean acute, short-term pain (there are 2500 files in codesearch that mention gerrit.wikimedia.org and >500 page hits on mediawiki.org that mention gerrit.wikimedia.org—stuff is going to break).

The HTTPS use cases will remain unchanged and a search for "gerrit.wikimedia.org 29418" finds 103 source files (when excluding .gitreview) and 90 hits on mediawiki.org. Still sizable numbers but significantly lower.

As discussed out of band, let's get all the parties together and discuss this live.

Jelto changed the task status from Open to Stalled. Jun 18 2025, 3:41 PM
Jelto added a subscriber: hashar.

Discuss with RelEng if a change of hostname is reasonable (for volunteers, staff and tooling)

We had a discussion with RelEng (@thcipriani and @hashar), together with @Vgutierrez, about separating the hostnames for SSH/git and HTTPS. We explored different options like spinning up additional SSH proxies in each PoP to reduce routing issues, using new hostnames for SSH, moving Gitiles to a different machine, or disabling git over SSH completely.

The conclusion from RelEng was that changing the hostname for SSH is barely feasible. An earlier proposal to change only the non-default port (29418) was not carried out either, for similar reasons; see also T165631: move gerrit.wm.org SSH service to private/behind LVS like phab-vcs.

The remaining options are either moving Gitiles out of Gerrit (T392467#10773655) or putting Gerrit behind the CDN with additional SSH proxies in each DC. It would also be possible to disable git-ssh completely and rely on git over http + token.

We agreed to start with the intermediate solution of moving Gitiles out of the production Gerrit host and setting up something like gitiles.wikimedia.org.

I don't want to rule out the other options. In my opinion, the Gitiles separation is not a long-term solution and it's only a matter of time until we’ll need to separate another endpoint. But it helps keep Gerrit up and running, and rate limiting and caching could then be configured for Gitiles.

So I'll stall this task and create a new one for the Gitiles separation.

Change #1148438 abandoned by Dzahn:

[operations/dns@master] wikimedia.org: add gerrit-ssh, gerrit-replica-ssh records

Reason:

latest ticket comment says pretty much we aren't going to separate the host names but instead just move out gitiles first, right?

https://gerrit.wikimedia.org/r/1148438

Jelto removed Jelto as the assignee of this task. Aug 7 2025, 6:23 AM

I'll un-assign this task while I'm out, but it is still high-priority to ensure Gerrit's availability in the future. See more details in T365259#11067179.

So.. meanwhile Gerrit is available behind the CDN (T365259) which was implemented in T411895 and is using the new tcp-proxy for ssh.

This made it possible to have both https and ssh behind CDN without splitting the host names.

Therefore I suggest this task can be closed (as declined?).

Dzahn changed the task status from Stalled to Open. Feb 3 2026, 11:41 PM

🎉 Thanks @Dzahn—Boldly declining.