
gerrit behind CDN
Closed, Resolved (Public)

Description

public tracking version of T365259


  • Assign the new public IPs: a v4 and a v6 in each of the DC-specific public service address ranges (example)
    • create DNS records gerrit-lb.$DC.wikimedia.org (should be done by Netbox semi-automatically)

Acceptance criteria before continuing:

  • on a cache_text host, curl -v https://gerrit.wikimedia.org --connect-to ::localhost
    • this MUST show an HTTP 302 to Location: https://gerrit.wikimedia.org/r/
    • this MUST NOT serve a 5xx error or the default MediaWiki page (HTTP 200 with response header < server: mw-web.xxxx...)
  • on a cache_text host, curl -s https://gerrit.wikimedia.org/r/ --connect-to ::localhost | grep 'Gerrit Code Review'
    • this MUST complete successfully, with a match on the <meta name="description" ...> tag
  • on a cache_text host, ip a show lo output includes the public IPs for gerrit-lb.$DC.wikimedia.org
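
The HTTP criteria above can also be written down as a small check. This is a minimal Python sketch, not the procedure used in this task: the names `check_root_response` and `body_mentions_gerrit` are hypothetical, and it assumes the status code, headers and body have already been fetched (for example with the curl commands above).

```python
from typing import List, Mapping

# Redirect target required by the acceptance criteria.
EXPECTED_LOCATION = "https://gerrit.wikimedia.org/r/"


def check_root_response(status: int, headers: Mapping[str, str]) -> List[str]:
    """Return the problems found in a response to GET https://gerrit.wikimedia.org.

    An empty list means the response meets the criteria: an HTTP 302 to /r/,
    no 5xx error, and not the default MediaWiki page.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    problems = []
    if 500 <= status <= 599:
        problems.append(f"server error: HTTP {status}")
    if status == 200 and lowered.get("server", "").startswith("mw-web"):
        problems.append("default MediaWiki page served")
    if status != 302:
        problems.append(f"expected HTTP 302, got {status}")
    elif lowered.get("location") != EXPECTED_LOCATION:
        problems.append(f"unexpected Location: {lowered.get('location')!r}")
    return problems


def body_mentions_gerrit(body: str) -> bool:
    """Mirror the `grep 'Gerrit Code Review'` check on the /r/ page body."""
    return "Gerrit Code Review" in body
```

The third criterion (public IPs present on `lo`) is host state rather than an HTTP response, so it is left to the `ip a show lo` command.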

At this point the new IPs (and the new data path) are accessible externally, so we should proceed to:

  • Opt-in SRE & developer testing
    • Write instructions and/or ship a tunnelencabulator feature: modify /etc/hosts to point gerrit.wikimedia.org to the new, CDN-fronted public IP (https://gerrit.wikimedia.org/r/c/operations/debs/wmf-laptop/+/1227395)
    • One full business day of testing with several volunteers?
      • write a mail to wider-sre list? Not needed, but we are mailing ops and wikitech-l about the switch.
  • [ ] per @taavi, determine whether or not we want to also include the new Gerrit IPs on the Cloud VPS egress NAT exemption list, as the old ones are on there. (We probably do want to?)
  • Migrate the public gerrit.wikimedia.org DNS record: gerrit 180 IN DYNA geoip!gerrit-addrs https://gerrit.wikimedia.org/r/1215709
  • test and document emergency access when the CDN is down
    • tunneling using tunnelencabulator documented here
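
For the opt-in testing step, the /etc/hosts override is a single line per address. The sketch below only builds that line; it is not how tunnelencabulator does it, and the helper name `hosts_line` is hypothetical. Which gerrit-lb address to use depends on the data centre being tested.

```python
import ipaddress


def hosts_line(ip: str, hostname: str = "gerrit.wikimedia.org") -> str:
    """Build one /etc/hosts entry mapping `hostname` to `ip`.

    ipaddress.ip_address() raises ValueError on malformed input, so a typo
    in the address is caught before anything is written to the hosts file.
    """
    addr = ipaddress.ip_address(ip)
    return f"{addr}\t{hostname}"
```

For example, `hosts_line("198.35.26.97")` yields an entry for the address that this task later shows resolving to gerrit-lb.ulsfo.wikimedia.org. The entry should be removed once testing is done so that normal DNS resolution applies again.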

Details

Related Changes in Gerrit:
Repo                       Branch      Lines +/-
operations/puppet          production  +6 -0
operations/puppet          production  +0 -4
operations/puppet          production  +14 -24
operations/puppet          production  +5 -3
operations/dns             master      +4 -1
operations/cookbooks       master      +23 -23
operations/puppet          production  +14 -0
operations/dns             master      +1 -3
operations/homer/public    master      +0 -5
operations/puppet          production  +14 -0
operations/debs/wmf-laptop master      +47 -15
operations/puppet          production  +2 -2
operations/debs/wmf-laptop master      +19 -5
operations/puppet          production  +1 -0
operations/puppet          production  +1 -0
operations/puppet          production  +12 -2
operations/puppet          production  +5 -0
operations/puppet          production  +5 -0
operations/puppet          production  +1 -1
operations/puppet          production  +2 -0
operations/puppet          production  +5 -16
operations/puppet          production  +2 -0
operations/puppet          production  +4 -0
operations/puppet          production  +3 -0
operations/puppet          production  +2 -0

Related Objects

Status     Assigned
Resolved   Dzahn
Resolved   None
Resolved   CDanis
Resolved   ABran-WMF
Resolved   Jelto
Resolved   Dzahn
Resolved   Dzahn
Resolved   Jelto
Resolved   Vgutierrez
Duplicate  None
Resolved   ABran-WMF
Resolved   ABran-WMF
Resolved   ABran-WMF
Resolved   ABran-WMF
Resolved   ABran-WMF
Resolved   ABran-WMF
Resolved   ABran-WMF
Resolved   ABran-WMF
Open       None
Resolved   ABran-WMF
Resolved   ABran-WMF

Event Timeline


Change #1215389 merged by CDanis:

[operations/puppet@production] gerrit services: lvs_setup! but only in magru.

https://gerrit.wikimedia.org/r/1215389

Change #1215398 merged by CDanis:

[operations/puppet@production] lvs7001: add gerrit services

https://gerrit.wikimedia.org/r/1215398

🎉 the ssh port isn't yet working, but https is!

💙cdanis@wmftop ~ 🕟🍵 curl -s https://gerrit.wikimedia.org/r/ --connect-to ::gerrit-lb.magru.wikimedia.org | grep 'Gerrit Code Review'
<meta name="description" content="Gerrit Code Review">

💙cdanis@wmftop ~ 🕟🍵 curl https://gerrit.wikimedia.org --connect-to ::gerrit-lb.magru.wikimedia.org
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="https://gerrit.wikimedia.org/r/">here</a>.</p>
</body></html>

Change #1226951 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/puppet@production] tcp-proxy: allow lb healthchecks

https://gerrit.wikimedia.org/r/1226951

Change #1226951 merged by CDanis:

[operations/puppet@production] tcp-proxy: allow lb healthchecks

https://gerrit.wikimedia.org/r/1226951

Change #1227348 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/puppet@production] gerrit/Liberica: eqsin

https://gerrit.wikimedia.org/r/1227348

Change #1215693 merged by CDanis:

[operations/puppet@production] gerrit/Liberica: expand to drmrs

https://gerrit.wikimedia.org/r/1215693

💙cdanis@wmftop ~ 🕙☕ DC=magru ; curl -I -X GET https://gerrit.wikimedia.org --connect-to ::gerrit-lb.${DC}.wikimedia.org ; nc -vW1 gerrit-lb.${DC}.wikimedia.org 29418
HTTP/2 302
date: Thu, 15 Jan 2026 15:10:29 GMT
server: Apache
location: https://gerrit.wikimedia.org/r/
content-length: 215
content-type: text/html; charset=iso-8859-1
age: 1
vary: X-Forwarded-Proto
x-cache: cp7008 miss, cp7008 pass
x-cache-status: pass
server-timing: cache;desc="pass", host;desc="cp7008"
strict-transport-security: max-age=106384710; includeSubDomains; preload
report-to: { "group": "wm_nel", "max_age": 604800, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
nel: { "report_to": "wm_nel", "max_age": 604800, "failure_fraction": 0.05, "success_fraction": 0.0}
set-cookie: WMF-Uniq=lduuB793lSeIP6Ao8KGhOQLpAAAAAFvdmXX8iIiZ92DS7Rg0aVcUoUhHLVdOEKDe;Domain=gerrit.wikimedia.org;Path=/;HttpOnly;secure;SameSite=None;Expires=Fri, 15 Jan 2027 00:00:00 GMT
x-request-id: 2f56b390-8ac2-49e6-9794-6de1c5215d95

Connection to gerrit-lb.magru.wikimedia.org (195.200.68.225) 29418 port [tcp/*] succeeded!
SSH-2.0-GerritCodeReview_3.10.6 (APACHE-SSHD-2.12.0)
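
The output above confirms two things: the request passed through a cache host (the `x-cache` header) and port 29418 answers with a Gerrit SSH banner. A minimal Python sketch for pulling both facts out of such output follows. The function names are hypothetical, and the parsing assumes only the formats visible in this paste (`<host> <status>` pairs separated by commas, and a banner starting with `SSH-2.0-GerritCodeReview_`).

```python
import re
from typing import List, Optional, Tuple

# Matches banners like "SSH-2.0-GerritCodeReview_3.10.6 (APACHE-SSHD-2.12.0)".
_BANNER_RE = re.compile(r"^SSH-2\.0-GerritCodeReview_(\S+)")


def gerrit_version_from_banner(banner: str) -> Optional[str]:
    """Return the Gerrit version from an SSH banner, or None if it is not Gerrit."""
    match = _BANNER_RE.match(banner.strip())
    return match.group(1) if match else None


def cache_hosts(x_cache: str) -> List[Tuple[str, str]]:
    """Split an x-cache header value into (cache host, status) pairs."""
    pairs = []
    for entry in x_cache.split(","):
        parts = entry.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            pairs.append((parts[0], parts[1].strip()))
    return pairs
```

A `None` from `gerrit_version_from_banner` would suggest that something other than Gerrit's SSH daemon is answering on that port.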

Change #1227356 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/puppet@production] Liberica/gerrit: 🌍‼️ 🎊

https://gerrit.wikimedia.org/r/1227356

Change #1227348 merged by CDanis:

[operations/puppet@production] gerrit/Liberica: eqsin

https://gerrit.wikimedia.org/r/1227348

Change #1227356 merged by CDanis:

[operations/puppet@production] Liberica/gerrit: 🌍‼️ 🎊

https://gerrit.wikimedia.org/r/1227356

Change #1227363 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/puppet@production] LVS/gerrit: eqiad

https://gerrit.wikimedia.org/r/1227363

Mentioned in SAL (#wikimedia-operations) [2026-01-15T17:02:10Z] <cdanis> 💙cdanis@cumin1003.eqiad.wmnet ~ 🕛☕ sudo cumin 'A:lvs-eqiad' 'disable-puppet T411895'

Change #1227363 merged by CDanis:

[operations/puppet@production] LVS/gerrit: eqiad

https://gerrit.wikimedia.org/r/1227363

Change #1227391 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/puppet@production] LVS/gerrit: codfw

https://gerrit.wikimedia.org/r/1227391

Mentioned in SAL (#wikimedia-operations) [2026-01-15T17:34:38Z] <cdanis> 💙cdanis@cumin1003.eqiad.wmnet ~ 🕧☕ sudo cumin 'A:lvs-codfw' 'disable-puppet T411895'

Change #1227391 merged by CDanis:

[operations/puppet@production] LVS/gerrit: codfw

https://gerrit.wikimedia.org/r/1227391

Mentioned in SAL (#wikimedia-operations) [2026-01-15T17:44:59Z] <cdanis> 💙cdanis@cumin1003.eqiad.wmnet ~ 🕧☕ sudo cumin 'A:lvs-codfw or A:lvs-eqiad' 'enable-puppet T411895'

Change #1227395 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/debs/wmf-laptop@master] tunnelencabulator: Gerrit/CDN 🚀

https://gerrit.wikimedia.org/r/1227395

Change #1227395 merged by CDanis:

[operations/debs/wmf-laptop@master] tunnelencabulator: Gerrit/CDN 🚀

https://gerrit.wikimedia.org/r/1227395

Change #1227423 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/puppet@production] services: gerrit* --> monitoring_setup

https://gerrit.wikimedia.org/r/1227423

Change #1227846 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/debs/wmf-laptop@master] tunnelencabulator: simple IPv6 support

https://gerrit.wikimedia.org/r/1227846

Change #1227846 merged by CDanis:

[operations/debs/wmf-laptop@master] tunnelencabulator: simple IPv6 support

https://gerrit.wikimedia.org/r/1227846

At this point I think we can say that T394271 ("Separate Gerrit https and ssh/git hostnames") was not needed after all and can be closed as declined.

T394271#11581341

Change #1237887 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gerrit::sshkey: add gerrit-lb IPs to host_aliases ssh key

https://gerrit.wikimedia.org/r/1237887

Change #1237887 merged by Dzahn:

[operations/puppet@production] gerrit::sshkey: add gerrit-lb IPs to host_aliases ssh key

https://gerrit.wikimedia.org/r/1237887

Change #1238021 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] cloudgw: add gerrit-lb IPs to Cloud VPS egress NAT exemption list

https://gerrit.wikimedia.org/r/1238021

@taavi what is the rationale for having these on the NAT exemption list? I would have thought Cloud VPS instances could just access Gerrit like anything else on the internet and go through the NAT?

Change #1238042 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Cloud-vrf-in: remove exception to allow Cloud VPS private IPs reach gerrit

https://gerrit.wikimedia.org/r/1238042

Change #1238021 abandoned by Dzahn:

[operations/puppet@production] cloudgw: add gerrit-lb IPs to Cloud VPS egress NAT exemption list

https://gerrit.wikimedia.org/r/1238021

Dzahn changed the task status from Open to In Progress. (Feb 10 2026, 1:10 AM)

Per IRC discussion: the existing, pre-CDN Gerrit IPs have been removed from the cloudgw "dmz_cidr" list because we think they are not needed, and removing them means we don't have to think about them again in the future.

The patch to add the new, post-CDN Gerrit IPs has been abandoned, but it could be restored if for some reason we need them after all.

Thanks to @Andrew and wmcs-team for taking this on in an ad-hoc manner.

Change #1238042 merged by jenkins-bot:

[operations/homer/public@master] Cloud-vrf-in: remove exception to allow Cloud VPS private IPs reach gerrit

https://gerrit.wikimedia.org/r/1238042

Change #1238376 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/cookbooks@master] gerrit: update switchover related cookbooks

https://gerrit.wikimedia.org/r/1238376

Mentioned in SAL (#wikimedia-operations) [2026-02-10T15:55:01Z] <topranks> remove ACL entry permitting Cloud VPS private IP addresses direct access to gerrit.wikimedia.org T411895

Change #1215709 merged by Dzahn:

[operations/dns@master] switch gerrit service IP to CDN

https://gerrit.wikimedia.org/r/1215709

Dzahn claimed this task.

This has happened. The DNS switch has been made.

example from US west:

host gerrit.wikimedia.org
gerrit.wikimedia.org has address 198.35.26.97

host 198.35.26.97
97.26.35.198.in-addr.arpa domain name pointer gerrit-lb.ulsfo.wikimedia.org.
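
The same forward-then-reverse check can be scripted. The sketch below covers only the offline parts: building the PTR query name for an address and reading the data centre out of a `gerrit-lb.$DC.wikimedia.org` name. The function names are hypothetical, and the actual DNS lookups are left to `host` or a resolver library.

```python
import ipaddress
from typing import Optional


def ptr_query_name(ip: str) -> str:
    """Return the reverse-DNS name to query for `ip` (IPv4 or IPv6)."""
    return ipaddress.ip_address(ip).reverse_pointer


def dc_from_lb_name(ptr_target: str) -> Optional[str]:
    """Extract the data centre from a gerrit-lb.$DC.wikimedia.org name.

    Returns None if the name does not follow that pattern, which would
    suggest the address is not one of the CDN-fronted service IPs.
    """
    labels = ptr_target.rstrip(".").split(".")
    if len(labels) == 4 and labels[0] == "gerrit-lb" and labels[2:] == ["wikimedia", "org"]:
        return labels[1]
    return None
```

With the values above, `ptr_query_name("198.35.26.97")` gives the `in-addr.arpa` name shown in the `host` output, and `dc_from_lb_name("gerrit-lb.ulsfo.wikimedia.org.")` gives `ulsfo`.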

Change #1238400 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gerrit: limit access to http/https/ssh in firewall

https://gerrit.wikimedia.org/r/1238400

Change #1238376 merged by jenkins-bot:

[operations/cookbooks@master] gerrit: update switchover related cookbooks

https://gerrit.wikimedia.org/r/1238376

Change #1239878 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/dns@master] wikimedia: revert gerrit behind the CDN

https://gerrit.wikimedia.org/r/1239878

Change #1239878 abandoned by Jelto:

[operations/dns@master] wikimedia: revert gerrit behind the CDN

Reason:

not needed anymore, temporary workaround was found

https://gerrit.wikimedia.org/r/1239878

Change #1240747 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/puppet@production] gerrit tcp haproxy: rationalize timeouts

https://gerrit.wikimedia.org/r/1240747

Change #1240747 merged by Dzahn:

[operations/puppet@production] gerrit tcp haproxy: rationalize timeouts

https://gerrit.wikimedia.org/r/1240747

Jelto changed the status of subtask Restricted Task from Open to Stalled. (Feb 27 2026, 10:22 AM)
Jelto changed the status of subtask Restricted Task from Stalled to Open. (Mar 4 2026, 1:27 PM)

Change #1238400 merged by Jelto:

[operations/puppet@production] gerrit: limit access to http/https/ssh in firewall

https://gerrit.wikimedia.org/r/1238400

Thanks @Jelto for the merge and the puppet-agent runs. Both the spare and the replica are still receiving replication traffic and are reachable by the CDN, but the public traffic has been dropped.

Change #1248484 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gerrit: remove CDN lookups from sshkey

https://gerrit.wikimedia.org/r/1248484

Change #1248484 merged by Jelto:

[operations/puppet@production] gerrit: remove CDN lookups from sshkey

https://gerrit.wikimedia.org/r/1248484

Jelto closed subtask Restricted Task as Resolved. (Mar 10 2026, 10:33 AM)

Change #1250601 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gerrit: fix failing discovery dns lookup in test spec

https://gerrit.wikimedia.org/r/1250601

Change #1250601 merged by Jelto:

[operations/puppet@production] gerrit: fix failing discovery dns lookup in test spec

https://gerrit.wikimedia.org/r/1250601

ABran-WMF closed subtask Restricted Task as Resolved. (Mar 24 2026, 1:29 PM)