public tracking version of T365259
- Assign the new public IPs: one IPv4 and one IPv6 address in each of the DC-specific public service address ranges (example)
- create DNS records gerrit-lb.$DC.wikimedia.org (should be done by Netbox semi-automatically)
- Prepare for geodns with the new public IP
- Add a gerrit-addrs resource to operations/dns // geo-resources https://gerrit.wikimedia.org/r/c/operations/dns/+/1214177
- Update dns.admin cookbook to reflect gerrit-addrs https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1214179
- Update geodns schema in conftool-data/geodns/services.yaml to add gerrit-addrs for admin_state https://gerrit.wikimedia.org/r/c/operations/puppet/+/1214192
- Prepare tcpproxy VMs for accepting traffic on the new public IPs
- Create a new conftool service for tcp-proxy, and add the realserver VMs in each DC to it https://gerrit.wikimedia.org/r/c/operations/puppet/+/1214454
- Create two new service catalog entries (gerrit-ssh and gerrit-https), both sharing those same public IPs and using LVS class high-traffic1 https://gerrit.wikimedia.org/r/c/operations/puppet/+/1202842
- add LVS profiles to tcpproxy puppet role https://gerrit.wikimedia.org/r/c/operations/puppet/+/1215240
- Add gerrit to ATS cache_text as a backend (in the future, this will become an internal-only endpoint) https://gerrit.wikimedia.org/r/c/operations/puppet/+/1215317
- Temporarily skip server cert name verification for the Gerrit upstream from ATS https://gerrit.wikimedia.org/r/c/operations/puppet/+/1215684, OR use the Let's Encrypt root cert https://gerrit.wikimedia.org/r/c/operations/puppet/+/1215684
- Ensure the Varnish VCL includes gerrit.wikimedia.org in every relevant instance of its many hostname regex patterns (to be confirmed?)
- Prepare cache_text servers for accepting traffic on the new public IP (mark them as profile::lvs::realservers for the new service catalog entry gerrit-https) https://gerrit.wikimedia.org/r/c/operations/puppet/+/1226932
Acceptance criteria before continuing:
- on a cache_text host, curl -v https://gerrit.wikimedia.org --connect-to ::localhost
- this MUST show an HTTP 302 to Location: https://gerrit.wikimedia.org/r/
- it must NOT return a 5xx error or the default MediaWiki page (HTTP 200 with response header < server: mw-web.xxxx...)
- on a cache_text host, curl -s https://gerrit.wikimedia.org/r/ --connect-to ::localhost | grep 'Gerrit Code Review'
- this MUST complete successfully, with a match on the <meta name="description" ...> tag
- on a cache_text host, ip a show lo output includes the public IPs for gerrit-lb.$DC.wikimedia.org
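The three acceptance criteria above can be scripted so they are re-run identically on each cache_text host. This is a minimal sketch, assuming Python 3.9+ with curl and iproute2 installed on the host. It uses curl -i (status, headers and body in one stream) rather than curl -v, and ip -o addr show dev lo rather than ip a show lo. EXPECTED_LO_IPS holds hypothetical placeholder addresses from documentation ranges, not the real gerrit-lb.$DC.wikimedia.org IPs.

```python
import re
import subprocess

URL_ROOT = "https://gerrit.wikimedia.org"
URL_R = "https://gerrit.wikimedia.org/r/"
# Hypothetical placeholders (documentation ranges): use the real gerrit-lb.$DC IPs.
EXPECTED_LO_IPS = {"192.0.2.10", "2001:db8::10"}


def curl_localhost(url: str) -> str:
    """Fetch url through the local CDN stack, as --connect-to ::localhost does.

    Returns curl -i output: status line, response headers, blank line, body.
    check=True only raises on curl-level failures, not on HTTP error codes.
    """
    result = subprocess.run(
        ["curl", "-s", "-i", "--connect-to", "::localhost", url],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def _split_response(raw: str) -> tuple[list[str], str]:
    """Split curl -i output into header lines and body."""
    head, _, body = raw.replace("\r\n", "\n").partition("\n\n")
    return head.splitlines(), body


def check_root_redirect(raw: str) -> bool:
    """Criterion 1: a 302 to /r/, not a 5xx and not the default MediaWiki page."""
    lines, _ = _split_response(raw)
    if not lines or re.match(r"HTTP/\S+ 302\b", lines[0]) is None:
        return False
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    served_by_mw = headers.get("server", "").startswith("mw-web")
    return headers.get("location") == URL_R and not served_by_mw


def check_r_page(raw: str) -> bool:
    """Criterion 2: the /r/ page carries Gerrit's meta description tag."""
    _, body = _split_response(raw)
    pattern = r'<meta\s+name="description"[^>]*Gerrit Code Review'
    return re.search(pattern, body) is not None


def lo_addresses(ip_output: str) -> set[str]:
    """Criterion 3 helper: addresses listed by `ip -o addr show dev lo`."""
    return set(re.findall(r"\binet6?\s+([0-9a-fA-F:.]+)/\d+", ip_output))


def main() -> None:
    """Run all three checks on a cache_text host and print PASS/FAIL lines."""
    ip_out = subprocess.run(
        ["ip", "-o", "addr", "show", "dev", "lo"],
        capture_output=True, text=True, check=True,
    ).stdout
    results = {
        "/ redirects to /r/": check_root_redirect(curl_localhost(URL_ROOT)),
        "/r/ serves Gerrit": check_r_page(curl_localhost(URL_R)),
        "public IPs on lo": EXPECTED_LO_IPS <= lo_addresses(ip_out),
    }
    for name, ok in results.items():
        print(f"{'PASS' if ok else 'FAIL'}: {name}")
```

Call main() on a cache_text host. Keeping the check_* functions free of I/O means they can also be tried offline against saved curl output before touching production.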
- Reconfigure Liberica/Katran hightraffic1 to accept traffic on the new public IP and route it to the realservers appropriate for the destination port
- First, try in only one CDN site: magru or drmrs perhaps? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1215388
- Add gerrit-ssh and gerrit-https to profile::liberica::include_services on the secondary Liberica host in the CDN site (e.g. lvs7003) https://gerrit.wikimedia.org/r/c/operations/puppet/+/1215388
- In the catalog, set both services to state: lvs_setup in that one DC https://gerrit.wikimedia.org/r/c/operations/puppet/+/1215389
- On the secondary Liberica host, verify that healthchecks pass for both the ssh and https services
- Repeat on the primary hightraffic1 host https://gerrit.wikimedia.org/r/c/operations/puppet/+/1215398
- Verify functionality, soak-test for a few hours at least; then continue rollout globally.
- Keeping the services in lvs_setup state, repeat the above procedure, but this time add further sites to the list in the catalog, as in https://gerrit.wikimedia.org/r/1215693
At this point the new IP (and new data path) is accessible externally, so we should proceed to:
- Opt-in SRE & developer testing
- Write instructions and/or ship tunnelencabulator feature: modify /etc/hosts to point gerrit.wikimedia.org to the new, CDN-fronted public IP https://gerrit.wikimedia.org/r/c/operations/debs/wmf-laptop/+/1227395
- One full business day of testing with several volunteers?
- Write a mail to the wider-sre list? Not needed, but we are mailing ops and wikitech-l about the switch.
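For testers who prefer a manual change over tunnelencabulator, the /etc/hosts override described above can be applied and reverted by a small script. This is a minimal sketch, assuming Python 3.9+; it is not the wmf-laptop/tunnelencabulator implementation. The address 192.0.2.10 (a documentation-range placeholder) and the MARKER tag are hypothetical.

```python
HOSTNAME = "gerrit.wikimedia.org"
MARKER = "# gerrit-cdn-test"  # hypothetical tag so the override is easy to find


def _names(line: str) -> list[str]:
    """Hostnames on a hosts-file line, ignoring comments and the IP field."""
    fields = line.split("#", 1)[0].split()
    return fields[1:]


def add_override(hosts_text: str, ip: str) -> str:
    """Return hosts_text with any line mapping gerrit.wikimedia.org replaced.

    Note: a line that maps gerrit.wikimedia.org alongside other names is
    dropped entirely, so check for such lines before using this.
    """
    kept = [ln for ln in hosts_text.splitlines() if HOSTNAME not in _names(ln)]
    kept.append(f"{ip}\t{HOSTNAME}\t{MARKER}")
    return "\n".join(kept) + "\n"


def remove_override(hosts_text: str) -> str:
    """Drop the tagged override line, restoring normal DNS resolution."""
    kept = [ln for ln in hosts_text.splitlines() if not ln.rstrip().endswith(MARKER)]
    return "\n".join(kept) + "\n"
```

Writing the result back to /etc/hosts needs root. Tagging the line makes the revert a simple filter rather than a hand edit.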
- [ ] Per @taavi, determine whether we also want to include the new Gerrit IPs on the Cloud VPS egress NAT exemption list, as the old ones are on it. (We probably do?)
- Migrate the public gerrit.wikimedia.org DNS record to: gerrit 180 IN DYNA geoip!gerrit-addrs https://gerrit.wikimedia.org/r/1215709
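Once the record is switched, each tester can check which address set their resolver hands out. This is a minimal sketch, assuming Python 3.9+. NEW_IPS and OLD_IPS are hypothetical placeholders from documentation ranges, to be replaced with the real addresses. Because geoip!gerrit-addrs answers vary by client location, NEW_IPS should hold the new IPs from every DC, and each result only describes one vantage point.

```python
import socket

# Hypothetical placeholders (documentation ranges): substitute the real addresses.
NEW_IPS = {"192.0.2.10", "2001:db8::10"}     # new CDN-fronted IPs, all DCs
OLD_IPS = {"198.51.100.20", "2001:db8::20"}  # current Gerrit IPs


def resolve(name: str) -> set[str]:
    """All IPv4/IPv6 addresses the local resolver returns for name."""
    infos = socket.getaddrinfo(name, 443, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def classify(addrs: set[str], new: set[str], old: set[str]) -> str:
    """Return 'new', 'old', 'mixed', or 'unknown' for a set of answers."""
    if not addrs or not addrs <= (new | old):
        return "unknown"
    if addrs <= new:
        return "new"
    if addrs <= old:
        return "old"
    return "mixed"


def report(name: str = "gerrit.wikimedia.org") -> None:
    """Resolve name locally and print the classification plus the raw answers."""
    found = resolve(name)
    print(classify(found, NEW_IPS, OLD_IPS), sorted(found))
```

Call report() from a few locations. An "old" or "mixed" result shortly after the switch may just be a resolver still holding a cached answer from before the change.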
- test and document emergency access when the CDN is down
- tunneling using tunnelencabulator documented here