
Replace Varnish backends with ATS on cache upload nodes in ulsfo
Closed, Resolved, Public

Description

After the successful production switchover to ATS performed in T213263, we can now start replacing Varnish backends with ATS on cache nodes, beginning with the upload cluster in ulsfo.

As a first step, we will write the puppetization needed to deploy a mixed varnish-fe/ats-be cache node, and then use it to reimage existing nodes. Note that ATS backends will route straight to the applayer, rather than through other DCs.
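
The routing difference described above can be illustrated with a toy model. This is only a sketch: the `ROUTE_TABLE` contents, the hop labels, and the `request_path` helper are hypothetical and are not taken from the puppet repository; the only part grounded in this task is that an ATS backend goes straight to the applayer instead of through other DCs.

```python
from typing import Dict, List

# Hypothetical route table: the DC a Varnish backend forwards cache misses to.
# "direct" means the backend talks to the applayer itself. These values are
# placeholders for illustration, not the production mapping.
ROUTE_TABLE: Dict[str, str] = {
    "ulsfo": "codfw",
    "codfw": "eqiad",
    "eqiad": "direct",
}


def request_path(dc: str, backend: str) -> List[str]:
    """Return the hops a cache miss takes, starting at the frontend in `dc`."""
    if backend not in ("varnish-be", "ats-be"):
        raise ValueError(f"unknown backend type: {backend}")

    path = [f"varnish-fe@{dc}", f"{backend}@{dc}"]

    # ATS backends skip inter-DC routing and go straight to the applayer.
    if backend == "ats-be":
        path.append("applayer")
        return path

    # Varnish backends follow the route table through other DCs first.
    hop = ROUTE_TABLE[dc]
    while hop != "direct":
        path.append(f"varnish-be@{hop}")
        hop = ROUTE_TABLE[hop]
    path.append("applayer")
    return path


if __name__ == "__main__":
    print(" -> ".join(request_path("ulsfo", "varnish-be")))
    print(" -> ".join(request_path("ulsfo", "ats-be")))
```

In this model the ATS path always has three hops regardless of the DC, which is the property the description relies on.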

Details

Related Gerrit Patches:
[operations/puppet@production] varnish: retry requests upon 502 errors
[operations/puppet@production] cache: reimage cp4026 as upload_ats
[operations/puppet@production] prometheus: add glob for ATS to file_sd_configs
[operations/puppet@production] prometheus: add upload_ats mtail targets
[operations/puppet@production] cache: reimage cp4025 as upload_ats
[operations/puppet@production] cache: reimage cp4024 as upload_ats
[operations/puppet@production] cache: reimage cp4023 as upload_ats
[operations/puppet@production] cache: add hiera setting for varnish backend restarts
[operations/puppet@production] cache: reimage cp4022 as upload_ats
[operations/puppet@production] cache: add ATS nodes to cacheproxy::cron_restart
[operations/puppet@production] Add profile::cache::varnish::frontend::text
[operations/puppet@production] cumin aliases: upload_ats is upload
[operations/puppet@production] cache: hiera setting to list backend services
[operations/puppet@production] cache: do not set backend_service
[operations/puppet@production] Revert "conftool-data: define ats-be for text/upload in all DCs"
[operations/puppet@production] conftool-data: define ats-be for text/upload in all DCs
[operations/puppet@production] cache: multiple keyspaces support for directors.frontend.vcl
[operations/puppet@production] conftool-data: set cp4021 as the only ats-be in production
[operations/puppet@production] cache: move varnish etcd-based directors to profile
[operations/puppet@production] varnish: add reload_vcl_opts function
[operations/puppet@production] cache: distinguish between Varnish and ATS nodes
[operations/puppet@production] cumin: add ATS production hosts to aliases
[operations/puppet@production] cache: unify cache nodes definition in hieradata
[operations/puppet@production] prometheus: use ATS profile instead of role in job definition
[operations/puppet@production] cache: move check_varnish_expiry_mailbox_lag to backend profile
[operations/puppet@production] trafficserver: avoid apt dependency cycle
[operations/puppet@production] trafficserver: run apt-get update before installing
[operations/puppet@production] cache: add ATS hiera settings to role upload_ats
[operations/puppet@production] cache: reimage cp4021 as upload_ats
[operations/puppet@production] role::cache::upload_ats: Varnish frontend / ATS backend setup
[operations/puppet@production] profile::trafficserver::backend: do not configure vhtcpd
[operations/puppet@production] cache: stop passing route_table to varnish-fe
[operations/puppet@production] conftool-data: add ats-be to cache_upload@ulsfo
[operations/puppet@production] cache: remove unused purge-related hiera settings
[operations/puppet@production] cache: remove cacheproxy::instance_pair
[operations/puppet@production] cache: remove profile::cache::{text,upload}
[operations/puppet@production] cache: implement profile::cache::varnish::backend
[operations/puppet@production] cache: add profile::cache::varnish::frontend
[operations/puppet@production] cache: explicitly pass cache_route to varnish::wikimedia_vcl
[operations/puppet@production] cache: move varnish storage config to varnish-be profile
[operations/puppet@production] cache: add profile::cache::varnish::backend
[operations/puppet@production] ATS: make config files depend on package
[operations/puppet@production] ATS: make error template directory depend on package
[operations/puppet@production] ATS: install libhwloc5 from stretch-backports

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 504895 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: remove unused hiera settings

https://gerrit.wikimedia.org/r/504895

Change 504895 merged by Ema:
[operations/puppet@production] cache: remove unused purge-related hiera settings

https://gerrit.wikimedia.org/r/504895

Change 505708 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] conftool-data: add ats-be to cache_upload@ulsfo

https://gerrit.wikimedia.org/r/505708

Change 505708 merged by Ema:
[operations/puppet@production] conftool-data: add ats-be to cache_upload@ulsfo

https://gerrit.wikimedia.org/r/505708

Change 505724 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: stop passing route_table to varnish-fe

https://gerrit.wikimedia.org/r/505724

Change 505724 merged by Ema:
[operations/puppet@production] cache: stop passing route_table to varnish-fe

https://gerrit.wikimedia.org/r/505724

Change 505748 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] profile::trafficserver::backend: do not configure vhtcpd

https://gerrit.wikimedia.org/r/505748

Change 505748 merged by Ema:
[operations/puppet@production] profile::trafficserver::backend: do not configure vhtcpd

https://gerrit.wikimedia.org/r/505748

Change 505767 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp4021 as upload_ats

https://gerrit.wikimedia.org/r/505767

Change 501360 merged by Ema:
[operations/puppet@production] role::cache::upload_ats: Varnish frontend / ATS backend setup

https://gerrit.wikimedia.org/r/501360

Mentioned in SAL (#wikimedia-operations) [2019-04-23T13:54:48Z] <ema> depool cp4021 and reimage as upload_ats T219967

Change 505767 merged by Ema:
[operations/puppet@production] cache: reimage cp4021 as upload_ats

https://gerrit.wikimedia.org/r/505767

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp4021.ulsfo.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201904231359_ema_129175.log.

Completed auto-reimage of hosts:

['cp4021.ulsfo.wmnet']

and were ALL successful.

Change 505789 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: add ATS hiera settings to role upload_ats

https://gerrit.wikimedia.org/r/505789

Change 505789 merged by Ema:
[operations/puppet@production] cache: add ATS hiera settings to role upload_ats

https://gerrit.wikimedia.org/r/505789

Change 505799 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] trafficserver: run apt-get update before installing

https://gerrit.wikimedia.org/r/505799

Change 505799 merged by Ema:
[operations/puppet@production] trafficserver: run apt-get update before installing

https://gerrit.wikimedia.org/r/505799

Change 505805 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] trafficserver: avoid apt dependency cycle

https://gerrit.wikimedia.org/r/505805

Change 505805 merged by Ema:
[operations/puppet@production] trafficserver: avoid apt dependency cycle

https://gerrit.wikimedia.org/r/505805

Change 505815 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: distinguish between upload and upload_ats nodes

https://gerrit.wikimedia.org/r/505815

Change 506090 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: move check_varnish_expiry_mailbox_lag to backend profile

https://gerrit.wikimedia.org/r/506090

Change 506090 merged by Ema:
[operations/puppet@production] cache: move check_varnish_expiry_mailbox_lag to backend profile

https://gerrit.wikimedia.org/r/506090

Change 506122 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] prometheus: use ATS profile instead of role in job definition

https://gerrit.wikimedia.org/r/506122

Change 506122 merged by Ema:
[operations/puppet@production] prometheus: use ATS profile instead of role in job definition

https://gerrit.wikimedia.org/r/506122

Change 506154 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: unify cache nodes definition in hieradata

https://gerrit.wikimedia.org/r/506154

Change 506154 merged by Ema:
[operations/puppet@production] cache: unify cache nodes definition in hieradata

https://gerrit.wikimedia.org/r/506154

Change 506177 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cumin: add ATS production hosts to aliases

https://gerrit.wikimedia.org/r/506177

Change 506177 merged by Ema:
[operations/puppet@production] cumin: add ATS production hosts to aliases

https://gerrit.wikimedia.org/r/506177

Change 505815 merged by Ema:
[operations/puppet@production] cache: distinguish between Varnish and ATS nodes

https://gerrit.wikimedia.org/r/505815

Change 506389 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: move varnish etcd-based directors to profile

https://gerrit.wikimedia.org/r/506389

Change 506409 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] varnish: add reload_vcl_opts function

https://gerrit.wikimedia.org/r/506409

Change 506409 merged by Ema:
[operations/puppet@production] varnish: add reload_vcl_opts function

https://gerrit.wikimedia.org/r/506409

Change 506389 merged by Ema:
[operations/puppet@production] cache: move varnish etcd-based directors to profile

https://gerrit.wikimedia.org/r/506389

Change 506445 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] conftool-data: cp4021 only ats-be in production

https://gerrit.wikimedia.org/r/506445

Change 506480 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: multiple keyspaces support for directors.frontend.vcl

https://gerrit.wikimedia.org/r/506480

Change 506484 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: do not set backend_service

https://gerrit.wikimedia.org/r/506484

Change 506445 merged by Ema:
[operations/puppet@production] conftool-data: set cp4021 as the only ats-be in production

https://gerrit.wikimedia.org/r/506445

Change 506480 merged by Ema:
[operations/puppet@production] cache: multiple keyspaces support for directors.frontend.vcl

https://gerrit.wikimedia.org/r/506480

Change 506624 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] conftool-data: define ats-be for text/upload in all DCs

https://gerrit.wikimedia.org/r/506624

Change 506624 merged by Ema:
[operations/puppet@production] conftool-data: define ats-be for text/upload in all DCs

https://gerrit.wikimedia.org/r/506624

Mentioned in SAL (#wikimedia-operations) [2019-04-26T10:19:09Z] <ema> depool cp3030 for testing T219967

Change 506633 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] Revert "conftool-data: define ats-be for text/upload in all DCs"

https://gerrit.wikimedia.org/r/506633

Change 506633 merged by Ema:
[operations/puppet@production] Revert "conftool-data: define ats-be for text/upload in all DCs"

https://gerrit.wikimedia.org/r/506633

Change 506636 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: hiera setting to list backend services

https://gerrit.wikimedia.org/r/506636

Change 506484 abandoned by Ema:
cache: do not set backend_service

Reason:
Obsoleted by https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/506636/

https://gerrit.wikimedia.org/r/506484

Change 506636 merged by Ema:
[operations/puppet@production] cache: hiera setting to list backend services

https://gerrit.wikimedia.org/r/506636

Mentioned in SAL (#wikimedia-operations) [2019-04-26T12:20:36Z] <ema> repool cp3030 after directors.frontend.vcl testing T219967

Mentioned in SAL (#wikimedia-operations) [2019-04-26T12:44:43Z] <ema> pool cp4021 w/ ATS backend T219967

Change 506980 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cumin aliases: upload_ats is upload

https://gerrit.wikimedia.org/r/506980

Change 506980 merged by Ema:
[operations/puppet@production] cumin aliases: upload_ats is upload

https://gerrit.wikimedia.org/r/506980

Change 507022 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] Add profile::cache::varnish::frontend::text

https://gerrit.wikimedia.org/r/507022

Change 507022 merged by Ema:
[operations/puppet@production] Add profile::cache::varnish::frontend::text

https://gerrit.wikimedia.org/r/507022

Change 507266 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp4022 as upload_ats

https://gerrit.wikimedia.org/r/507266

Change 507267 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: add ulsfo_ats to cacheproxy::cron_restart

https://gerrit.wikimedia.org/r/507267

Change 507267 merged by Ema:
[operations/puppet@production] cache: add ATS nodes to cacheproxy::cron_restart

https://gerrit.wikimedia.org/r/507267

Mentioned in SAL (#wikimedia-operations) [2019-04-30T13:28:19Z] <ema> depool cp4022 and reimage as upload_ats T219967

Change 507266 merged by Ema:
[operations/puppet@production] cache: reimage cp4022 as upload_ats

https://gerrit.wikimedia.org/r/507266

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp4022.ulsfo.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201904301334_ema_110727.log.

Completed auto-reimage of hosts:

['cp4022.ulsfo.wmnet']

Of which those FAILED:

['cp4022.ulsfo.wmnet']

Mentioned in SAL (#wikimedia-operations) [2019-04-30T16:04:50Z] <ema> pool cp4022 w/ ATS backend T219967

Change 507358 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: add hiera setting for varnish backend restarts

https://gerrit.wikimedia.org/r/507358

Change 507358 merged by Ema:
[operations/puppet@production] cache: add hiera setting for varnish backend restarts

https://gerrit.wikimedia.org/r/507358

Change 507744 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp4023 as upload_ats

https://gerrit.wikimedia.org/r/507744

Mentioned in SAL (#wikimedia-operations) [2019-05-02T09:02:48Z] <ema> depool cp4023 and reimage as upload_ats T219967

Change 507744 merged by Ema:
[operations/puppet@production] cache: reimage cp4023 as upload_ats

https://gerrit.wikimedia.org/r/507744

Script wmf-auto-reimage was launched by ema on cumin2001.codfw.wmnet for hosts:

['cp4023.ulsfo.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905020907_ema_2088.log.

Completed auto-reimage of hosts:

['cp4023.ulsfo.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-05-02T10:03:58Z] <ema> pool cp4023 w/ ATS backend T219967

Change 507915 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp4024 as upload_ats

https://gerrit.wikimedia.org/r/507915

Mentioned in SAL (#wikimedia-operations) [2019-05-03T07:45:52Z] <ema> depool cp4024 and reimage as upload_ats T219967

Change 507915 merged by Ema:
[operations/puppet@production] cache: reimage cp4024 as upload_ats

https://gerrit.wikimedia.org/r/507915

Script wmf-auto-reimage was launched by ema on cumin2001.codfw.wmnet for hosts:

['cp4024.ulsfo.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905030750_ema_2659.log.

Completed auto-reimage of hosts:

['cp4024.ulsfo.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-05-03T08:47:40Z] <ema> pool cp4024 w/ ATS backend T219967

Change 507935 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp4025 as upload_ats

https://gerrit.wikimedia.org/r/507935

Mentioned in SAL (#wikimedia-operations) [2019-05-03T09:49:27Z] <ema> depool cp4025 and reimage as upload_ats T219967

Change 507935 merged by Ema:
[operations/puppet@production] cache: reimage cp4025 as upload_ats

https://gerrit.wikimedia.org/r/507935

Script wmf-auto-reimage was launched by ema on cumin2001.codfw.wmnet for hosts:

['cp4025.ulsfo.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905030954_ema_30037.log.

Completed auto-reimage of hosts:

['cp4025.ulsfo.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-05-03T10:47:49Z] <ema> pool cp4025 w/ ATS backend T219967

Change 507953 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] varnish: retry requests upon 502 errors

https://gerrit.wikimedia.org/r/507953

Change 508284 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] prometheus: add upload_ats target

https://gerrit.wikimedia.org/r/508284

Change 508284 merged by Ema:
[operations/puppet@production] prometheus: add upload_ats mtail targets

https://gerrit.wikimedia.org/r/508284

Change 508296 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] prometheus: add glob for ATS to file_sd_configs

https://gerrit.wikimedia.org/r/508296

Change 508296 merged by Ema:
[operations/puppet@production] prometheus: add glob for ATS to file_sd_configs

https://gerrit.wikimedia.org/r/508296

Change 508304 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp4026 as upload_ats

https://gerrit.wikimedia.org/r/508304

Mentioned in SAL (#wikimedia-operations) [2019-05-06T14:19:52Z] <ema> depool cp4026 and reimage as upload_ats T219967

Change 508304 merged by Ema:
[operations/puppet@production] cache: reimage cp4026 as upload_ats

https://gerrit.wikimedia.org/r/508304

Script wmf-auto-reimage was launched by ema on cumin2001.codfw.wmnet for hosts:

['cp4026.ulsfo.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905061425_ema_2856.log.

Completed auto-reimage of hosts:

['cp4026.ulsfo.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-05-06T15:14:31Z] <ema> pool cp4026 w/ ATS backend T219967

ema closed this task as Resolved. May 7 2019, 9:49 AM
ema claimed this task.

All Varnish backends in ulsfo upload replaced with ATS.
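
For a rough picture of how long each host stayed out of rotation, the SAL entries logged in this task can be paired up per host. The following is a minimal sketch: the timestamps and messages are copied from the timeline above (with the task reference dropped), while the `SAL_ENTRIES` list, the regular expression, and the `depool_windows` helper are illustrative names, not part of any existing tooling.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

# (timestamp, message) pairs copied from the SAL mentions in this task.
SAL_ENTRIES = [
    ("2019-04-23T13:54:48Z", "depool cp4021 and reimage as upload_ats"),
    ("2019-04-26T12:44:43Z", "pool cp4021 w/ ATS backend"),
    ("2019-04-30T13:28:19Z", "depool cp4022 and reimage as upload_ats"),
    ("2019-04-30T16:04:50Z", "pool cp4022 w/ ATS backend"),
    ("2019-05-02T09:02:48Z", "depool cp4023 and reimage as upload_ats"),
    ("2019-05-02T10:03:58Z", "pool cp4023 w/ ATS backend"),
    ("2019-05-03T07:45:52Z", "depool cp4024 and reimage as upload_ats"),
    ("2019-05-03T08:47:40Z", "pool cp4024 w/ ATS backend"),
    ("2019-05-03T09:49:27Z", "depool cp4025 and reimage as upload_ats"),
    ("2019-05-03T10:47:49Z", "pool cp4025 w/ ATS backend"),
    ("2019-05-06T14:19:52Z", "depool cp4026 and reimage as upload_ats"),
    ("2019-05-06T15:14:31Z", "pool cp4026 w/ ATS backend"),
]

# Matches messages that start with "depool <host>" or "pool <host>".
_ACTION = re.compile(r"^(depool|pool) (cp\d+)\b")


def depool_windows(entries: Iterable[Tuple[str, str]]) -> Dict[str, timedelta]:
    """Pair each depool with the next pool of the same host."""
    started: Dict[str, datetime] = {}
    windows: Dict[str, timedelta] = {}
    for stamp, message in entries:
        match = _ACTION.match(message)
        if match is None:
            continue  # not a pool/depool entry
        action, host = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        if action == "depool":
            started[host] = when
        elif host in started:
            windows[host] = when - started.pop(host)
    return windows


if __name__ == "__main__":
    for host, window in sorted(depool_windows(SAL_ENTRIES).items()):
        print(host, window)
```

On this data, cp4021 was out of rotation for almost three days while the puppetization was still being worked out, cp4022 (whose auto-reimage reported a failure) for about two and a half hours, and the remaining four hosts for roughly an hour each.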

Change 507953 abandoned by Ema:
varnish: retry requests upon 502 errors

Reason:
This isn't necessary anymore:
https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/525222/

https://gerrit.wikimedia.org/r/507953