
Replace Varnish backends with ATS on cache text nodes
Closed, Resolved
Public

Description

This is the tracking task for the conversion of cache_text on-disk caches from Varnish to ATS. See T226589 for the similar, already completed, conversion of cache_upload.

DCs should be converted starting with the outermost ones, for example in the following order:

  • ulsfo
  • eqsin
  • esams
  • codfw
  • eqiad
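
The rollout order above can be expressed as a sort key over host names. The following is a minimal sketch, assuming hosts follow the `cpNNNN.<dc>.wmnet` naming seen in the timeline below; `DC_ORDER`, `dc_of` and `rollout_order` are hypothetical names used for illustration and are not part of any existing tooling.

```python
# Hypothetical helper: order cache hosts by the DC rollout order
# given in the task description (outermost DCs first).
DC_ORDER = ["ulsfo", "eqsin", "esams", "codfw", "eqiad"]


def dc_of(fqdn: str) -> str:
    """Return the DC label of a host name such as 'cp1077.eqiad.wmnet'."""
    parts = fqdn.split(".")
    if len(parts) < 3:
        raise ValueError(f"unexpected host name: {fqdn!r}")
    return parts[1]


def rollout_order(hosts):
    """Sort hosts by DC rollout order, then by host name within a DC."""
    # DC_ORDER.index raises ValueError for a DC that is not in the list.
    return sorted(hosts, key=lambda h: (DC_ORDER.index(dc_of(h)), h))


if __name__ == "__main__":
    hosts = ["cp1077.eqiad.wmnet", "cp3064.esams.wmnet", "cp2023.codfw.wmnet"]
    for host in rollout_order(hosts):
        print(host)
```

A host in a DC that is not listed raises `ValueError` rather than being silently placed somewhere in the sequence.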

Details

Related Gerrit Patches:
[operations/puppet@production] ATS: unset client req Accept-Encoding on ats-be
[operations/puppet@production] cache: reimage cp1089 as text_ats
[operations/puppet@production] Revert "Revert "cache: reimage cp2023 as text_ats""
[operations/puppet@production] Revert "Revert "cache: reimage cp3064 as text_ats""
[operations/puppet@production] cache: reimage cp1087 as text_ats
[operations/puppet@production] cache: reimage cp1085 as text_ats
[operations/puppet@production] ATS: lookup cache for cookie requests
[operations/puppet@production] ATS: use set_server_resp_no_store, do not hide CC
[operations/puppet@production] ATS: test setup for default.lua
[operations/puppet@production] ATS: improve session/token match
[operations/puppet@production] ATS: mark uncacheable responses as 'pass' in X-Cache-Int
[operations/puppet@production] ATS: pass uncacheable requests
[operations/puppet@production] cache: reimage cp1083 as text_ats
[operations/puppet@production] Revert "ATS: explicitly skip the cache instead of hiding CC"
[operations/puppet@production] Revert "ATS: do not coalesce uncacheable requests"
[operations/puppet@production] ATS: do not coalesce uncacheable requests
[operations/puppet@production] ATS: explicitly skip the cache instead of hiding CC
[operations/puppet@production] Revert "cache: reimage cp3064 as text_ats"
[operations/puppet@production] late_command: remove cpNNNN mkfs stuff
[operations/puppet@production] cache: reimage cp1081 as text_ats
[operations/puppet@production] cache: reimage cp1079 as text_ats
[operations/puppet@production] otrs/phabricator: do not assume text nodes are defined
[operations/puppet@production] cache: reimage cp1077 as text_ats
[operations/puppet@production] Revert "cache: reimage cp2023 as text_ats"
[operations/puppet@production] cache: reimage cp2023 as text_ats
[operations/puppet@production] cache: reimage cp2019 as text_ats
[operations/puppet@production] cache: reimage cp2016 as text_ats
[operations/puppet@production] cache: reimage cp2013 as text_ats
[operations/puppet@production] cache: reimage cp2012 as text_ats
[operations/puppet@production] cache: reimage cp2010 as text_ats
[operations/puppet@production] cache: reimage cp2007 as text_ats
[operations/puppet@production] cache: reimage cp2006 as text_ats
[operations/puppet@production] cache: reimage cp2004 as text_ats
[operations/puppet@production] cache_text codfw: read ats-be etcd keys
[operations/puppet@production] cache: reimage cp2001 as text_ats
[operations/puppet@production] ATS: network settings for ats-be
[operations/puppet@production] cache: reimage cp3064 as text_ats
[operations/puppet@production] cache: reimage cp3062 as text_ats
[operations/puppet@production] cache: reimage cp3060 as text_ats
[operations/puppet@production] cache: reimage cp3058 as text_ats
[operations/puppet@production] cache: reimage cp3054 as text_ats
[operations/puppet@production] cache: reimage cp3052 as text_ats
[operations/puppet@production] ATS: move backend::storage_elements settings to profile
[operations/puppet@production] ATS: use nvme disk for cp3050 ats-be cache
[operations/puppet@production] cache_text esams: read ats-be etcd keys
[operations/puppet@production] cache: reimage cp3050 as text_ats
[operations/puppet@production] ATS: remap stream.wmo.org requests on ats-tls as well
[operations/puppet@production] ATS: remap stream.wm.org websocket requests
[operations/puppet@production] cumin: aliases: cache::text_ats is a thing now
[operations/puppet@production] cache: reimage cp5012 as text_ats
[operations/puppet@production] cache: reimage cp5011 as text_ats
[operations/puppet@production] cache: reimage cp5010 as text_ats
[operations/puppet@production] varnish: make hitrate dstat plugin work w/o varnish-be
[operations/puppet@production] cache: reimage cp5009 as text_ats
[operations/puppet@production] cache: reimage cp5008 as text_ats
[operations/puppet@production] cache_text eqsin: read ats-be etcd keys
[operations/puppet@production] cache: reimage cp5007 as text_ats
[operations/puppet@production] prometheus: load text_ats varnish targets
[operations/puppet@production] prometheus: add text_ats mtail targets
[operations/puppet@production] cache: reimage cp4032 as text_ats
[operations/puppet@production] cache: reimage cp4031 as text_ats
[operations/puppet@production] cache: reimage cp4030 as text_ats
[operations/dns@master] kibana: add discovery record
[operations/dns@master] kibana: add discovery record
[operations/puppet@production] cache: reimage cp4029 as text_ats
[operations/puppet@production] cache: reimage cp4028 as text_ats
[operations/puppet@production] cache_text ulsfo: read ats-be etcd keys
[operations/puppet@production] cache: reimage cp4027 as text_ats
[operations/puppet@production] ATS: include tls profile in cache::text_ats role
[operations/puppet@production] ATS: Vary-slotting for PHP7
[operations/puppet@production] ATS: cache responses to cookies
[operations/puppet@production] ATS: log Cookie in labs too
[operations/puppet@production] ATS: add X-ATS-Timestamp
[operations/puppet@production] ATS: log Cookies
[operations/puppet@production] ATS: perform MW and RB mangling after cache lookup
[operations/puppet@production] Revert "ATS: temporarily use plain HTTP to access docker-registry"
[operations/puppet@production] ATS: temporarily use plain HTTP to access docker-registry
[operations/puppet@production] phabricator::main: whitelist ATS hosts
[operations/puppet@production] docker_registry_ha: allow eqiad/codfw varnish/ATS text nodes
[operations/puppet@production] ATS: get rid of alternate_domains not overriding caching
[operations/puppet@production] prometheus: fetch cache_text atsmtail@backend metrics
[operations/puppet@production] cache_text eqiad: read ats-be etcd keys
[operations/puppet@production] cache: ATS storage configuration for cp1075
[operations/puppet@production] cache: convert cp1075 to text_ats (hiera/conftool)
[operations/puppet@production] cache: reimage cp1075 as text_ats
[operations/puppet@production] ATS: enable compress.so everywhere
[operations/puppet@production] ATS: add icinga check for traffic_server restarts
[operations/puppet@production] ATS: add icinga check for traffic_server restarts
[operations/puppet@production] ATS: enable compress.so for upload@eqsin
[operations/puppet@production] Revert "ATS: unset Accept-Encoding"
[operations/puppet@production] Revert "ATS: leave AE removal to Lua"
[operations/puppet@production] ATS: compress.so only cache compressed/decompressed variant
[operations/puppet@production] Revert "Revert "ATS: enable compress plugin on cp5002""
[operations/puppet@production] Revert "ATS: enable compress plugin on cp5002"
[operations/puppet@production] ATS: use proper origin for grafana.wm.org
[operations/puppet@production] ATS: leave AE removal to Lua
[operations/puppet@production] ATS: enable compress plugin on cp5002
[operations/puppet@production] ATS: set minimum-content-length for compress plugin
[operations/puppet@production] ATS: unset Accept-Encoding
[operations/puppet@production] ATS: disable compress plugin
[operations/puppet@production] ATS: add remap rule bugs.wikimedia.org -> phabricator
[operations/puppet@production] ATS: add profile::base::nameservers
[operations/puppet@production] ATS: add prometheus::varnishkafka_exporter::config
[operations/puppet@production] ATS: add {upload,maps}_domain to text_ats settings
[operations/puppet@production] ATS: unify common trafficserver settings
[operations/puppet@production] ATS: add support for the compress plugin and enable it
[operations/puppet@production] ATS: save and restore CC/Expires when forcing no-cache
[operations/puppet@production] ATS: do not cache Authorization responses
[operations/puppet@production] ATS: Vary-slotting for X-Forwarded-Proto
[operations/puppet@production] ATS: add-vary Lua plugin
[operations/puppet@production] ATS: w.wiki rewrite to meta
[operations/puppet@production] ATS: gracefully fail request coalescing
[operations/puppet@production] ATS: do not cache responses to cookies
[operations/puppet@production] ATS: split the cache for beta variant of the mobile site
[operations/puppet@production] cache: add role::cache::text_ats

Event Timeline


Change 552245 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] otrs/phabricator: do not assume text nodes are defined

https://gerrit.wikimedia.org/r/552245

Completed auto-reimage of hosts:

['cp1077.eqiad.wmnet']

and were ALL successful.

Change 552245 merged by Ema:
[operations/puppet@production] otrs/phabricator: do not assume text nodes are defined

https://gerrit.wikimedia.org/r/552245

Mentioned in SAL (#wikimedia-operations) [2019-11-21T13:59:50Z] <ema> pool cp1077 with ATS backend T227432

Change 552273 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp1079 as text_ats

https://gerrit.wikimedia.org/r/552273

Mentioned in SAL (#wikimedia-operations) [2019-11-21T14:49:32Z] <ema> depool cp1079 and reimage as text_ats T227432

Change 552273 merged by Ema:
[operations/puppet@production] cache: reimage cp1079 as text_ats

https://gerrit.wikimedia.org/r/552273

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp1079.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201911211453_ema_89543.log.

Completed auto-reimage of hosts:

['cp1079.eqiad.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-11-21T15:30:14Z] <ema> pool cp1079 with ATS backend T227432

Change 552468 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp1081 as text_ats

https://gerrit.wikimedia.org/r/552468

Mentioned in SAL (#wikimedia-operations) [2019-11-22T08:49:52Z] <ema> depool cp1081 and reimage as text_ats T227432

Change 552468 merged by Ema:
[operations/puppet@production] cache: reimage cp1081 as text_ats

https://gerrit.wikimedia.org/r/552468

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp1081.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201911220851_ema_38517.log.

Completed auto-reimage of hosts:

['cp1081.eqiad.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-11-22T09:28:10Z] <ema> pool cp1081 with ATS backend T227432

Change 552547 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] late_command: remove cpNNNN mkfs stuff

https://gerrit.wikimedia.org/r/552547

Change 552547 merged by BBlack:
[operations/puppet@production] late_command: remove cpNNNN mkfs stuff

https://gerrit.wikimedia.org/r/552547

Change 552825 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] Revert "cache: reimage cp3064 as text_ats"

https://gerrit.wikimedia.org/r/552825

Mentioned in SAL (#wikimedia-operations) [2019-11-25T14:45:58Z] <ema> depool cp3064 and reimage with varnish-be T227432

Change 552825 merged by Ema:
[operations/puppet@production] Revert "cache: reimage cp3064 as text_ats"

https://gerrit.wikimedia.org/r/552825

Mentioned in SAL (#wikimedia-operations) [2019-11-25T15:49:46Z] <ema> pool cp3064 with varnish-be T227432

Change 552862 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: do not coalesce uncacheable requests

https://gerrit.wikimedia.org/r/552862

Gilles added a subscriber: Gilles. Nov 26 2019, 1:30 PM
This comment was removed by Gilles.

Change 552076 merged by Ema:
[operations/puppet@production] ATS: explicitly skip the cache instead of hiding CC

https://gerrit.wikimedia.org/r/552076

Change 552862 merged by Ema:
[operations/puppet@production] ATS: do not coalesce uncacheable requests

https://gerrit.wikimedia.org/r/552862

Change 553123 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] Revert "ATS: explicitly skip the cache instead of hiding CC"

https://gerrit.wikimedia.org/r/553123

Change 553125 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] Revert "ATS: do not coalesce uncacheable requests"

https://gerrit.wikimedia.org/r/553125

Change 553125 merged by Ema:
[operations/puppet@production] Revert "ATS: do not coalesce uncacheable requests"

https://gerrit.wikimedia.org/r/553125

Change 553123 merged by Ema:
[operations/puppet@production] Revert "ATS: explicitly skip the cache instead of hiding CC"

https://gerrit.wikimedia.org/r/553123

Change 553132 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: disable coalescing for some uncacheable requests

https://gerrit.wikimedia.org/r/553132

Change 554256 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp1083 as text_ats

https://gerrit.wikimedia.org/r/554256

Mentioned in SAL (#wikimedia-operations) [2019-12-03T10:01:22Z] <ema> depool cp1083 and reimage as text_ats T227432

Change 554256 merged by Ema:
[operations/puppet@production] cache: reimage cp1083 as text_ats

https://gerrit.wikimedia.org/r/554256

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp1083.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201912031003_ema_133779.log.

Completed auto-reimage of hosts:

['cp1083.eqiad.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-12-03T10:37:32Z] <ema> pool cp1083 with ATS backend T227432

Change 555396 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: mark uncacheable responses as 'pass' in X-Cache-Int

https://gerrit.wikimedia.org/r/555396

Change 553132 merged by Ema:
[operations/puppet@production] ATS: pass uncacheable requests

https://gerrit.wikimedia.org/r/553132

Change 555396 abandoned by Ema:
ATS: mark uncacheable responses as 'pass' in X-Cache-Int

Reason:
The idea here is wrong. Thanks to hit-for-pass (hfp), Varnish can skip cache lookups and coalescing for responses that are known to be uncacheable. ATS has no hfp, so it would be wrong to label as "pass" responses for which we did look up the cache.

https://gerrit.wikimedia.org/r/555396
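
The reasoning in the abandon message can be illustrated with a toy model. This is a sketch only, not Varnish or ATS code: `ToyCache`, its `hit_for_pass` flag, and the `origin` function are hypothetical names, and the model follows the description above in treating hit-for-pass as a remembered "uncacheable" marker that lets later requests bypass the cache lookup.

```python
# Toy model (not Varnish or ATS code): a hit-for-pass marker lets a cache
# skip lookups for URLs whose responses are already known to be uncacheable.

class ToyCache:
    def __init__(self, hit_for_pass: bool):
        self.hit_for_pass = hit_for_pass
        self.store = {}    # url -> cached body
        self.hfp = set()   # urls remembered as uncacheable

    def fetch(self, url, origin):
        """Return (status, body); status is 'pass', 'hit' or 'miss'."""
        if self.hit_for_pass and url in self.hfp:
            # Known uncacheable: go straight to the origin, no cache lookup.
            body, _ = origin(url)
            return "pass", body
        if url in self.store:
            return "hit", self.store[url]
        body, cacheable = origin(url)
        if cacheable:
            self.store[url] = body
        elif self.hit_for_pass:
            self.hfp.add(url)
        # The cache was consulted, so this is a miss even when the
        # response turned out to be uncacheable.
        return "miss", body


def origin(url):
    """Hypothetical origin: anything under /private is uncacheable."""
    return f"body of {url}", not url.startswith("/private")


if __name__ == "__main__":
    for hfp_enabled in (True, False):
        cache = ToyCache(hit_for_pass=hfp_enabled)
        statuses = [cache.fetch("/private/x", origin)[0] for _ in range(2)]
        print(hfp_enabled, statuses)
```

In this model, a cache without the marker performs a lookup on every request for an uncacheable URL, so reporting those requests as "pass" would misdescribe what happened.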

Change 556185 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: improve session/token match

https://gerrit.wikimedia.org/r/556185

Change 556185 merged by Ema:
[operations/puppet@production] ATS: improve session/token match

https://gerrit.wikimedia.org/r/556185

Change 556197 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: test setup for default.lua

https://gerrit.wikimedia.org/r/556197

Change 556197 merged by Ema:
[operations/puppet@production] ATS: test setup for default.lua

https://gerrit.wikimedia.org/r/556197

Change 556201 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: use set_server_resp_no_store, do not hide CC

https://gerrit.wikimedia.org/r/556201

Change 556217 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: lookup cache for cookie requests

https://gerrit.wikimedia.org/r/556217

Mentioned in SAL (#wikimedia-operations) [2019-12-11T09:25:12Z] <ema> cp1075: depool ats-be to test set_server_resp_no_store https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/556201/ T227432

Change 556201 merged by Ema:
[operations/puppet@production] ATS: use set_server_resp_no_store, do not hide CC

https://gerrit.wikimedia.org/r/556201

Mentioned in SAL (#wikimedia-operations) [2019-12-11T09:44:58Z] <ema> cp1075: repool ats-be after successful set_server_resp_no_store test P9849 T227432

Mentioned in SAL (#wikimedia-operations) [2019-12-11T10:03:10Z] <ema> cp-ats: apply set_server_resp_no_store patch https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/556201/ to all hosts T227432

Change 556217 merged by Ema:
[operations/puppet@production] ATS: lookup cache for cookie requests

https://gerrit.wikimedia.org/r/556217

Change 559356 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp1085 as text_ats

https://gerrit.wikimedia.org/r/559356

Change 559357 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp1087 as text_ats

https://gerrit.wikimedia.org/r/559357

Change 559358 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] Revert "Revert "cache: reimage cp3064 as text_ats""

https://gerrit.wikimedia.org/r/559358

Change 559359 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] Revert "Revert "cache: reimage cp2023 as text_ats""

https://gerrit.wikimedia.org/r/559359

Mentioned in SAL (#wikimedia-operations) [2019-12-19T07:53:23Z] <ema> depool cp1085 and reimage as text_ats T227432

Change 559356 merged by Ema:
[operations/puppet@production] cache: reimage cp1085 as text_ats

https://gerrit.wikimedia.org/r/559356

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp1085.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201912190755_ema_74472.log.

Completed auto-reimage of hosts:

['cp1085.eqiad.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-12-19T08:31:18Z] <ema> pool cp1085 with ATS backend T227432

Mentioned in SAL (#wikimedia-operations) [2019-12-19T08:55:23Z] <ema> depool cp1087 and reimage as text_ats T227432

Change 559357 merged by Ema:
[operations/puppet@production] cache: reimage cp1087 as text_ats

https://gerrit.wikimedia.org/r/559357

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp1087.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201912190859_ema_88921.log.

Mentioned in SAL (#wikimedia-operations) [2019-12-19T09:23:07Z] <ema> depool cp3064 and reimage as text_ats T227432

Completed auto-reimage of hosts:

['cp1087.eqiad.wmnet']

and were ALL successful.

Change 559358 merged by Ema:
[operations/puppet@production] Revert "Revert "cache: reimage cp3064 as text_ats""

https://gerrit.wikimedia.org/r/559358

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp3064.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201912190926_ema_95045.log.

Mentioned in SAL (#wikimedia-operations) [2019-12-19T09:39:19Z] <ema> pool cp1087 with ATS backend T227432

Completed auto-reimage of hosts:

['cp3064.esams.wmnet']

Of which those FAILED:

['cp3064.esams.wmnet']

Mentioned in SAL (#wikimedia-operations) [2019-12-19T10:50:42Z] <ema> pool cp3064 with ATS backend T227432

Change 559440 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp1089 as text_ats

https://gerrit.wikimedia.org/r/559440

Mentioned in SAL (#wikimedia-operations) [2019-12-19T12:52:29Z] <ema> depool cp2023 and cp1089 for ATS reimages T227432. Reimaged together because of T238817

Change 559359 merged by Ema:
[operations/puppet@production] Revert "Revert "cache: reimage cp2023 as text_ats""

https://gerrit.wikimedia.org/r/559359

Change 559440 merged by Ema:
[operations/puppet@production] cache: reimage cp1089 as text_ats

https://gerrit.wikimedia.org/r/559440

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp2023.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201912191258_ema_139094.log.

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp1089.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201912191259_ema_139391.log.

Completed auto-reimage of hosts:

['cp1089.eqiad.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['cp2023.codfw.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-12-19T13:33:03Z] <ema> pool cp2023 with ATS backend T227432

Mentioned in SAL (#wikimedia-operations) [2019-12-19T13:34:01Z] <ema> pool cp1089 with ATS backend T227432

ema closed this task as Resolved. Dec 19 2019, 2:10 PM
ema claimed this task.
ema updated the task description.

cp2023 and cp1089 were the last two hosts running Varnish as backend cache. We now have exclusively ats-be across the fleet!

Change 577551 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: unset client req Accept-Encoding on ats-be

https://gerrit.wikimedia.org/r/577551

Change 577551 merged by Ema:
[operations/puppet@production] ATS: unset client req Accept-Encoding on ats-be

https://gerrit.wikimedia.org/r/577551