
Replace Varnish backends with ATS on cache text nodes
Open, Normal, Public

Description

This is the tracking task for the conversion of cache_text on-disk caches from Varnish to ATS. See T226589 for the similar, already completed, conversion of cache_upload.

Data centers (DCs) should be converted starting with the outermost ones, for example in the following order:

  • ulsfo
  • eqsin
  • esams
  • codfw
  • eqiad
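As a rough illustration of the ordering above, the following sketch sorts cache hosts by the site label embedded in their FQDN (the `host.site.wmnet` pattern seen in the reimage logs on this task). The names `DC_ORDER`, `dc_of` and `conversion_order` are hypothetical helpers written for this example, not part of any existing tooling.

```python
# Conversion order from the task description: outermost sites first.
DC_ORDER = ["ulsfo", "eqsin", "esams", "codfw", "eqiad"]


def dc_of(fqdn: str) -> str:
    """Return the site label of an FQDN such as 'cp5010.eqsin.wmnet'."""
    parts = fqdn.split(".")
    if len(parts) < 3:
        raise ValueError(f"expected host.site.wmnet, got {fqdn!r}")
    return parts[1]


def conversion_order(hosts):
    """Sort hosts by site conversion order, then by hostname within a site."""
    rank = {dc: i for i, dc in enumerate(DC_ORDER)}
    # A site missing from DC_ORDER raises KeyError, which is intentional here:
    # an unknown site should be noticed rather than silently sorted last.
    return sorted(hosts, key=lambda h: (rank[dc_of(h)], h))


if __name__ == "__main__":
    hosts = ["cp3050.esams.wmnet", "cp5011.eqsin.wmnet", "cp5010.eqsin.wmnet"]
    for host in conversion_order(hosts):
        print(host)
```

Sorting on a `(site rank, hostname)` tuple keeps hosts within one site in a stable, predictable order, which matches the one-host-at-a-time depool, reimage and pool pattern recorded in the timeline.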

Details

Related Gerrit Patches:
[operations/puppet@production] cache: reimage cp3064 as text_ats
[operations/puppet@production] cache: reimage cp3062 as text_ats
[operations/puppet@production] ATS: network settings for ats-be
[operations/puppet@production] cache: reimage cp3060 as text_ats
[operations/puppet@production] cache: reimage cp3058 as text_ats
[operations/puppet@production] cache: reimage cp3054 as text_ats
[operations/puppet@production] cache: reimage cp3052 as text_ats
[operations/puppet@production] ATS: move backend::storage_elements settings to profile
[operations/puppet@production] ATS: use nvme disk for cp3050 ats-be cache
[operations/puppet@production] cache_text esams: read ats-be etcd keys
[operations/puppet@production] cache: reimage cp3050 as text_ats
[operations/puppet@production] ATS: remap stream.wmo.org requests on ats-tls as well
[operations/puppet@production] ATS: remap stream.wm.org websocket requests
[operations/puppet@production] cumin: aliases: cache::text_ats is a thing now
[operations/puppet@production] cache: reimage cp5012 as text_ats
[operations/puppet@production] cache: reimage cp5011 as text_ats
[operations/puppet@production] cache: reimage cp5010 as text_ats
[operations/puppet@production] varnish: make hitrate dstat plugin work w/o varnish-be
[operations/puppet@production] cache: reimage cp5009 as text_ats
[operations/puppet@production] cache: reimage cp5008 as text_ats
[operations/puppet@production] cache_text eqsin: read ats-be etcd keys
[operations/puppet@production] cache: reimage cp5007 as text_ats
[operations/puppet@production] prometheus: load text_ats varnish targets
[operations/puppet@production] prometheus: add text_ats mtail targets
[operations/puppet@production] cache: reimage cp4032 as text_ats
[operations/puppet@production] cache: reimage cp4031 as text_ats
[operations/puppet@production] cache: reimage cp4030 as text_ats
[operations/dns@master] kibana: add discovery record
[operations/dns@master] kibana: add discovery record
[operations/puppet@production] cache: reimage cp4029 as text_ats
[operations/puppet@production] cache: reimage cp4028 as text_ats
[operations/puppet@production] cache_text ulsfo: read ats-be etcd keys
[operations/puppet@production] cache: reimage cp4027 as text_ats
[operations/puppet@production] ATS: include tls profile in cache::text_ats role
[operations/puppet@production] ATS: Vary-slotting for PHP7
[operations/puppet@production] ATS: cache responses to cookies
[operations/puppet@production] ATS: log Cookie in labs too
[operations/puppet@production] ATS: add X-ATS-Timestamp
[operations/puppet@production] ATS: log Cookies
[operations/puppet@production] ATS: perform MW and RB mangling after cache lookup
[operations/puppet@production] Revert "ATS: temporarily use plain HTTP to access docker-registry"
[operations/puppet@production] ATS: temporarily use plain HTTP to access docker-registry
[operations/puppet@production] phabricator::main: whitelist ATS hosts
[operations/puppet@production] docker_registry_ha: allow eqiad/codfw varnish/ATS text nodes
[operations/puppet@production] ATS: get rid of alternate_domains not overriding caching
[operations/puppet@production] prometheus: fetch cache_text atsmtail@backend metrics
[operations/puppet@production] cache_text eqiad: read ats-be etcd keys
[operations/puppet@production] cache: ATS storage configuration for cp1075
[operations/puppet@production] cache: convert cp1075 to text_ats (hiera/conftool)
[operations/puppet@production] cache: reimage cp1075 as text_ats
[operations/puppet@production] ATS: enable compress.so everywhere
[operations/puppet@production] ATS: add icinga check for traffic_server restarts
[operations/puppet@production] ATS: add icinga check for traffic_server restarts
[operations/puppet@production] ATS: enable compress.so for upload@eqsin
[operations/puppet@production] Revert "ATS: unset Accept-Encoding"
[operations/puppet@production] Revert "ATS: leave AE removal to Lua"
[operations/puppet@production] ATS: compress.so only cache compressed/decompressed variant
[operations/puppet@production] Revert "Revert "ATS: enable compress plugin on cp5002""
[operations/puppet@production] Revert "ATS: enable compress plugin on cp5002"
[operations/puppet@production] ATS: use proper origin for grafana.wm.org
[operations/puppet@production] ATS: leave AE removal to Lua
[operations/puppet@production] ATS: enable compress plugin on cp5002
[operations/puppet@production] ATS: set minimum-content-length for compress plugin
[operations/puppet@production] ATS: unset Accept-Encoding
[operations/puppet@production] ATS: disable compress plugin
[operations/puppet@production] ATS: add remap rule bugs.wikimedia.org -> phabricator
[operations/puppet@production] ATS: add profile::base::nameservers
[operations/puppet@production] ATS: add prometheus::varnishkafka_exporter::config
[operations/puppet@production] ATS: add {upload,maps}_domain to text_ats settings
[operations/puppet@production] ATS: unify common trafficserver settings
[operations/puppet@production] ATS: add support for the compress plugin and enable it
[operations/puppet@production] ATS: save and restore CC/Expires when forcing no-cache
[operations/puppet@production] ATS: do not cache Authorization responses
[operations/puppet@production] ATS: Vary-slotting for X-Forwarded-Proto
[operations/puppet@production] ATS: add-vary Lua plugin
[operations/puppet@production] ATS: w.wiki rewrite to meta
[operations/puppet@production] ATS: gracefully fail request coalescing
[operations/puppet@production] ATS: do not cache responses to cookies
[operations/puppet@production] ATS: split the cache for beta variant of the mobile site
[operations/puppet@production] cache: add role::cache::text_ats

Event Timeline

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp5010.eqsin.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201911011127_ema_90428.log.

Completed auto-reimage of hosts:

['cp5010.eqsin.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-11-01T12:18:36Z] <ema> pool cp5010 with ATS backend T227432

Change 547731 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp5011 as text_ats

https://gerrit.wikimedia.org/r/547731

Mentioned in SAL (#wikimedia-operations) [2019-11-01T14:05:02Z] <ema> depool cp5011 and reimage as text_ats T227432

Change 547731 merged by Ema:
[operations/puppet@production] cache: reimage cp5011 as text_ats

https://gerrit.wikimedia.org/r/547731

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp5011.eqsin.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201911011407_ema_122772.log.

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp5011.eqsin.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201911011500_ema_140070.log.

Completed auto-reimage of hosts:

['cp5011.eqsin.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-11-01T16:37:25Z] <ema> pool cp5011 with ATS backend T227432

Change 547800 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] cumin: aliases: cache::text_ats is a thing now

https://gerrit.wikimedia.org/r/547800

Change 548249 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp5012 as text_ats

https://gerrit.wikimedia.org/r/548249

Mentioned in SAL (#wikimedia-operations) [2019-11-04T13:06:03Z] <ema> depool cp5012 and reimage as text_ats T227432

Change 548249 merged by Ema:
[operations/puppet@production] cache: reimage cp5012 as text_ats

https://gerrit.wikimedia.org/r/548249

Change 547800 merged by CDanis:
[operations/puppet@production] cumin: aliases: cache::text_ats is a thing now

https://gerrit.wikimedia.org/r/547800

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp5012.eqsin.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201911041319_ema_113165.log.

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp5012.eqsin.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201911041406_ema_127312.log.

Completed auto-reimage of hosts:

['cp5012.eqsin.wmnet']

Of which those FAILED:

['cp5012.eqsin.wmnet']

Mentioned in SAL (#wikimedia-operations) [2019-11-05T10:59:02Z] <ema> pool cp5012 with ATS backend T227432

ema updated the task description. Tue, Nov 5, 11:00 AM

Change 548747 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: remap stream.wm.org websocket requests

https://gerrit.wikimedia.org/r/548747

Change 548747 merged by Ema:
[operations/puppet@production] ATS: remap stream.wm.org websocket requests

https://gerrit.wikimedia.org/r/548747

Change 548949 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] ATS: remap stream.wmo.org requests on ats-tls as well

https://gerrit.wikimedia.org/r/548949

Change 548949 merged by Vgutierrez:
[operations/puppet@production] ATS: remap stream.wmo.org requests on ats-tls as well

https://gerrit.wikimedia.org/r/548949

Change 550105 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp3050 as text_ats

https://gerrit.wikimedia.org/r/550105

Change 550106 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache_text esams: read ats-be etcd keys

https://gerrit.wikimedia.org/r/550106

Mentioned in SAL (#wikimedia-operations) [2019-11-11T13:25:40Z] <ema> depool cp3050 and reimage as text_ats T227432

Change 550105 merged by Ema:
[operations/puppet@production] cache: reimage cp3050 as text_ats

https://gerrit.wikimedia.org/r/550105

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp3050.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201911111327_ema_226775.log.

Completed auto-reimage of hosts:

['cp3050.esams.wmnet']

and were ALL successful.

Change 550106 merged by Ema:
[operations/puppet@production] cache_text esams: read ats-be etcd keys

https://gerrit.wikimedia.org/r/550106

Change 550246 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: use nvme disk for cp3050 ats-be cache

https://gerrit.wikimedia.org/r/550246

Change 550246 merged by Ema:
[operations/puppet@production] ATS: use nvme disk for cp3050 ats-be cache

https://gerrit.wikimedia.org/r/550246

Mentioned in SAL (#wikimedia-operations) [2019-11-11T14:26:41Z] <ema> pool cp3050 with ATS backend T227432

Change 550448 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: move backend::storage_elements settings to role yaml

https://gerrit.wikimedia.org/r/550448

Change 550448 merged by Ema:
[operations/puppet@production] ATS: move backend::storage_elements settings to profile

https://gerrit.wikimedia.org/r/550448

Change 550501 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp3052 as text_ats

https://gerrit.wikimedia.org/r/550501

Change 550501 merged by Ema:
[operations/puppet@production] cache: reimage cp3052 as text_ats

https://gerrit.wikimedia.org/r/550501

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp3052.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201911121619_ema_7588.log.

Completed auto-reimage of hosts:

['cp3052.esams.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-11-12T17:03:18Z] <ema> pool cp3052 with ATS backend T227432

Change 550688 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp3054 as text_ats

https://gerrit.wikimedia.org/r/550688

Mentioned in SAL (#wikimedia-operations) [2019-11-13T15:35:39Z] <ema> depool cp3054 and reimage as text_ats T227432

Change 550688 merged by Ema:
[operations/puppet@production] cache: reimage cp3054 as text_ats

https://gerrit.wikimedia.org/r/550688

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp3054.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201911131538_ema_16683.log.

Completed auto-reimage of hosts:

['cp3054.esams.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-11-13T16:21:36Z] <ema> pool cp3054 with ATS backend T227432

Change 550811 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp3058 as text_ats

https://gerrit.wikimedia.org/r/550811

Mentioned in SAL (#wikimedia-operations) [2019-11-14T09:34:00Z] <ema> depool cp3058 and reimage as text_ats T227432

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp3058.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201911140935_ema_224372.log.

Change 550811 merged by Ema:
[operations/puppet@production] cache: reimage cp3058 as text_ats

https://gerrit.wikimedia.org/r/550811

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp3058.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201911141001_ema_240171.log.

Completed auto-reimage of hosts:

['cp3058.esams.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-11-14T10:43:03Z] <ema> pool cp3058 with ATS backend T227432

Change 550822 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp3060 as text_ats

https://gerrit.wikimedia.org/r/550822

Mentioned in SAL (#wikimedia-operations) [2019-11-14T14:01:53Z] <ema> depool cp3060 and reimage as text_ats T227432

Change 550822 merged by Ema:
[operations/puppet@production] cache: reimage cp3060 as text_ats

https://gerrit.wikimedia.org/r/550822

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp3060.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201911141403_ema_69463.log.

Completed auto-reimage of hosts:

['cp3060.esams.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-11-14T14:54:28Z] <ema> pool cp3060 with ATS backend T227432

Change 550849 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp3062 as text_ats

https://gerrit.wikimedia.org/r/550849

Change 550850 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: reimage cp3064 as text_ats

https://gerrit.wikimedia.org/r/550850

Change 550866 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: network settings for ats-be

https://gerrit.wikimedia.org/r/550866

Mentioned in SAL (#wikimedia-operations) [2019-11-15T09:50:58Z] <ema> depool cp3062 and reimage as text_ats T227432

Change 550849 merged by Ema:
[operations/puppet@production] cache: reimage cp3062 as text_ats

https://gerrit.wikimedia.org/r/550849

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp3062.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201911150954_ema_187006.log.

Completed auto-reimage of hosts:

['cp3062.esams.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-11-15T10:36:44Z] <ema> pool cp3062 with ATS backend T227432

Mentioned in SAL (#wikimedia-operations) [2019-11-15T14:25:03Z] <ema> depool cp3064 and reimage as text_ats T227432

Change 550850 merged by Ema:
[operations/puppet@production] cache: reimage cp3064 as text_ats

https://gerrit.wikimedia.org/r/550850

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp3064.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201911151428_ema_140812.log.

Completed auto-reimage of hosts:

['cp3064.esams.wmnet']

Of which those FAILED:

['cp3064.esams.wmnet']

Mentioned in SAL (#wikimedia-operations) [2019-11-15T15:11:55Z] <ema> pool cp3064 with ATS backend T227432

ema updated the task description. Fri, Nov 15, 3:16 PM