
Experiment with single backend CDN nodes
Open, Medium, Public

Description

We want to evaluate the practical performance implications of using a single, local cache backend instead of spreading the whole dataset across multiple nodes with consistent hashing (c-hashing).

To do so, we need to change the current Puppetization so that one host can be taken out of c-hash and used exclusively as a local backend. This could be done with a hiera setting, say cache::single_backend_fqdn, whose hostname is excluded from the pool of backends for a given cache cluster/DC. Additionally, the host named by cache::single_backend_fqdn must itself use only localhost as its cache backend.
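The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the intended behaviour, not the Puppet code from the patches: the function name backends_for and the .example hostnames are made up for the example.

```python
def backends_for(host, cluster_nodes, single_backend_fqdn=None):
    """Return the cache backends that `host` should use.

    Hypothetical helper mirroring the rule in the description:
    - no experiment running: every node uses the full pool
    - the experiment host: uses only localhost
    - every other node: uses the pool minus the experiment host
    """
    if not single_backend_fqdn:
        return sorted(cluster_nodes)
    if host == single_backend_fqdn:
        return ["localhost"]
    return sorted(n for n in cluster_nodes if n != single_backend_fqdn)


if __name__ == "__main__":
    # Illustrative hostnames only, not real FQDNs.
    nodes = ["cp4027.example", "cp4028.example", "cp4029.example"]
    for h in nodes:
        print(h, "->", backends_for(h, nodes, "cp4027.example"))
```

With the experiment enabled on the first node, that node gets only localhost while the other two get a pool of N-1 backends.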

Dashboards such as cache-hosts-comparison can then be used to observe the impact on hitrate/ttfb on various nodes. Because the remaining nodes form a pool of N-1 backends instead of N, backend hitrate may change across the board; take this into account when interpreting the results.
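As a rough way to size the N-1 effect, the sketch below computes how much each remaining backend's share of the hashed keyspace grows when one node leaves the pool. It assumes an idealized, uniform consistent hash with equal weights, and the pool size of 8 is purely hypothetical. The real hitrate change also depends on cache size and object popularity, which this arithmetic does not model.

```python
from fractions import Fraction


def per_node_share(n_backends):
    """Share of the keyspace per backend under ideal uniform hashing."""
    return Fraction(1, n_backends)


def relative_share_increase(n_before):
    """Relative growth of each remaining node's share when going from
    n_before backends to n_before - 1 (assumes equal weights)."""
    before = per_node_share(n_before)
    after = per_node_share(n_before - 1)
    return (after - before) / before


if __name__ == "__main__":
    n = 8  # hypothetical pool size, not taken from the task
    print(f"share before: {per_node_share(n)}, after: {per_node_share(n - 1)}")
    print(f"relative increase: {relative_share_increase(n)}")
```

Under these assumptions the relative increase is always 1/(N-1), so the effect shrinks as the pool gets larger.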

The procedure for enabling the experiment on one node ($host) is as follows:

  • Depool the host from all user traffic with sudo -i depool
  • (Optional but good to evaluate hitrate): Stop trafficserver.service, empty ATS backend cache with traffic_server -C clear_cache, start trafficserver.service
  • Set cache::single_backend_fqdn: $host in hiera for the DC/cluster the host is part of (e.g. for host=cp4027, ulsfo/text)
  • Run puppet on all cache nodes in the DC/cluster. Ensure that $host is removed from the list of backends on all varnish instances in the DC/cluster with sudo -i varnishadm -n frontend backend.list
  • Ensure that varnish on $host points to localhost and that the node behaves well: https://wikitech.wikimedia.org/wiki/Varnish#Force_your_requests_through_a_specific_Varnish_frontend
  • Repool the host for user traffic with sudo -i pool. Ensure that $host is not listed in /etc/varnish/directors.frontend.vcl on any DC/cluster node

Disabling the experiment:

  • Depool the host from all user traffic with sudo -i depool
  • Unset cache::single_backend_fqdn in hiera for the DC/cluster the host is part of
  • Run puppet on all cache nodes in the DC/cluster. Ensure that $host is added to the list of backends on all varnish instances in the DC/cluster with sudo -i varnishadm -n frontend backend.list
  • Ensure that varnish on $host points to all nodes in the DC/cluster and that the node behaves well: https://wikitech.wikimedia.org/wiki/Varnish#Force_your_requests_through_a_specific_Varnish_frontend
  • Repool the host for user traffic with sudo -i pool. Ensure that $host is listed in /etc/varnish/directors.frontend.vcl on all DC/cluster nodes
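The checks on /etc/varnish/directors.frontend.vcl in both procedures could be scripted along these lines. This is a sketch under assumptions: it relies on the be_<fqdn with dots replaced by underscores> naming visible in the confd template quoted later in this task, and the helper names and sample hostnames are hypothetical.

```python
import re


def backend_name(fqdn):
    """VCL backend name for a host: 'be_' plus the FQDN with dots
    replaced by underscores (naming taken from the confd template)."""
    return "be_" + fqdn.replace(".", "_")


def host_in_directors(vcl_text, fqdn):
    """True if `fqdn` is added to any director in the given VCL text."""
    pattern = r"\.add_backend\(\s*" + re.escape(backend_name(fqdn)) + r"\s*[,)]"
    return re.search(pattern, vcl_text) is not None


if __name__ == "__main__":
    # Path from the procedure above; run on a cache node.
    with open("/etc/varnish/directors.frontend.vcl") as f:
        vcl = f.read()
    host = "cp4027.example"  # illustrative, replace with the real FQDN
    print(host, "listed:", host_in_directors(vcl, host))
```

While the experiment is enabled the check should return False for $host on every node in the DC/cluster, and True again once it has been disabled.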

Event Timeline

ema triaged this task as Medium priority. Aug 5 2021, 8:51 AM

Change 710224 had a related patch set uploaded (by Ema; author: Ema):

[operations/puppet@production] cache: single backend experiment

https://gerrit.wikimedia.org/r/710224

Change 710236 had a related patch set uploaded (by Ema; author: Ema):

[operations/puppet@production] cache: refactor dynamic_backend_caches logic

https://gerrit.wikimedia.org/r/710236

Change 710244 had a related patch set uploaded (by Ema; author: Ema):

[operations/puppet@production] cache: enable single backend experiment on cp4027

https://gerrit.wikimedia.org/r/710244

Change 710236 merged by Ema:

[operations/puppet@production] cache: refactor dynamic_backend_caches logic

https://gerrit.wikimedia.org/r/710236

Change 710224 merged by Ema:

[operations/puppet@production] cache: single backend experiment

https://gerrit.wikimedia.org/r/710224

Change 710973 had a related patch set uploaded (by Ema; author: Ema):

[operations/puppet@production] cache: use confd only if backend list is on etcd

https://gerrit.wikimedia.org/r/710973

Change 710973 merged by Ema:

[operations/puppet@production] cache: use confd only if backend list is on etcd

https://gerrit.wikimedia.org/r/710973

Change 726912 had a related patch set uploaded (by Ema; author: Ema):

[operations/puppet@production] cache: exclude single backend experiment from pooled ATS backends

https://gerrit.wikimedia.org/r/726912

To document this somewhere with syntax highlighting: when the experiment is not running anywhere, the patch above changes the Go template as follows:

--- /etc/confd/templates/_etc_varnish_directors.frontend.vcl.tmpl.orig
+++ /etc/confd/templates/_etc_varnish_directors.frontend.vcl.tmpl
@@ -1,7 +1,7 @@
 new cache_local = directors.shard();
 new cache_local_random = directors.random();
 
-{{range $node := ls "/conftool/v1/pools/esams/cache_text/ats-be/"}}{{ $key := printf "/conftool/v1/pools/esams/cache_text/ats-be/%s" $node }}{{ $data := json (getv $key) }}{{ if eq $data.pooled "yes"}}
+{{range $node := ls "/conftool/v1/pools/esams/cache_text/ats-be/"}}{{ $key := printf "/conftool/v1/pools/esams/cache_text/ats-be/%s" $node }}{{ $data := json (getv $key) }}{{ if and (eq $data.pooled "yes") (ne $node "") }}
 cache_local.add_backend(be_{{ $parts := split $node "." }}{{ join $parts "_" }});
 cache_local_random.add_backend(be_{{ $parts := split $node "." }}{{ join $parts "_" }}, {{ $data.weight }});
 {{end}}{{end}}
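To make the new condition easier to follow, here is a rough Python rendering of what the patched template does. It is not the confd code: the function name, the dict layout standing in for the etcd data, and the hostnames are assumptions made for illustration.

```python
def render_add_backend_lines(pool, excluded_fqdn=""):
    """Emit add_backend lines for pooled nodes, skipping `excluded_fqdn`.

    `pool` maps node FQDN -> {"pooled": "yes"/"no", "weight": int},
    a stand-in for the per-node JSON the template reads from etcd.
    An empty `excluded_fqdn` matches the (ne $node "") case in the diff,
    i.e. the experiment is not running and nothing extra is filtered out.
    """
    lines = []
    for node, data in pool.items():
        if data["pooled"] == "yes" and node != excluded_fqdn:
            name = "be_" + "_".join(node.split("."))
            lines.append(f"cache_local.add_backend({name});")
            lines.append(
                f"cache_local_random.add_backend({name}, {data['weight']});"
            )
    return lines


if __name__ == "__main__":
    pool = {
        "cp3050.example": {"pooled": "yes", "weight": 100},
        "cp3051.example": {"pooled": "no", "weight": 100},
    }
    print("\n".join(render_add_backend_lines(pool)))
```

With an empty excluded name the second half of the condition is always true for real node names, so the output is the same as before the patch. Passing a node's FQDN drops that node even when it is pooled.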

Change 726912 merged by Ema:

[operations/puppet@production] cache: exclude single backend experiment from pooled ATS backends

https://gerrit.wikimedia.org/r/726912

Mentioned in SAL (#wikimedia-operations) [2021-10-12T10:23:42Z] <ema> depool/repool ats-be on cp4028 to verify updates to /etc/varnish/directors.frontend.vcl on cp4027 keep on working fine T288106