strip non session cookies before cache lookup in ATS
Closed, Resolved, Public

Assigned To
Authored By
Vgutierrez
Aug 26 2022, 10:44 AM
Referenced Files
F35796472: ats-backend_hitrate-codfw_text-2022-09.png
Fri, Nov 18, 11:33 PM
F35796471: ats-backend_hits-esams_text-2022-09.png
Fri, Nov 18, 11:33 PM
F35796465: ats-ttfb_p75-eqsin_text_2022-09.png
Fri, Nov 18, 11:27 PM
F35707156: Screenshot 2022-11-04 at 20.53.39.png
Fri, Nov 4, 10:05 PM
F35707159: Screenshot 2022-11-04 at 20.52.05.png
Fri, Nov 4, 10:05 PM
F35707157: Screenshot 2022-11-04 at 20.54.07.png
Fri, Nov 4, 10:05 PM
F35707158: Screenshot 2022-11-04 at 20.53.46.png
Fri, Nov 4, 10:05 PM
F35707179: Screenshot 2022-11-04 at 21.47.42.png
Fri, Nov 4, 10:05 PM
Tokens
"Yellow Medal" token, awarded by Ladsgroup.

Description

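ATS hit rate is currently being hurt by the combination of Vary: Cookie and non-session cookies such as WMF-Last-Access or GeoIP. Because the response varies on the whole Cookie header, every distinct value of those cookies ends up as a separate cache variant, even for requests that carry no session.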
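To illustrate the idea in the description, here is a minimal Python sketch of filtering a Cookie header down to session cookies before it is used for cache lookup. It is not the code from the patch: the function name `strip_non_session_cookies` and the name pattern (a cookie is treated as a session cookie if its name contains "session", "Session" or "Token") are assumptions made for this example.

```python
import re

# Assumption for this sketch: a cookie is a "session" cookie when its name
# contains "session"/"Session" or "Token". The real rule lives in the patch.
SESSION_NAME = re.compile(r"[sS]ession|Token")


def strip_non_session_cookies(cookie_header: str) -> str:
    """Return the Cookie header value with only session cookies kept."""
    kept = []
    for pair in cookie_header.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        # Only the cookie name decides; the value is left untouched.
        name = pair.split("=", 1)[0]
        if SESSION_NAME.search(name):
            kept.append(pair)
    return "; ".join(kept)
```

With this filter, two anonymous requests that differ only in WMF-Last-Access or GeoIP present the same (empty) Cookie value at lookup time, so they can share one cached variant, while a request carrying a session cookie still looks different.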

Event Timeline

Change 826785 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] trafficserver: Hide non session cookies during cache lookup

https://gerrit.wikimedia.org/r/826785

Vgutierrez changed the task status from Open to In Progress. Aug 26 2022, 10:51 AM
Vgutierrez triaged this task as Medium priority.
Vgutierrez moved this task from Triage to In Progress on the Traffic board.

Change 826866 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] varnish: Emit X-Varnish-Cluster for misc sites

https://gerrit.wikimedia.org/r/826866

Change 826866 merged by Vgutierrez:

[operations/puppet@production] varnish: Emit X-Varnish-Cluster for misc sites

https://gerrit.wikimedia.org/r/826866

Mentioned in SAL (#wikimedia-operations) [2022-08-29T08:55:40Z] <vgutierrez> test trafficserver: Hide non session cookies during cache lookup in cp6016 - T316338 T316337

Change 826785 merged by Vgutierrez:

[operations/puppet@production] trafficserver: Hide non session cookies during cache lookup

https://gerrit.wikimedia.org/r/826785

Mentioned in SAL (#wikimedia-operations) [2022-08-29T10:09:10Z] <vgutierrez> test trafficserver: Hide non session cookies during cache lookup in drmrs - T316338 T316337

Mentioned in SAL (#wikimedia-operations) [2022-08-29T12:14:20Z] <vgutierrez> rolling restart of ats-be fleet wide to apply "Hide non session cookies during cache lookup" - T316338 T316337

Change 826785 was reverted by https://gerrit.wikimedia.org/r/c/operations/puppet/+/827566. The revert was missing the Bug: header line that the bots need in order to report it here.

Change 828002 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] trafficserver: Hide non session cookies during cache lookup

https://gerrit.wikimedia.org/r/828002

Mentioned in SAL (#wikimedia-operations) [2022-08-31T08:12:03Z] <vgutierrez> test trafficserver: Hide non session cookies during cache lookup in cp6016 - T316338

Mentioned in SAL (#wikimedia-operations) [2022-08-31T08:20:05Z] <vgutierrez> end test trafficserver: Hide non session cookies during cache lookup in cp6016 - T316338

Mentioned in SAL (#wikimedia-operations) [2022-08-31T11:04:09Z] <vgutierrez> test trafficserver: Hide non session cookies during cache lookup in cp6016 - T316338

Change 828002 merged by Vgutierrez:

[operations/puppet@production] trafficserver: Hide non session cookies during cache lookup

https://gerrit.wikimedia.org/r/828002

Mentioned in SAL (#wikimedia-operations) [2022-08-31T12:57:22Z] <vgutierrez> test trafficserver: Hide non session cookies during cache lookup in drmrs - T316338

Mentioned in SAL (#wikimedia-operations) [2022-08-31T14:08:50Z] <vgutierrez> deploy trafficserver: Hide non session cookies during cache lookup globally - T316338

Change 828564 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] trafficserver: Replace session cookies with Token=1 iff V:C isn't there

https://gerrit.wikimedia.org/r/828564

Change 828564 abandoned by Vgutierrez:

[operations/puppet@production] trafficserver: Replace session cookies with Token=1 iff V:C isn't there

Reason:

We don't need this in ATS-land, as it already caches requests with cookies by default. The lack of Vary: Cookie in the response is enough.
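The reasoning above follows from how Vary works in HTTP caching in general: a request header only splits the cache when the response names it in Vary. The sketch below shows that rule in Python; it is not ATS internals, and `variant_key` is a hypothetical helper made up for this example.

```python
def variant_key(url, request_headers, response_vary):
    """Build a cache key from the URL plus the request headers named in Vary."""
    fields = [f.strip().lower() for f in response_vary.split(",") if f.strip()]
    lowered = {name.lower(): value for name, value in request_headers.items()}
    # Headers not listed in Vary never reach the key, whatever their value.
    return (url,) + tuple((f, lowered.get(f, "")) for f in sorted(fields))


# Two requests that differ only in their session cookie.
alice = {"Cookie": "enwikiSession=aaa", "Accept-Encoding": "gzip"}
bob = {"Cookie": "enwikiSession=bbb", "Accept-Encoding": "gzip"}

# Response without Vary: Cookie: both requests map to the same variant.
same = variant_key("/wiki/Foo", alice, "Accept-Encoding") == variant_key("/wiki/Foo", bob, "Accept-Encoding")

# Response with Vary: Cookie: each cookie value gets its own variant.
split = variant_key("/wiki/Foo", alice, "Accept-Encoding, Cookie") != variant_key("/wiki/Foo", bob, "Accept-Encoding, Cookie")
```

In this model a response without Vary: Cookie is shared across cookie values without any rewriting of the request, which matches the reason given for abandoning the change.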

https://gerrit.wikimedia.org/r/828564

Vgutierrez claimed this task.

This seems to be working and not breaking anything :). As a direct result, the cache hit rate shows up to a 100% increase in the text cluster at the ATS layer: https://grafana.wikimedia.org/goto/acc0K6W4z?orgId=1

Images for future reference, taken from https://grafana.wikimedia.org/d/O2sTrqZVk/backend-layer-performance?orgId=1&var-site=All&var-cluster=text&from=1661385600000&to=1662840000000. I created an annotation (tagged: operations, performance) with a link to this task, to make it easier to correlate on other dashboards.

The latency improvement is huge. In Eqsin, p75 latency improved by about 26%, from 475ms down to 350ms compared with the same time and day a week earlier. That's a 125ms drop!

[Image: ats-ttfb_p75-eqsin_text_2022-09.png]

In terms of cache effectiveness, internal cache hits went up from 600 to 1200 req/s in Esams, and from 300 to 600 req/s in Eqsin.

[Images: ats-backend_hits-esams_text-2022-09.png, Screenshot 2022-11-04 at 20.54.07.png, Screenshot 2022-11-04 at 20.52.05.png]

Naturally, the reported cache hit rate has now doubled in some regions, e.g. from 2% up to 4% in Codfw. Note that our frontend cache hit ratio is and remains way higher, at around 89% overall (up to 99.9% for ResourceLoader). This improvement is specifically at the ATS backend, our second layer of caching.

[Image: ats-backend_hitrate-codfw_text-2022-09.png]
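As a quick check of the figures quoted in the comments above, the relative changes can be worked out directly. The helper name `relative_change` is made up for this sketch; the input numbers are the ones stated in this task.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (0.5 means +50%)."""
    return (after - before) / before


eqsin_p75 = relative_change(475, 350)      # p75 TTFB in Eqsin, in ms
esams_hits = relative_change(600, 1200)    # internal cache hits in Esams, req/s
codfw_hitrate = relative_change(2, 4)      # backend hit rate in Codfw, in %

print(f"Eqsin p75 latency: {eqsin_p75:+.1%}")      # -26.3%
print(f"Esams cache hits:  {esams_hits:+.1%}")     # +100.0%
print(f"Codfw hit rate:    {codfw_hitrate:+.1%}")  # +100.0%
```

Note that the Codfw hit rate doubling is a 100% relative increase but only 2 percentage points in absolute terms, which is why it sits comfortably next to the much higher frontend hit ratio.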