Switching production traffic to Apache Traffic Server

The plan to replace Varnish as the on-disk HTTP cache component of our CDN with Apache Traffic Server is starting to take shape.

On March 13th we served the very first production requests via Apache Traffic Server (ATS). The test, which lasted about one hour, consisted of sending a little less than 10% of the frontend cache misses served by the Dallas (USA) data center through ATS instead of Varnish. In other words, about 70 requests per second handled by the upload cache cluster in Dallas, mostly images and videos from Wikimedia Commons, were served by Traffic Server. Check out the graphs if you're interested in seeing the details of how much traffic was handled during this initial test. The primary objective of this activity was to verify that the procedures to switch traffic back and forth between Varnish and ATS would work as expected, which they did.

One day later, on March 14th, we once again flipped the switch and served traffic via ATS, this time for a more extended period of about two days. This second test allowed us to discover an issue with the way the RAM cache is handled in the default configuration of ATS, which we fixed in our installation and reported to the Apache Traffic Server project. We then stopped the experiment during the weekend of March 16th-17th because we like our weekends to be uneventful, and resumed on Monday, March 18th.

From March 18th onward, ATS has been serving production traffic continuously. The amount of traffic sent to ATS in Dallas was increased on March 22nd to roughly 120 requests per second. Using ATS as part of our normal daily operations allows us to gather invaluable production experience with the system and our procedures around it, as well as to discover potential issues.

On March 25th we started the experiment in our data center in Virginia, USA. In this case, peaks of around 300 requests per second are handled by Apache Traffic Server.

[Image: ATS-2019-03-25-eqiad.png]

Given the encouraging results of this initial deployment, we are going to serve more and more production traffic via ATS starting next quarter, so stay tuned!

Written by ema on Mar 28 2019, 9:21 AM.
Staff Site Reliability Engineer, Traffic Team