
Create replacement for Varnishkafka
Open, Medium · Public

Description

As we all know, the time has come to start thinking about what is needed to replace the Varnish frontends with ATS. For Analytics, this means replacing varnishkafka.

Varnishkafka currently reads details about an HTTP request from the Varnish shared memory, assembles a JSON string, and sends it to Kafka (via librdkafka). In the ATS world, IIUC, a special logger can be configured to send a string (containing data about an HTTP request) to a named pipe, which is in turn collected and exposed by fifo-log-demux via a local socket. In theory, the varnishkafka replacement would read from that socket and deliver the data to Kafka.
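
The flow described above could be sketched roughly as follows. This is only an illustration in Python, not the actual replacement tool. It assumes that the local socket is a Unix stream socket, that records arrive newline-delimited (one per request), and that invalid JSON should be counted rather than forwarded. The names `read_lines`, `forward`, `run`, and the `deliver` callback (standing in for a librdkafka-backed producer) are all hypothetical.

```python
import json
import socket
from typing import Callable, Iterable, Iterator


def read_lines(sock: socket.socket, bufsize: int = 65536) -> Iterator[bytes]:
    """Yield newline-delimited records from a connected stream socket."""
    pending = b""
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # peer closed the connection
            break
        pending += chunk
        # Everything before the last newline is complete; keep the rest.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            if line:
                yield line
    if pending:
        yield pending  # trailing record without a final newline


def forward(lines: Iterable[bytes], deliver: Callable[[bytes], None]) -> dict:
    """Hand valid JSON records to `deliver` and count every outcome."""
    stats = {"delivered": 0, "invalid_json": 0}
    for line in lines:
        try:
            json.loads(line)  # validation only; the raw bytes are forwarded
        except ValueError:
            stats["invalid_json"] += 1  # assumption: count, do not forward
            continue
        deliver(line)
        stats["delivered"] += 1
    return stats


def run(socket_path: str, deliver: Callable[[bytes], None]) -> dict:
    """Connect to a (hypothetical) local socket path and forward records."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        return forward(read_lines(sock), deliver)
```

Keeping the counters next to the forwarding loop means that a record which fails validation still shows up somewhere, instead of disappearing silently.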

High level things to reason about:

  • ATS can currently write a string to a given logger, formatted in any way, including as JSON. Today this work is done by Varnishkafka, which reads from shm and explicitly encodes the collected data into valid JSON (taking care of things like escaping). Ideally, to keep things simple, ATS would write the JSON representation of an HTTP request directly to its logger, and the new tool would only read it and deliver it to Kafka. We need to investigate whether this is possible or whether some corner cases are left out (say, weird escaping).
  • Varnishkafka currently writes metrics to a JSON log file, containing two kinds of data:
    • librdkafka internal metrics (TLS latency, msgs sent, etc.)
    • internal metrics, such as how many times the librdkafka delivery callback has been called because data was not delivered to Kafka

These metrics need to be preserved in the new tool, as they are vital for Analytics. The new implementation should produce the same metrics in a way that lets us distinguish those of varnishkafka from those of the new system. For example, we currently have rdkafka_producer_msg_cnt{cluster="cache_text", source="webrequest"}. We should add a new label that identifies the implementation (e.g. software="varnishkafka").
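
To make the label proposal concrete, here is a small sketch that renders the same sample with an extra `software` label in the Prometheus text exposition format. The helper `render_sample` and the sample value are hypothetical, and the value "atskafka" for the new tool is only assumed from the title of one of the related patches.

```python
def render_sample(name: str, labels: dict, value: int) -> str:
    """Render one sample line in the Prometheus text exposition format."""

    def escape(text: str) -> str:
        # Label values escape backslash, double quote, and newline.
        return (
            text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )

    body = ",".join(f'{key}="{escape(str(val))}"' for key, val in labels.items())
    return f"{name}{{{body}}} {value}"


# Labels shared by both implementations (taken from the example metric).
base = {"cluster": "cache_text", "source": "webrequest"}

old_line = render_sample(
    "rdkafka_producer_msg_cnt", {**base, "software": "varnishkafka"}, 42
)
new_line = render_sample(
    "rdkafka_producer_msg_cnt", {**base, "software": "atskafka"}, 42
)
```

With a label like this, both implementations keep the same metric name, so queries can either aggregate across them or filter on `software` to compare them.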

  • The new tool should support Prometheus natively
  • The new tool should be written in a language that has a solid librdkafka wrapper, or that can use librdkafka directly (if written in C).
  • The roll-out strategy will need to take into account that the new tool must report the same amount of traffic delivered to Kafka as varnishkafka does. It seems obvious, but we'll need to make sure that the new tool does not contain bugs that cause data to be dropped unnoticed (even 1% of traffic dropped silently for a weird reason would be a big problem for us).
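
The parity requirement in the last point could be checked along these lines during roll-out. This is a sketch under assumptions: it presumes that message counts for the same traffic and time window are available from both tools, and the function names and the 0.1% tolerance are hypothetical, not agreed values.

```python
def delivery_gap(reference_count: int, candidate_count: int) -> float:
    """Fraction of reference messages missing from the candidate.

    A negative result means the candidate delivered more than the reference.
    """
    if reference_count <= 0:
        raise ValueError("reference_count must be positive")
    return (reference_count - candidate_count) / reference_count


def within_tolerance(
    reference_count: int, candidate_count: int, tolerance: float = 0.001
) -> bool:
    """True if the two counts differ by at most `tolerance` (a fraction)."""
    return abs(delivery_gap(reference_count, candidate_count)) <= tolerance


# A silent 1% drop, the case called out above, fails the check.
gap = delivery_gap(1_000_000, 990_000)
ok = within_tolerance(1_000_000, 990_000)
```

Taking the absolute value also flags the case where the new tool delivers more than varnishkafka, which could point to duplicated messages.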

Details

Related Gerrit Patches:
[operations/puppet@production] ATS: escape hyphens in X-Analytics-TLS patterns
[operations/puppet@production] ATS: add webrequest logging for atskafka
[operations/puppet@production] ATS: add X-Analytics-TLS

Event Timeline

Restricted Application added a project: Operations. · Nov 11 2019, 3:26 PM
Restricted Application added a subscriber: Aklapper.
ema moved this task from Triage to Caching on the Traffic board. · Nov 12 2019, 2:45 PM
ema triaged this task as Medium priority. · Nov 12 2019, 4:12 PM
ema added a subscriber: ema. · Nov 12 2019, 4:45 PM
ema updated the task description. · Tue, Jan 7, 10:33 AM

Change 562535 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: add webrequest logging for atskafka

https://gerrit.wikimedia.org/r/562535

Change 562811 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: add X-Analytics-TLS

https://gerrit.wikimedia.org/r/562811

Change 562811 merged by Ema:
[operations/puppet@production] ATS: add X-Analytics-TLS

https://gerrit.wikimedia.org/r/562811

Mentioned in SAL (#wikimedia-operations) [2020-01-08T14:23:28Z] <ema> depool cp4028 to test X-Analytics-TLS patch T237993

Mentioned in SAL (#wikimedia-operations) [2020-01-08T14:35:34Z] <ema> repool cp4028 after successful X-Analytics-TLS patch test T237993

Change 562841 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: escape hyphens in X-Analytics-TLS patterns

https://gerrit.wikimedia.org/r/562841

Change 562841 merged by Ema:
[operations/puppet@production] ATS: escape hyphens in X-Analytics-TLS patterns

https://gerrit.wikimedia.org/r/562841