
[SPIKE] Assess impact of Move analytics log from Varnish to HAProxy
Closed, Resolved | Public | 3 Estimated Story Points

Description

How will our pipelines and downstream data products be affected by T351117: Move analytics log from Varnish to HAProxy?
Will anything other than web requests be affected?
What are the risks?
What are the recommended next steps?

Due date: January 31st, 2024

Details

Due Date
Jan 31 2024, 5:00 AM

Event Timeline

VirginiaPoundstone set Due Date to Jan 31 2024, 5:00 AM.

For the record, had a chat about this work with @Milimetric. I'll spend some time tomorrow getting up to speed with the details of T351117.

xcollazo set the point value for this task to 3. (Jan 10 2024, 12:43 PM)

How will our pipelines and downstream data products be affected by T351117: Move analytics log from Varnish to HAProxy?
In theory, T351117 will switch the logging infrastructure in a way that is transparent to downstream consumers. Thus, in theory, all we'd need would be to switch our webrequest pipelines to start consuming from the proposed new table names discussed in T314956: [Event Platform] Declare webrequest as an Event Platform stream.

Will anything other than web requests be affected?
Yes. In T314956, @Ottomata proposed to also convert the webrequests to proper Event Platform streams, with well-defined schemas. This proposal should have little effect on downstream pipelines, though.

What are the risks?
Logs generated by HAProxy may not be equal to those generated by Varnish - Although T351117 aims for backward compatibility, bugs happen, and the new logs may have incorrect syntax and/or subtle differences. To mitigate this, we should properly test and vet the newly created log streams. This has been considered in T351117, and the current idea is to write to both the old and new mechanisms at the same time for a significant period of time (months) so that we can catch these bugs.
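
As a rough illustration of what vetting during the dual-write period could look like, here is a minimal Python sketch that compares one record from the old log against its counterpart from the new one and reports every field that differs. The field names (`uri_host`, `http_status`) and the `MISSING` marker are only placeholders for this example, and how records from the two mechanisms would be paired up is left open. This is not an existing tool.

```python
from typing import Any

# Marker used when a field exists in only one of the two records (illustrative).
MISSING = "<missing>"


def diff_records(old: dict[str, Any], new: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {field: (old_value, new_value)} for every field that differs."""
    diffs = {}
    # Walk the union of field names so that added or dropped fields are reported too.
    for field in sorted(old.keys() | new.keys()):
        old_value = old.get(field, MISSING)
        new_value = new.get(field, MISSING)
        if old_value != new_value:
            diffs[field] = (old_value, new_value)
    return diffs


# Hypothetical pair of records: same request, logged by both mechanisms.
varnish_record = {"uri_host": "en.wikipedia.org", "http_status": "200"}
haproxy_record = {"uri_host": "en.wikipedia.org", "http_status": 200}

print(diff_records(varnish_record, haproxy_record))
```

A check like this would surface subtle differences such as a status code logged as a string by one mechanism and as an integer by the other, which is the kind of bug the dual-write period is meant to catch.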

The newly generated HAProxy 'sequence numbers' may not be semantically equal to those provided by Varnish - Although HAProxy can provide sequence numbers just like Varnish does, @Milimetric points out in T351117#9400450 that their semantics may not be equal. We depend on these numbers to calculate whether we have had data loss events. To mitigate this, we will have to study, compare and contrast HAProxy's sequence number generation with that of Varnish. This study is not blocked by the implementation of T351117; we can do it separately and in parallel. I've started that conversation on T351117#9456139.
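
To make the dependency on sequence number semantics concrete, here is a minimal Python sketch of a sequence-based loss check. It assumes that each host emits sequence numbers that increase by exactly 1 per request and never reset within the window being checked. That assumption is precisely what would need to be confirmed for HAProxy. The function name, the input shape and the `cp1`/`cp2` host names are hypothetical, and this is not our actual pipeline code.

```python
from collections import defaultdict
from typing import Iterable


def sequence_stats(records: Iterable[tuple[str, int]]) -> dict[str, dict[str, int]]:
    """Summarize (hostname, sequence) pairs into per-host loss and duplicate counts."""
    per_host = defaultdict(list)
    for hostname, sequence in records:
        per_host[hostname].append(sequence)

    stats = {}
    for hostname, sequences in per_host.items():
        # If sequences increase by 1 with no resets, the window should contain
        # exactly max - min + 1 distinct values.
        expected = max(sequences) - min(sequences) + 1
        distinct = len(set(sequences))
        stats[hostname] = {
            "expected": expected,
            "actual": len(sequences),
            "missing": expected - distinct,           # gaps suggest lost records
            "duplicates": len(sequences) - distinct,  # repeats suggest double delivery
        }
    return stats


# Hypothetical window: cp1 lost sequence 3 and delivered sequence 4 twice.
window = [("cp1", 1), ("cp1", 2), ("cp1", 4), ("cp1", 4), ("cp2", 10), ("cp2", 11)]
print(sequence_stats(window))
```

If HAProxy's counter were, for example, per process rather than per host, or reset on reload, a check of this shape would report false losses or miss real ones, which is why the semantics need to be compared before we rely on them.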

What are the recommended next steps?

  1. Proactively monitor developments and decisions regarding T351117.
  2. As soon as a first implementation of T351117 is available, we should prioritize testing it.

all we'd need would be to switch our webrequest pipelines to start consuming from the proposed new table names discussed in T314956: [Event Platform] Declare webrequest as an Event Platform stream.

As part of the migration, we could begin writing to the same wmf.webrequest table. We don't have to though. Whatever you folks think is best! In T314956 I'm suggesting to name the new Event Platform stream webrequest.frontend. We could refine this to wmf.webrequest_frontend (or whatever) if you like.