Background
From my notes:
- EventGate failed to parse the X-Experiment-Enrollments header sent by Varnish for approx. 48.71% of the events received
- There were 152 validation errors caused by an otherwise-valid X-Experiment-Enrollments header sent by Varnish that did not include the required experiment or the required group. I will try to determine whether it's one or both of those scenarios
  enrolled                    assigned      n
  "sds2-4-11-synth-aa-test"   "control"     77
  "sds2-4-11-synth-aa-test"   "control-2"   75

  SELECT
    json_extract(raw_event, '$.experiment.enrolled') AS enrolled,
    json_extract(raw_event, '$.experiment.assigned') AS assigned,
    COUNT(*) AS n
  FROM event.eventgate_analytics_external_error_validation
  WHERE year = 2025
    AND (
      (day = 6 AND hour >= 11)
      OR day >= 7
    )
    AND errored_stream_name = 'product_metrics.web_base'
  GROUP BY 1, 2
  ORDER BY n DESC
  ;
Without seeing the value of the X-Experiment-Enrollments header, it's hard to understand precisely what's going on. I propose logging the [raw] value and, if possible, the parsed value without subject IDs as context
AC
- If the header is unparseable, then log an error event to an error stream, e.g. the eventgate-analytics-external.error.validation stream.
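
A minimal sketch of the behaviour described in the AC, written in Python for illustration only; it is not the EventGate implementation. The header's wire format is not shown in this task, so the sketch assumes a hypothetical `experiment=group=subject_id` format with entries separated by `;`. The function names and the `message`, `raw_header` and `parsed_header` fields are likewise assumptions. Only the stream name and `errored_stream_name` come from this task.

```python
# Stream named in the AC above.
ERROR_STREAM = "eventgate-analytics-external.error.validation"


def parse_enrollments(raw):
    """Split the header into well-formed entries and malformed fragments.

    ASSUMPTION: entries look like "experiment=group=subject_id" and are
    separated by ";". The real header format may differ.
    """
    parsed, malformed = [], []
    for part in raw.split(";"):
        fields = part.strip().split("=")
        if len(fields) == 3 and all(fields):
            parsed.append(dict(zip(("experiment", "group", "subject_id"), fields)))
        else:
            malformed.append(part)
    return parsed, malformed


def build_error_event(raw, errored_stream_name):
    """Return an error event if any part of the header is unparseable, else None."""
    parsed, malformed = parse_enrollments(raw)
    if not malformed:
        return None
    return {
        # Hypothetical event shape, for illustration only.
        "meta": {"stream": ERROR_STREAM},
        "errored_stream_name": errored_stream_name,
        "message": f"{len(malformed)} unparseable entries in X-Experiment-Enrollments",
        # The raw value may contain subject IDs (see Notes).
        "raw_header": raw,
        # Entries that did parse, with subject IDs dropped.
        "parsed_header": [
            {"experiment": e["experiment"], "group": e["group"]} for e in parsed
        ],
    }
```

For example, `build_error_event("exp-a=control=abc123;garbage", "product_metrics.web_base")` returns an event whose `parsed_header` holds only the experiment and group of the first entry, while a fully well-formed header returns `None`.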
Notes
- The header may contain subject IDs. Logging subject IDs to the new error stream carries as much risk as logging them to the product_metrics.web_base stream
- If this mechanism is still enabled when we begin to run a large number of experiments, there is a chance that we might log subject IDs for concurrent experiments. Logging these errors to a separate error stream allows us to purge them manually or automatically on a different schedule
- It would be nice if the stream were named something other than eventgate-analytics-external.error.validation, since it will now contain more than strictly ValidationErrors. Doing this will be a bit of work. A TODO comment has been added in the code.