Per the parent task (T147101), figure out what it would take to capture stacktrace profiles from Varnish code (preserving symbols, especially those that come from our own VCL code).
Producing flame graphs from these periodically (e.g. hourly or daily) would help operations and other VCL code contributors understand the performance impact of their changes, identify where performance has regressed, and find where most time is spent and where optimisations would pay off most.
Assuming that capturing such profiles itself impacts performance, we'll need to limit it somehow. Some ideas to consider:
- Limit instrumentation to one (or some other subset) of the Varnish frontends and backends.
- Sample within an instance (if possible), e.g. instrument only one in every N requests, or only for limited periods of time (a run-time toggle that switches it on for a couple of seconds every minute or so).
- Perform instrumentation in a way that doesn't hinder execution: a separate observer thread or process that takes periodic snapshots, passively sampling stack traces only at those moments in time.
The last option is what we do with Xenon/HHVM in production.
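To make the time-window and passive-sampling ideas above more concrete, here is a minimal sketch in Python. It is not tied to any Varnish API: `in_sampling_window` and `fold_stacks` are hypothetical helper names, the 60-second period and 2-second window are only the example figures from the list, and the frame names in the usage example are made up. The output is the semicolon-separated "folded" stack format that flame graph tooling commonly takes as input.

```python
from collections import Counter
from typing import Iterable, List


def in_sampling_window(now: float, period: float = 60.0, active: float = 2.0) -> bool:
    """Return True during the first `active` seconds of every `period` seconds.

    An observer process could check this before taking each snapshot, so
    that sampling only runs for a short slice of every minute.
    """
    return (now % period) < active


def fold_stacks(samples: Iterable[List[str]]) -> List[str]:
    """Collapse sampled stacks into folded lines: 'outer;inner;leaf count'.

    Each sample is a list of frame names, outermost frame first.
    Empty samples (e.g. a failed snapshot) are skipped.
    """
    counts = Counter(";".join(stack) for stack in samples if stack)
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]


if __name__ == "__main__":
    # Made-up frame names, purely for illustration.
    samples = [
        ["worker", "handle_request", "vcl_recv"],
        ["worker", "handle_request", "vcl_recv"],
        ["worker", "handle_request", "vcl_deliver"],
    ]
    for line in fold_stacks(samples):
        print(line)
```

Keeping the aggregation in the observer means the profiled process only pays for the snapshot itself, and the window check bounds how often that happens.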