There are a number of analytics services that run on the cache hosts, e.g. multiple instances of varnishkafka, various varnishprocessor daemons like varnishrls, etc.
Right now, the only runtime service dependencies (as in, configured metadata for systemd) are that they all depend on varnish services. That is, their systemd unit files all contain lines like After=varnish-frontend.service (and sometimes also BindsTo=varnish-frontend.service). This makes sense from a certain perspective: they presumably require the varnish instance they pull logs from to already be running, and they might crash or error out if varnish is stopped before them.
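For illustration, the current pattern looks roughly like the fragment below in an analytics daemon's unit file. Only the two directives come from this task; the surrounding layout is a sketch. The key detail is that systemd applies After= ordering in reverse at shutdown, so this ordering is exactly what causes the logger to be stopped before varnish.

```ini
# Sketch of the current scheme, inside an analytics daemon's unit file
# (e.g. a varnishkafka instance; the exact unit name varies per instance).
[Unit]
# Start only after varnish-frontend has started. At shutdown systemd
# reverses the order, so this unit is stopped *before* varnish-frontend.
After=varnish-frontend.service
# Present on some units: starting this unit pulls in varnish-frontend,
# and this unit is stopped whenever varnish-frontend stops.
BindsTo=varnish-frontend.service
```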
However, I'm not entirely sure that each of them actually requires the varnish service it reads shm logs from to be started first, or that it would error out badly if varnish stopped first. They may well be capable of starting and stopping independently of the related varnish instance. That would be a nice property to have, and if they don't already have it, it may be worth investing in some code updates to get it.
Because systemd has ultimate control over the parallelism and ordering of all start/stop actions at boot and shutdown, there's no guarantee that varnish and the analytics daemons start or stop within a tight window of wallclock time. With the current dependency scheme, for example, varnishkafka might start several minutes after varnish-frontend does, and it might also be stopped several minutes before varnish-frontend is. This leaves racy timing gaps in which legitimate traffic can flow without being logged by these analytics services. In most cases, especially in the past, this has been a trivial time offset on a rare event (a reboot), so it hasn't generally been a critical issue, but things are changing...
These days we're looking at auto-(de|re)pool scripts hooked into the init system, which pool or depool services in confd and may wait (for now, via over-long sleeps) for those pooling changes to take effect before allowing the main varnish (or nginx) service to stop or start. In practice, the net result has been a shutdown sequence like: "stop varnishkafka; depool self from confd; sleep 45 seconds; stop varnish-frontend". This is a legal interpretation of the current dependencies, and it leaves a more significant window of unlogged traffic. A related complication is that we have the same issue with dependencies on confd itself: confd may already be stopped before the depool action runs, in which case the node never actually gets depooled from the services that depend on it during the depool window...
TL;DR: find out whether the various analytics daemons are capable of being asynchronous (in service-dependency terms) from varnish itself. If they are, or once they are, we need to flip the systemd dependencies around to avoid unlogged traffic windows: e.g. varnish should depend on varnishkafka, so that logging is always running whenever varnish is running.
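A minimal sketch of the flipped scheme, assuming the daemons turn out to tolerate starting before (and stopping after) varnish. It is written as a systemd drop-in for varnish-frontend.service. The drop-in path, the logger unit name varnishkafka-webrequest.service, and the confd.service unit name are hypothetical placeholders, and the sketch assumes the depool step runs as part of stopping varnish-frontend.

```ini
# Hypothetical drop-in: /etc/systemd/system/varnish-frontend.service.d/analytics-logging.conf
# Unit names below are placeholders; repeat Wants=/After= for each logger instance.
[Unit]
# Pull the logger in whenever varnish-frontend is started. Wants= (rather than
# Requires=/BindsTo=) means a logger failure does not stop or block varnish.
Wants=varnishkafka-webrequest.service
# Start varnish only after the logger is up. At shutdown systemd reverses this,
# so varnish (and a depool step assumed to be tied to its stop) finishes
# before the logger is stopped.
After=varnishkafka-webrequest.service
# Same idea for the confd issue: keep confd running until varnish has stopped,
# so the depool still takes effect.
After=confd.service
```

For this to work, the existing After=/BindsTo=varnish-frontend.service lines would need to be removed from the logger units. Keeping both directions creates an ordering cycle, which systemd resolves at boot by dropping one of the jobs. A `systemctl daemon-reload` is needed after adding or changing drop-ins.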