In T386114 we fixed lock issues related to generating metrics.
However, after fixing those, two separate issues emerged: an out-of-memory (OOM) issue and silent failures.
The OOM issue has a simple fix, which will be delivered as part of this ticket. However, we still need to find the root cause of the silent failures and fix them.