Page MenuHomePhabricator

High latency on push notification service initialization
Closed, ResolvedPublic

Description

It looks when push notification service gets initialized, grafana shows a high latency. That happens without even having traffic hitting the service.
https://grafana.wikimedia.org/d/NQO_pqvMk/push-notifications?orgId=1&from=1602201152770&to=1602403260466&var-dc=eqiad%20prometheus%2Fk8s&var-service=push-notifications

image.png (606×2 px, 118 KB)

Related Objects

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2020-10-13T10:47:26Z] <jayme> no-change rolling restart of push-notifications in codfw - T265258

A simple restart of the pods in codfw (without any actual change) triggered the same behavior, so I will deploy the update to envoy 1.15.1 (T264157) without this being solved as it is unrelated.

@jijiki Nothing comes in mind for that specific task, given that we still don't use it with production data. We still need to figure out why this is happening on the app level.

My assumption is that the memory leak issue in node-apn (T263058) is triggering high load on CPU when the garbage collector is invoked after the restart. Still investigating though.

@MSantos the memory leak fix is in production for some time now, should we close this one?

This issue hasn't happened recently. Please reopen in case I'm missing something.