Envoy telemetry not available for cirrus-streaming-updater@staging-eqiad
Closed, Resolved · Public

Description

Envoy telemetry is not available for a Flink application running in wikikube@staging.
The service mesh works as expected; the namespace simply does not appear in the dashboard https://grafana-rw.wikimedia.org/d/b1jttnFMz/envoy-telemetry-k8s.
Looking at specific metrics such as envoy_cluster_upstream_rq{kubernetes_namespace="cirrus-streaming-updater"}, we can't find anything related to Envoy for this namespace in eqiad prometheus@k8s-staging.
Similar services using the same flink-app chart (rdf-streaming-updater and mw-page-content-change-enrich) do appear to have their envoy metrics properly propagated to prometheus.

Event Timeline

Restricted Application added a subscriber: Aklapper.

I'm not sure why it works for the other two. Prometheus does have established TCP connections to pods from mw-p-c-c-e, but I can't create new ones because there is no networkpolicy that allows ingress traffic on port 1667. Maybe this is because of a recent networkpolicy change (existing connections are not affected by policy changes).

The flink-app chart should include the mesh.networkpolicy.ingress template in networkpolicy.yaml to allow connections to the mesh.telemetry.port.
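
For illustration, here is a rough sketch of the kind of ingress rule that would let Prometheus reach the telemetry port. It is not the actual output of mesh.networkpolicy.ingress: the policy name and pod label are made up, and only the namespace and port 1667 come from this task.

```yaml
# Hypothetical NetworkPolicy, not copied from the flink-app chart.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: flink-app-example              # assumed name
  namespace: cirrus-streaming-updater
spec:
  podSelector:
    matchLabels:
      app: flink-app-example           # assumed label
  policyTypes:
    - Ingress
  ingress:
    - ports:
        - port: 1667                   # telemetry port mentioned above
          protocol: TCP
```

With no `from` clause, the rule admits traffic on that port from any source; the real template may be more restrictive.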

Change 982434 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/deployment-charts@master] flink-app: include mesh.networkpolicy.ingress

https://gerrit.wikimedia.org/r/982434

@JMeybohm thanks for taking a look! We'll include this template and see whether it solves the issue.

Change 982434 merged by jenkins-bot:

[operations/deployment-charts@master] flink-app: include mesh.networkpolicy.ingress

https://gerrit.wikimedia.org/r/982434

Confirming that Envoy metrics are now properly flowing to Prometheus for the cirrus-streaming-updater namespace.

Gehel claimed this task.