Envoy telemetry is not available for a flink application running in wikikube@staging.
The service mesh itself works as expected, but the namespace does not appear in the dashboard https://grafana-rw.wikimedia.org/d/b1jttnFMz/envoy-telemetry-k8s.
Querying specific metrics such as envoy_cluster_upstream_rq{kubernetes_namespace="cirrus-streaming-updater"} returns nothing related to envoy for this namespace in eqiad prometheus@k8s-staging.
Similar services using the same flink-app chart (rdf-streaming-updater and mw-page-content-change-enrich) do appear to have their envoy metrics properly propagated to prometheus.
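For anyone wanting to repeat the check described above, here is a minimal sketch of testing whether a namespace has any envoy series, using the standard Prometheus HTTP API endpoint `/api/v1/query`. The base URL and the helper names (`build_query_url`, `has_series`) are placeholders invented for illustration, not the real staging endpoint or any existing tooling.

```python
from urllib.parse import urlencode

# Hypothetical base URL (assumption): replace with the Prometheus instance to query.
PROMETHEUS_BASE = "http://prometheus.example.invalid/k8s-staging"


def build_query_url(namespace: str, metric: str = "envoy_cluster_upstream_rq") -> str:
    """Build an instant-query URL counting the series of `metric` in `namespace`."""
    promql = f'count({metric}{{kubernetes_namespace="{namespace}"}})'
    return f"{PROMETHEUS_BASE}/api/v1/query?{urlencode({'query': promql})}"


def has_series(response: dict) -> bool:
    """Return True if a decoded instant-query response contains at least one sample.

    `count()` over an empty selector yields an empty result vector, so an
    empty `result` list means no matching series were scraped.
    """
    if response.get("status") != "success":
        return False
    return bool(response.get("data", {}).get("result"))
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the decoded JSON to `has_series` would be expected to give `False` for cirrus-streaming-updater in the state described here, and `True` for a namespace whose envoy metrics are scraped.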
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
flink-app: include mesh.networkpolicy.ingress | operations/deployment-charts | master | +2 -1
Event Timeline
I'm not sure why it works for the other two. Prometheus does have established TCP connections to pods from mw-p-c-c-e, but I can't create new ones because there is no networkpolicy that allows ingress traffic on port 1667. Maybe this is because of a recent networkpolicy change (existing connections are not affected by policy changes).
The flink-app chart should include the mesh.networkpolicy.ingress template in networkpolicy.yaml to allow connections to the mesh.telemetry.port.
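The "can a new connection be opened" check mentioned above can be sketched roughly as follows. This is an illustrative helper, not the tooling actually used here: the function name is made up, the default port 1667 is taken from this thread, and it assumes the script runs from somewhere subject to the same network policy as Prometheus.

```python
import socket


def can_open_connection(host: str, port: int = 1667, timeout: float = 3.0) -> bool:
    """Try to open a fresh TCP connection; return True on success.

    A missing ingress rule may show up as a timeout or as a refused
    connection depending on how the policy is enforced; both are
    reported as False here.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections, unreachable hosts
        return False
```

Calling `can_open_connection("<pod-ip>")` before and after the chart change would be one way to confirm that the ingress rule is what was missing; already established connections say nothing about the current policy, as noted above.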
Change 982434 had a related patch set uploaded (by DCausse; author: DCausse):
[operations/deployment-charts@master] flink-app: include mesh.networkpolicy.ingress
@JMeybohm thanks for taking a look! We'll include this template to see if it solves the issue.
Change 982434 merged by jenkins-bot:
[operations/deployment-charts@master] flink-app: include mesh.networkpolicy.ingress
Confirming that envoy metrics are now properly flowing to prometheus for the cirrus-streaming-updater namespace.