Search would like to deploy multiple flink-apps per k8s namespace using helm releases.
However, this will complicate dashboarding and monitoring.
Only job-related metrics have the Flink job_name label. Some important JobManager- and TaskManager-level metrics do not.
Example: https://grafana.wikimedia.org/goto/he98JcQVk?orgId=1
In the table of results at the bottom, you can see that neither flink_jobmanager_numRegisteredTaskManagers nor flink_taskmanager_Status_JVM_Memory_Heap_Used has the job_name label.
In the dashboard I've been working on, I've been using kubernetes_namespace to select the job. If we deploy multiple jobs per namespace, we'll need to use something else, and we can't use job_name because some of the metrics we need don't have it.
All metrics will have the helm release label in them. We could use that.
If we do this, we need to adjust the dashboards to use release as the canonical 'job' name. To make that reliable, we should adopt a convention for all flink-app deployments: the Helm release name must match job_name.
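A minimal Python sketch of the release-based approach follows. It assumes every scraped series carries a `release` label with the Helm release name, and that only job-scoped series carry `job_name`. The helper names and the example label sets are hypothetical and illustrative, not taken from the real dashboard or Prometheus data. The sketch builds a PromQL selector on `release` and flags series where `job_name` disagrees with `release`, which would violate the proposed convention.

```python
"""Hypothetical sketch: select Flink series by the Helm `release` label.

Assumes every series has a `release` label and that only job-scoped
series also have `job_name`. All names and values below are illustrative.
"""
from typing import Dict, List

Labels = Dict[str, str]


def release_selector(metric: str, release: str) -> str:
    """Build a PromQL instant-vector selector that filters on `release`."""
    return f'{metric}{{release="{release}"}}'


def convention_violations(series: List[Labels]) -> List[Labels]:
    """Return series whose job_name (when present) differs from release.

    Series without job_name (e.g. JobManager/TaskManager-level metrics)
    cannot be checked against the convention and are skipped.
    """
    return [
        s for s in series
        if "job_name" in s and s["job_name"] != s.get("release")
    ]


# Hypothetical label sets for two releases sharing one namespace.
example_series: List[Labels] = [
    {"__name__": "flink_jobmanager_numRegisteredTaskManagers",
     "kubernetes_namespace": "example-ns", "release": "job-a"},
    {"__name__": "flink_jobmanager_job_uptime",
     "kubernetes_namespace": "example-ns", "release": "job-a",
     "job_name": "job-a"},
    {"__name__": "flink_jobmanager_job_uptime",
     "kubernetes_namespace": "example-ns", "release": "job-b",
     "job_name": "something-else"},
]

if __name__ == "__main__":
    print(release_selector("flink_jobmanager_numRegisteredTaskManagers", "job-a"))
    print(convention_violations(example_series))
```

In a Grafana panel the same selector would take a dashboard variable instead of a literal, e.g. `release_selector(metric, "$release")`. Series without job_name can only be matched through release, which is why the release/job_name convention matters.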
Alternatively, perhaps it is possible to configure all scoped flink metrics to include job_name?
Done is:
- Adjust the Flink dashboard so that it can select on release and job_name, and use them in queries where appropriate.