We'd like to have automated monitoring and alerts configured for Flink-based enrichment jobs.
[] Adapt and augment the existing [[ https://grafana.wikimedia.org/d/gCFgfpG7k/flink-cluster?var-datasource=thanos&var-namespace=rdf-streaming-updater&orgId=1 | Flink Cluster grafana dashboard ]] for more basic Flink app monitoring: a shared Flink app, PyFlink, and enrichment dashboard usable by Search and Event Platform. This should include metrics about latency, lag, throughput, memory usage, etc.
[x] If needed, add missing [[ https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/ | metrics ]] to enrichment Flink apps.
[] Make a new Flink Enrichment App Grafana dashboard that includes metrics about enrichment latency and lag, message throughput, etc.
[] Define some
Another task will cover defining alerts (aliveness, latency, lag, throughput, etc.) for the mediawiki-page-content-change-enrichment job. If we can do this more generically for any enrichment job, we should, but how easy that is remains TBD.
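
As a rough illustration of the latency and lag metrics mentioned above, here is a minimal sketch in plain Python. It is not taken from the enrichment apps: the function names (`parse_dt`, `latency_ms`, `consumer_lag`) are hypothetical, and it assumes event timestamps are ISO-8601 strings in UTC with a trailing `Z`.

```python
from datetime import datetime, timezone


def parse_dt(value: str) -> datetime:
    """Parse an ISO-8601 timestamp; a trailing 'Z' is treated as UTC (assumption)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latency_ms(event_dt: str, processed_at: datetime) -> float:
    """Enrichment latency: time between event creation and the end of processing."""
    return (processed_at - parse_dt(event_dt)).total_seconds() * 1000.0


def consumer_lag(end_offset: int, committed_offset: int) -> int:
    """Lag for one partition: messages produced but not yet consumed."""
    return max(0, end_offset - committed_offset)


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    processed = datetime(2023, 5, 1, 12, 0, 2, tzinfo=timezone.utc)
    print(latency_ms("2023-05-01T12:00:00Z", processed))
    print(consumer_lag(end_offset=1500, committed_offset=1420))
```

In a PyFlink app, values like these could presumably be reported from an operator's `open()` method through `runtime_context.get_metrics_group()` (for example as a gauge or distribution), so that they show up alongside the built-in Flink metrics on the dashboard. Which metric type fits best is left open here.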