Visualise Kubernetes CronJobs: track their execution duration and resource usage (CPU, memory) so that performance issues can be monitored and identified.
why?
- long-running jobs: monitor the runtime and failures of cron jobs
- resource monitoring: check whether cron jobs get the resources they need or are exhausting their limits
- incident response: easily detect whether a job is causing service disruption (e.g. traffic towards mw-parsoid or mw-api-int)
what?
- https://grafana-rw.wikimedia.org/d/f8cb4b3b-8db9-4446-afa7-c293839964b8/cronjobs-mediawiki-mw-cron?orgId=1
- Include panels for mw-{api-int, parsoid} traffic and latency, so that correlations with cron job runs are easy to spot
- Include the dashboard in the docs
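
A rough sketch of the queries such panels could be built on, not a description of what the dashboard currently uses. It assumes the standard kube-state-metrics and cAdvisor metric names are scraped, and that the jobs run in a namespace called `mw-cron` (guessed from the dashboard URL). The helper name `cronjob_panel_queries` is hypothetical.

```python
# Hypothetical helper: builds PromQL expressions for CronJob panels.
# Metric names are the standard kube-state-metrics / cAdvisor ones;
# the default namespace is an assumption taken from the dashboard URL.

def cronjob_panel_queries(namespace: str = "mw-cron") -> dict[str, str]:
    """Return one PromQL expression per panel, keyed by panel name."""
    sel = f'namespace="{namespace}"'
    return {
        # Runtime of finished jobs: completion time minus start time.
        "duration_seconds": (
            f"kube_job_status_completion_time{{{sel}}}"
            f" - kube_job_status_start_time{{{sel}}}"
        ),
        # Jobs that currently report failed pods.
        "failed_jobs": f"kube_job_status_failed{{{sel}}} > 0",
        # CPU cores used per pod, averaged over 5 minutes.
        "cpu_cores": (
            f"sum by (pod) (rate(container_cpu_usage_seconds_total{{{sel}}}[5m]))"
        ),
        # Working-set memory per pod, to compare against limits.
        "memory_bytes": (
            f"sum by (pod) (container_memory_working_set_bytes{{{sel}}})"
        ),
    }


if __name__ == "__main__":
    for panel, expr in cronjob_panel_queries().items():
        print(f"{panel}: {expr}")
```

Keeping the namespace as a parameter would let the same expressions be reused for other CronJob namespaces, if there are any.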