Toolhub exposes a number of custom metrics, produced by https://github.com/korfuri/django-prometheus, at a /metrics endpoint. These metrics should be scraped from each pod in the Kubernetes deployments so that they can be used in Grafana and other reporting.
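For context, wiring django-prometheus into a Django project so that it serves /metrics typically looks like the following sketch (names follow the django-prometheus README; this is illustrative, not Toolhub's actual configuration):

```python
# settings.py -- enable the django-prometheus app and its request-tracking
# middleware (Before must come first, After must come last).
INSTALLED_APPS = [
    # ... existing apps ...
    "django_prometheus",
]

MIDDLEWARE = [
    "django_prometheus.middleware.PrometheusBeforeMiddleware",
    # ... existing middleware ...
    "django_prometheus.middleware.PrometheusAfterMiddleware",
]

# urls.py -- mount the exporter so the metrics appear at /metrics.
from django.urls import include, path

urlpatterns = [
    # ... existing routes ...
    path("", include("django_prometheus.urls")),
]
```

With this in place, any Prometheus server that can reach the pod can scrape the endpoint directly; no extra exporter sidecar is needed.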
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | bd808 | T305899 Improve grafana dashboard for monitoring Toolhub in production
Invalid | | None | T305902 Injest Toolhub custom prometheus metrics
Event Timeline
How can one get started working on this, @bd808? Any pointers? Especially the Kubernetes scraping part.
This is yet another area where Toolhub is an early adopter and there does not yet seem to be strong documentation on how to proceed. I think the answer is going to be something like "make a patch to operations/puppet.git". https://github.com/wikimedia/puppet/blob/production/modules/profile/files/prometheus/rules_k8s.yml might be the right place, but I'm not sure.
I would recommend trying to contact folks like @fgiunchedi from SRE Observability or @akosiaris from serviceops to ask for some advice.
Not sure I see how Toolhub could be an early adopter of metrics scraping. We've been doing this since day 1, so ~2017. That being said, the docs haven't been great, indeed. There was some stuff at https://wikitech.wikimedia.org/wiki/Prometheus/statsd_k8s, but I've gone ahead and created https://wikitech.wikimedia.org/wiki/Kubernetes/Metrics for a better overview, as well as placement in the main portal of our docs.
Toolhub-wise, metrics have already been ingested for a pretty long time now. The helm chart has the prometheus.io/scrape: true annotation, so Prometheus has been scraping the workloads/pods since day 1 of deployment. This is pretty clear in https://w.wiki/53ic, where one can see that we have been scraping since Oct 27th.
I am gonna close this as invalid, but feel free to reopen.
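For reference, the annotation being described lives in the pod template metadata of the chart; a rough sketch of what such annotations look like (the port value is hypothetical and depends on the service):

```yaml
# Pod template metadata in a Helm chart (illustrative sketch, not Toolhub's chart)
metadata:
  annotations:
    prometheus.io/scrape: "true"  # opt this pod in to Prometheus scraping
    prometheus.io/port: "8000"    # hypothetical port; depends on the service
```

Prometheus's Kubernetes service discovery picks up pods carrying these annotations, which is why no per-service puppet change was needed.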
My assumption was that we were an early adopter of running in k8s as anything other than a Node.js service, and as such would need to do something to trigger the integration.
> Toolhub-wise, metrics have already been ingested for a pretty long time now. The helm chart has the prometheus.io/scrape: true annotation, so Prometheus has been scraping the workloads/pods since day 1 of deployment.
I'm pretty sure that annotation is only there because it was in the scaffolding, and magically I had mounted the exporter at the correct /metrics endpoint. Convention over configuration is awesome. Thanks for making this part easy. :)