Toolhub exposes a number of custom metrics, produced by https://github.com/korfuri/django-prometheus, at a /metrics endpoint. These metrics should be scraped from each pod in the Kubernetes deployments so that they can be used in Grafana and other reporting.
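For context, wiring django-prometheus into a Django project so that it serves /metrics typically looks like the following sketch (names follow the django-prometheus README; this is illustrative, not Toolhub's actual configuration):

```python
# settings.py -- enable the django-prometheus app and its request-tracking
# middleware (Before must come first, After must come last).
INSTALLED_APPS = [
    # ... existing apps ...
    "django_prometheus",
]

MIDDLEWARE = [
    "django_prometheus.middleware.PrometheusBeforeMiddleware",
    # ... existing middleware ...
    "django_prometheus.middleware.PrometheusAfterMiddleware",
]

# urls.py -- mount the exporter so the metrics appear at /metrics.
from django.urls import include, path

urlpatterns = [
    # ... existing routes ...
    path("", include("django_prometheus.urls")),
]
```

With this in place, any Prometheus server that can reach the pod can scrape the endpoint directly; no extra exporter sidecar is needed.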
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | bd808 | T305899 Improve grafana dashboard for monitoring Toolhub in production
Invalid | | None | T305902 Injest Toolhub custom prometheus metrics
Event Timeline
How can one get started working on this, @bd808? Any pointers? Especially the Kubernetes scraping part.
This is yet another area where Toolhub is an early adopter and there does not yet seem to be strong documentation on how to proceed. I think the answer is going to be something like "make a patch to operations/puppet.git". https://github.com/wikimedia/puppet/blob/production/modules/profile/files/prometheus/rules_k8s.yml might be the right place, but I'm not sure.
I would recommend trying to contact folks like @fgiunchedi from SRE Observability or @akosiaris from serviceops to ask for some advice.
Not sure I see how Toolhub could be an early adopter of metrics scraping. We've been doing this since day 1, so ~2017. That being said, the docs haven't been great, indeed. There was some stuff at https://wikitech.wikimedia.org/wiki/Prometheus/statsd_k8s, but I've gone ahead and created https://wikitech.wikimedia.org/wiki/Kubernetes/Metrics for a better overview, as well as placement in the main portal of our docs.
Toolhub-wise, metrics have already been ingested for a pretty long time now. The helm chart has the prometheus.io/scrape: true annotation, so Prometheus has been scraping the workloads/pods since day 1 of deployment. This is pretty clear in https://w.wiki/53ic, where one can see that we have been scraping since Oct 27th.
I am gonna close this as invalid, but feel free to reopen.
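For reference, the annotation being described lives in the pod template metadata of the chart; a rough sketch of what such annotations look like (the port value is hypothetical and depends on the service):

```yaml
# Pod template metadata in a Helm chart (illustrative sketch, not Toolhub's chart)
metadata:
  annotations:
    prometheus.io/scrape: "true"  # opt this pod in to Prometheus scraping
    prometheus.io/port: "8000"    # hypothetical port; depends on the service
```

Prometheus's Kubernetes service discovery picks up pods carrying these annotations, which is why no per-service puppet change was needed.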
My assumption was that we were an early adopter of running in k8s as anything other than a Node.js service, and as such would need to do something to trigger the integration.
> Toolhub-wise, metrics have already been ingested for a pretty long time now. The helm chart has the prometheus.io/scrape: true annotation, so Prometheus has been scraping the workloads/pods since day 1 of deployment.
I'm pretty sure that annotation is only there because it was in the scaffolding, and magically I had mounted the exporter at the correct /metrics endpoint. Convention over configuration is awesome. Thanks for making this part easy. :)