Enable self-service Prometheus configuration management for project administrators
Open, Medium, Public

Assigned To

None

Authored By

	taavi
	Jun 15 2021, 1:04 PM

Description

Project administrators should be able to configure Prometheus scrape targets and alert rules for their project without making changes to operations/puppet. In the long term, there are two optimal ways to achieve this that I can see:

Enable management via Hiera/Puppet
- Pro: Nice to deal with in a project that is otherwise managed with Puppet
- Con (?): Difficult to use - is Hiera easy enough for the target audience?
- Con: Difficult to get proper authentication done
- Con: How to deal with services that are not bound to a single VM, such as Kubernetes pods?
Create a web UI/Horizon interface
- Pro: Ease of use
- Con: Harder to get something like "Scrape all Toolforge Redis hosts on port X" automated
- Con: Requires manual clicking for large projects managed with Puppet
- Con: Either have to deal with developer account authentication on the cloud realm or have a prod-cloud connection

Bonus points if the solution can automatically make sure the required security group rules are present.

My short-term plan is to create a tool that you can customize with per-project config files ("Scrape all Toolforge Redis hosts on port X", "Alerting rule Y is there") and that generates the full configuration for Prometheus and Alertmanager. It's rather bare-bones, but it's better than the current static configuration and gives us a good foundation for further development, for example adding a database and an API to modify rules.
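
To make the short-term plan more concrete, here is a minimal sketch of how per-project config could be turned into Prometheus scrape jobs and alert rule groups. The input format, the host names, the port, the alert expression, and the function names are all assumptions made up for illustration; this is not the actual format or code of the tool. Only the output keys (`job_name`, `static_configs`, `targets`, `labels`, `groups`, `rules`, `alert`, `expr`, `for`) follow the standard Prometheus configuration and rule file schema. The output is printed as JSON to stay within the standard library, while a real tool would more likely write YAML files.

```python
import json

# Hypothetical per-project configuration. The structure, host names, and
# port are illustrative assumptions, not the real tool's format.
PROJECT_CONFIGS = {
    "tools": {
        "scrapes": [
            {
                "name": "redis",
                "port": 9121,
                "hosts": ["tools-redis-1", "tools-redis-2"],
            },
        ],
        "alerts": [
            {
                "name": "RedisDown",
                "expr": 'up{job="tools_redis"} == 0',
                "for": "5m",
            },
        ],
    },
}


def build_scrape_configs(project_configs):
    """Turn every per-project scrape entry into one Prometheus scrape job."""
    scrape_configs = []
    for project, config in sorted(project_configs.items()):
        for scrape in config.get("scrapes", []):
            targets = [f"{host}:{scrape['port']}" for host in scrape["hosts"]]
            scrape_configs.append(
                {
                    "job_name": f"{project}_{scrape['name']}",
                    "static_configs": [
                        {"targets": targets, "labels": {"project": project}}
                    ],
                }
            )
    return scrape_configs


def build_rule_groups(project_configs):
    """Collect each project's alerts into one Prometheus rule group."""
    groups = []
    for project, config in sorted(project_configs.items()):
        rules = [
            {
                "alert": alert["name"],
                "expr": alert["expr"],
                "for": alert.get("for", "0m"),
                "labels": {"project": project},
            }
            for alert in config.get("alerts", [])
        ]
        if rules:
            groups.append({"name": f"{project}_alerts", "rules": rules})
    return groups


if __name__ == "__main__":
    full_config = {
        "scrape_configs": build_scrape_configs(PROJECT_CONFIGS),
        "rule_groups": build_rule_groups(PROJECT_CONFIGS),
    }
    print(json.dumps(full_config, indent=2))
```

Attaching a `project` label to every target and rule is one possible way to keep per-project data separable later on, for example when routing alerts to the right project administrators.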

Related Objects

Status	Assigned	Task
Open	None	T284993 Enable self-service Prometheus configuration management for project administrators
Open	None	T286302 Enable external API access to metricsinfra Prometheus configuration
Resolved	taavi	T286300 Add a database-backed backend for prometheus-configurator
Resolved	taavi	T286299 Create initial scaffolding for Prometheus configuration automation
Open	None	T288052 Figure out how to deploy metricsinfra Prometheus configuration tooling
Open	None	T288058 Add support for more scrape targets configuration types
Resolved	taavi	T288059 metricsinfra: Add support for static service discovery
Open	None	T288060 metricsinfra: Add support for Kubernetes service discovery
Open	None	T288061 metricsinfra: Add client TLS support for scrapes
Resolved	taavi	T288067 metricsinfra: Add hosted Prometheus blackbox exporter
Resolved	taavi	T347277 metricsinfra: add support dns blackbox scrapes
Open	None	T288166 metricsinfra: Alert routing configuration
Open	None	T128715 Add all Cloud VPS project administrators to the Prometheus notification group for each project

Event Timeline

taavi created this task. Jun 15 2021, 1:04 PM

Restricted Application added a subscriber: Aklapper. Jun 15 2021, 1:04 PM

taavi claimed this task. Jun 16 2021, 5:22 PM

Restricted Application added a project: User-Majavah. Jun 16 2021, 5:22 PM

taavi moved this task from Unsorted to Working on on the User-Majavah board. Jun 16 2021, 5:22 PM

taavi triaged this task as Medium priority. Jul 7 2021, 5:54 PM

taavi closed subtask T286300: Add a database-backed backend for prometheus-configurator as Resolved. Aug 3 2021, 6:27 PM

taavi added a subtask: T288166: metricsinfra: Alert routing configuration. Aug 4 2021, 7:57 PM

nnikkhoui subscribed. Sep 14 2021, 9:22 PM

dcaro subscribed. Mar 3 2022, 9:00 AM

taavi closed subtask T286299: Create initial scaffolding for Prometheus configuration automation as Resolved. Jun 1 2022, 1:28 PM

Not actively working on this.

taavi removed a parent task: T266050: Build Prometheus service for use by all Cloud VPS projects and their instances. Sep 28 2024, 1:18 PM

fnegri added a project: Epic. Nov 14 2024, 11:36 AM

Restricted Application added a project: cloud-services-team. Nov 14 2024, 11:36 AM
