
Figure out how to deal with security groups when rolling out metricsinfra scraping
Open, Medium, Public

Description

Currently all projects monitored by the Prometheus instance in metricsinfra have manual security group rules to allow scraping. We'll need a way to automate managing those when rolling out to projects not managed by WMCS staff or active trusted volunteers.

I don't see an option in Horizon to permit traffic from a security group in a separate project, but a few alternatives come to mind:

  • The current monitoring host was given a reserved address in T250206#6056467. Expand that to a larger block of reserved addresses, say a /29 or a /28 (so that we can add redundancy and scale beyond one box), and add a rule that permits traffic from that block to all projects and to the defaults for new projects. If the rule is restricted to the node-exporter port, we'd need to tell everyone to manually add rules for scraping non-default targets. This requires a production root to create new Prometheus VMs.
  • Give the configuration tooling in metricsinfra powers to manage security groups in all projects. This is dangerous, but would let us automate most actions.
  • Add the missing feature to Neutron to let us add a security group rule that permits traffic from a security group on a separate project. Add a rule using that to all existing projects and to the defaults of any new projects.
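For a sense of scale on the first option, here is a minimal sketch of how many addresses a /29 or a /28 would give us, and how a source address can be checked against such a block. The block shown is a placeholder from the documentation range (RFC 5737), since the real reserved range is not stated in this task; `block_size` and `is_monitoring_source` are hypothetical helper names.

```python
import ipaddress

# Placeholder block from the RFC 5737 documentation range. The actual
# reserved range for metricsinfra is not given in this task.
RESERVED_BLOCK = ipaddress.ip_network("192.0.2.0/29")


def block_size(prefix_len: int) -> int:
    """Return the raw number of addresses in an IPv4 block of this prefix length."""
    return ipaddress.ip_network(f"0.0.0.0/{prefix_len}").num_addresses


def is_monitoring_source(addr: str) -> bool:
    """Return True if addr falls inside the (placeholder) reserved block."""
    return ipaddress.ip_address(addr) in RESERVED_BLOCK
```

A /29 holds 8 raw addresses and a /28 holds 16, so either leaves room for more than one Prometheus host.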

Event Timeline

Majavah updated the task description.

Neutron sg-to-sg firewalling is very much intended not to cross tenants. Anything in another tenant is meant to represent "internet" and potentially require a router to reach it (if we ever build out Neutron properly, that could even be true).

We currently have a set of hooks that run when a project is created and do things like set up the default security group. We could tweak that to open basic node monitoring to metricsinfra's reserved IP range. Making any deeper functionality opt-in seems totally reasonable for CloudVPS.

The place to add a hook to include that rule would be "keystonehooks" in puppet. Backfilling things can be scripted if need be.
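As a rough illustration of what such a rule could look like, here is a sketch that builds the rule attributes and hands them to an openstacksdk connection. It is not the actual keystonehooks code, whose structure is not shown in this task. Assumptions: the scrape target is node-exporter on its default port 9100, `conn` is an openstacksdk `Connection`, and `node_exporter_rule` and `add_scrape_rule` are hypothetical names.

```python
NODE_EXPORTER_PORT = 9100  # default node-exporter port (assumed scrape target)


def node_exporter_rule(security_group_id: str, source_cidr: str) -> dict:
    """Build Neutron rule attributes allowing TCP ingress to node-exporter."""
    return {
        "security_group_id": security_group_id,
        "direction": "ingress",
        "ethertype": "IPv4",
        "protocol": "tcp",
        "port_range_min": NODE_EXPORTER_PORT,
        "port_range_max": NODE_EXPORTER_PORT,
        "remote_ip_prefix": source_cidr,
    }


def add_scrape_rule(conn, security_group_id: str, source_cidr: str):
    """Create the rule in the given security group.

    conn is assumed to be an openstacksdk Connection; the hook that
    would call this at project creation time is hypothetical.
    """
    rule = node_exporter_rule(security_group_id, source_cidr)
    return conn.network.create_security_group_rule(**rule)
```

Keeping the rule limited to one port matches the "basic node monitoring by default, anything deeper is opt-in" split described above.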

nskaggs triaged this task as High priority. (Aug 10 2021, 4:18 PM)
nskaggs lowered the priority of this task from High to Medium.
nskaggs moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.

The current monitoring host was given a reserved address in T250206#6056467. Expand that to a larger block of reserved addresses, say a /29 or a /28 (so that we can add redundancy and scale beyond one box), and add a rule that permits traffic from that block to all projects and to the defaults for new projects. If the rule is restricted to the node-exporter port, we'd need to tell everyone to manually add rules for scraping non-default targets. This requires a production root to create new Prometheus VMs.

I think this is the realistic path forward. We won't really be able to reserve a block of addresses via the linked method, but we can set up stable IPs for the monitoring hosts, at which point we can inject those IPs into existing and future security groups.
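For the backfill into existing security groups, the main thing is that the script can be re-run safely. A minimal sketch of that check, assuming rules are available as dicts with the usual Neutron fields and that each monitoring host gets a /32 rule; `missing_sources` is a hypothetical helper and the addresses in the usage note are placeholders:

```python
def missing_sources(existing_rules, monitoring_ips, port=9100):
    """Return the /32 prefixes that still need an ingress rule for this port.

    existing_rules: iterable of dicts with Neutron rule fields.
    monitoring_ips: stable IPs of the monitoring hosts (placeholders here).
    Only exact /32 matches count; a wider CIDR that happens to cover a
    host is deliberately not treated as a match in this sketch.
    """
    allowed = set()
    for rule in existing_rules:
        lo, hi = rule.get("port_range_min"), rule.get("port_range_max")
        if (
            rule.get("direction") == "ingress"
            and rule.get("protocol") == "tcp"
            and lo is not None
            and hi is not None
            and lo <= port <= hi
        ):
            allowed.add(rule.get("remote_ip_prefix"))
    return [f"{ip}/32" for ip in monitoring_ips if f"{ip}/32" not in allowed]
```

Running this against a group that already allows `192.0.2.10/32` on port 9100, with monitoring hosts `192.0.2.10` and `192.0.2.11`, would report only `192.0.2.11/32` as missing, so a second run after adding it is a no-op.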

@Majavah I realize that much of the above requires openstack admin privs; let me know when you're ready to move and what you need and I can start creating things.