
openstack: increase virtual network observability
Open, Medium, Public

Description

We would like to increase the observability of the openstack virtual network.

Some ideas:

  • see if openvswitch exports prometheus metrics we can scrape
  • network crosscheck daemon reporting via prometheus
    • running on every cloudvirt, ping other cloudvirts on the virtual network
  • DNS recursive request error monitoring, report via prometheus
    • from a virtual machine on every cloudvirt
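A minimal sketch of what the crosscheck idea above could look like, assuming a Linux `ping` binary (iputils) on each cloudvirt. The function names, the metric name `cloudvirt_crosscheck_peer_up`, and the peer hostnames in the usage note are hypothetical; the output is plain Prometheus text exposition format, built by hand so the sketch has no third-party dependencies.

```python
import subprocess
from typing import Callable, Dict, Iterable


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo request; True if the peer answered.

    Assumes the iputils `ping` flags: -c (count) and -W (reply timeout, seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def crosscheck(
    peers: Iterable[str], probe: Callable[[str], bool] = ping_once
) -> Dict[str, bool]:
    """Probe every peer once and map each peer name to its up/down result."""
    return {peer: probe(peer) for peer in peers}


def render_metrics(
    results: Dict[str, bool], metric: str = "cloudvirt_crosscheck_peer_up"
) -> str:
    """Render the results as a gauge in Prometheus text exposition format."""
    lines = [
        f"# HELP {metric} 1 if the peer answered the probe, 0 otherwise.",
        f"# TYPE {metric} gauge",
    ]
    for peer in sorted(results):
        lines.append(f'{metric}{{peer="{peer}"}} {int(results[peer])}')
    return "\n".join(lines) + "\n"
```

One possible way to use it: call `render_metrics(crosscheck(["cloudvirt-a.example", "cloudvirt-b.example"]))` from a timer and write the result to a file picked up by the node exporter textfile collector, so no extra listening daemon is needed. Whether that fits the existing setup is an open question.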

See also T380892: [infra,k8s,o11y] introduce additional observability for calico and general networking

Event Timeline

aborrero moved this task from Backlog to Blocked/waiting on the User-aborrero board.

running on every toolforge kubernetes worker node, ping other workers on the pod network, and coredns

The failures I see in Tool-gitlab-account-approval and Wikibugs processes are generally in network connections that cross out of the Pod network. I would be interested in seeing checks for connectivity to ldap-ro.eqiad.wikimedia.org, gitlab.wikimedia.org, gerrit.wikimedia.org, phabricator.wikimedia.org, and any randomly chosen wiki. Checking connectivity to frequently used external services such as irc.libera.chat, github.com, packagist.org, pypi.org, and npmjs.com would be a nice-to-have as well.
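A rough sketch of the kind of egress check described above, using only the Python standard library. It opens a TCP connection to each target and separates name resolution failures from connection failures, since the two point at different parts of the network. The hostnames come from the comment; the ports, the `TARGETS` list, and the function names are assumptions made for illustration.

```python
import socket
import time
from typing import Callable, Dict, List, Tuple

# Hostnames are taken from the comment above; every port here is an assumption.
TARGETS: List[Tuple[str, int]] = [
    ("ldap-ro.eqiad.wikimedia.org", 389),
    ("gitlab.wikimedia.org", 443),
    ("gerrit.wikimedia.org", 443),
    ("phabricator.wikimedia.org", 443),
    ("irc.libera.chat", 6697),
    ("github.com", 443),
]


def tcp_check(host: str, port: int, timeout_s: float = 3.0) -> Tuple[str, float]:
    """Try one TCP connection; return (status, elapsed seconds).

    status is "ok", "dns_error" (name did not resolve), or
    "connect_error" (refused, unreachable, or timed out).
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            status = "ok"
    except socket.gaierror:
        # gaierror is a subclass of OSError, so it must be caught first.
        status = "dns_error"
    except OSError:
        status = "connect_error"
    return status, time.monotonic() - start


def run_checks(
    targets: List[Tuple[str, int]] = TARGETS,
    check: Callable[[str, int], Tuple[str, float]] = tcp_check,
) -> Dict[str, str]:
    """Check every target once and map "host:port" to its status string."""
    return {f"{host}:{port}": check(host, port)[0] for host, port in targets}
```

Run from a pod on each worker, the `dns_error` versus `connect_error` split would hint at whether a failure sits with coredns or with the path out of the Pod network. A TCP handshake only shows reachability, not that the service behind it is healthy.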

thanks, I recorded this in T380892: [infra,k8s,o11y] introduce additional observability for calico and general networking