
Collect per-node latency statistics from each node separately
Open, Medium, Public

Description

Currently we ask a single node to query all the other nodes and record their latency. This works fine under normal conditions, but during a network partition the request timed out and we collected no latency numbers for any of the servers. We already run a Prometheus exporter per host; adjust it to query only the local node.

  • Update the extra plugin's REST API to support collecting latency metrics without any requests over the Elasticsearch transport
  • Deploy the updated extra plugin to the cluster
  • Update the Prometheus exporter to collect per-node
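
A rough sketch of what the per-node collection in the exporter could look like once the plugin supports it. The endpoint path, the response shape, and the metric name `elasticsearch_latency_ms` are all assumptions made for illustration; the real plugin API and exporter code may differ.

```python
import json
import urllib.request

# Hypothetical endpoint: the actual path exposed by the extra plugin may differ.
# The point is that the request targets localhost only, so a partition between
# nodes cannot make it hang on remote peers.
LOCAL_LATENCY_URL = "http://localhost:9200/_nodes/_local/latencyStats"


def fetch_local_latency(url=LOCAL_LATENCY_URL, timeout=5.0):
    """Fetch latency stats from the local node with a short timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def to_prometheus_lines(payload):
    """Render the (assumed) payload as Prometheus text-format samples.

    Assumed shape:
    {"nodes": {node_id: {"name": str, "latencies": {bucket: {pct: ms}}}}}
    """
    lines = []
    for node in payload.get("nodes", {}).values():
        name = node.get("name", "unknown")
        for bucket, percentiles in sorted(node.get("latencies", {}).items()):
            for pct, value in sorted(percentiles.items()):
                lines.append(
                    'elasticsearch_latency_ms{node="%s",bucket="%s",percentile="%s"} %s'
                    % (name, bucket, pct, value)
                )
    return lines
```

With each host's exporter reporting only its own node, a partition would cost us the numbers from unreachable hosts rather than from the whole cluster.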

Event Timeline

EBernhardson created this task.

Removing task assignee due to inactivity, as this open task has been assigned for more than two years (see emails sent to assignee on May 26 and Jun 17, and T270544). Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be very welcome!

(See https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator.)

fgiunchedi subscribed.

I'm removing o11y for this old task, though please do reach out and re-add the tag when assistance is needed!