Maniphest T204982

Collect per-node latency statistics from each node separately
Open, Medium, Public

Assigned To

None

Authored By

	EBernhardson
	Sep 20 2018, 5:40 PM

Description

Currently we ask a single node to query all the other nodes and record their latency. This works fine under normal conditions, but during a network partition this request timed out and we didn't collect latency numbers for any of the servers. We already run a Prometheus exporter on each host; adjust it to query only the local node.

Update the extra plugin REST API to support collecting latency metrics without any requests over the Elasticsearch transport
Deploy the updated extra plugin to the cluster
Update the Prometheus exporter to collect per-node
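The core idea is that each host's exporter talks only to the Elasticsearch node running on that same host, so a partition elsewhere cannot cause it to lose that node's numbers. The task does not show the extra plugin's latency endpoint, so the minimal sketch below substitutes Elasticsearch's standard nodes stats API with the `_local` node filter, which restricts the response to the node that receives the request. The base URL `http://localhost:9200` and the helper names are hypothetical, chosen for illustration only.

```python
import json
import urllib.request

# Hypothetical default: the exporter queries the Elasticsearch node on its own host.
LOCAL_NODE_URL = "http://localhost:9200"


def local_stats_url(base_url: str = LOCAL_NODE_URL) -> str:
    # "_local" limits the nodes stats API to the node handling the request,
    # so stats are not gathered from every other node in the cluster.
    return f"{base_url}/_nodes/_local/stats/indices/search"


def mean_query_latency_ms(stats: dict) -> dict:
    """Map node name -> mean query latency (ms) from a _nodes/stats response."""
    result = {}
    for node in stats.get("nodes", {}).values():
        search = node["indices"]["search"]
        total = search["query_total"]
        # Guard against division by zero on a node that has served no queries.
        result[node["name"]] = search["query_time_in_millis"] / total if total else 0.0
    return result


def fetch_local_latency(base_url: str = LOCAL_NODE_URL, timeout: float = 5.0) -> dict:
    # A failure here affects only this host's metrics, not the whole cluster's.
    with urllib.request.urlopen(local_stats_url(base_url), timeout=timeout) as resp:
        return mean_query_latency_ms(json.load(resp))
```

`query_total` and `query_time_in_millis` are cumulative since the node started, so a real exporter would more likely expose them as counters and let Prometheus compute rates; the mean here is only illustrative and does not give the percentiles the original latency collection may have reported.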

Related Objects

		Status	Subtype	Assigned	Task
		Resolved	PRODUCTION ERROR	debt	T204776 Investigate brief CirrusSearch outage (MW exception spike for api.php)
		Open		None	T204982 Collect per-node latency statistics from each node separately

Event Timeline

EBernhardson triaged this task as Medium priority. Sep 20 2018, 5:40 PM

EBernhardson created this task.

Gehel edited projects, added Discovery-Search, SRE; removed Discovery-Search (Current work). Jan 30 2019, 7:53 PM

Gehel moved this task from needs triage to Ops / SRE on the Discovery-Search board.

Gehel assigned this task to EBernhardson. Jan 30 2019, 9:56 PM

Removing task assignee due to inactivity, as this open task has been assigned for more than two years (see emails sent to assignee on May 26 and Jun 17, and T270544). Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be very welcome!

(See https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator.)

Krinkle unsubscribed. Jul 10 2021, 11:05 AM

jbond edited projects, added Observability-Metrics; removed SRE. May 23 2023, 11:22 AM

I'm removing o11y from this old task, though please do reach out and re-add the tag when assistance is needed!
