Install elasticsearch-statsd-plugin to the beta cluster
Closed, Resolved, Public

Description

We need some good ways to track internal elasticsearch metrics (technical things, like refreshing, highlighting, flushing, etc.) over time. The elasticsearch-statsd-plugin seems like a reasonable starting point, although it looks like it might be abandonware.

  • The last commit to the project was in August 2013.
  • Automattic forked the project and updated it to work with elasticsearch 1.0. They proposed merging that back into the main repository but did not really follow up. The last commit to the fork was in August 2014.
  • There is a patch from chad (demon) dated October 2014 with some minor cleanups. It also hasn't been commented on.

Overall, it's not the most promising of software packages, but we can probably make it work.

It seems like we basically need to build the Automattic version and push it into our archiva, then deploy it to the beta cluster and see how things work out. Also check with @Manybubbles to see if there is a particular reason it used to be installed on the beta cluster but is not anymore (broken due to a version upgrade?).

Event Timeline

EBernhardson raised the priority of this task from to Needs Triage.
EBernhardson updated the task description.

Why don't we just fix the ganglia plugin we're already using? Right now Elasticsearch plugins are difficult because any code change requires a rolling restart of the whole elasticsearch cluster. But the ganglia plugin just needs gmond bounced.

All in all, I think a monitoring Elasticsearch plugin is a bad idea for us. We should monitor it from some external application instead. Once we get faster rolling restarts I won't mind, but that is at least several months away.

The benefit of statsd is that, through graphite, we can run more powerful/complicated queries across the data. For example, individual indexes are reported on, and you can generate a dashboard that sums stats across all indexes, or just the ones that match a particular globbing pattern. I have to agree, though, that as long as the 36-hour restart is in place we shouldn't look at this too closely. Adding pausing of indexing as a blocking task.
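As a rough illustration of the kind of cross-index query described above, here is a minimal sketch that builds a Graphite render URL summing one stat over every index matching a glob. `sumSeries` and the `/render` endpoint are standard Graphite features. The host, the metric prefix, the `indices.<name>.<stat>` layout, and the helper name `graphite_sum_url` are all assumptions for illustration, since the plugin's actual metric naming is not shown in this task.

```python
from urllib.parse import urlencode


def graphite_sum_url(base_url, prefix, index_glob, metric, window="-1h"):
    """Build a Graphite render URL that sums `metric` across matching indexes.

    The path layout <prefix>.indices.<index>.<metric> is hypothetical;
    adjust it to whatever the statsd plugin actually emits.
    """
    # sumSeries() adds together every series matched by the glob.
    target = f"sumSeries({prefix}.indices.{index_glob}.{metric})"
    query = urlencode({"target": target, "from": window, "format": "json"})
    return f"{base_url.rstrip('/')}/render?{query}"


# Sum a (hypothetical) query counter across all indexes, then across only
# the indexes whose names start with "enwiki".
all_indexes = graphite_sum_url(
    "http://graphite.example.org", "elasticsearch.mycluster", "*", "search.query_total"
)
enwiki_only = graphite_sum_url(
    "http://graphite.example.org", "elasticsearch.mycluster", "enwiki*", "search.query_total"
)
print(all_indexes)
print(enwiki_only)
```

The same `target` expression can be pasted into a Graphite dashboard directly. Doing the same aggregation through ganglia would need a separately defined aggregate for each index pattern.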

This should probably be pulled from the sprint. It's blocked on search services taking too long to restart, and those tasks need to be resolved first.

Change 223202 had a related patch set uploaded (by EBernhardson):
Add statsd reporting plugin

https://gerrit.wikimedia.org/r/223202

Installed to the elastic cluster and did a rolling restart. Everything looks to be collecting now. Stats are found at http://graphite.wmflabs.org in the elasticsearch.beta-search namespace.

Example graph