
Fix Parsoid heap usage statistics
Closed, Resolved · Public

Description

All Parsoid workers send their heap usage to the same statsd metric once every 5 minutes. Statsd aggregates the metrics it receives once per minute, so each flush effectively sees a sample of roughly 20% of the workers, determined by when each worker was started. The sample changes every minute, which makes the final metric oscillate with a period of 5 minutes.
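The effect can be reproduced with a small simulation. This is only a sketch: the worker start times and heap sizes are made up, and it assumes each statsd flush reports the mean of whatever samples arrived during that minute.

```javascript
// Each worker reports every 5 minutes; statsd flushes every minute.
const REPORT_EVERY_MIN = 5;

// Hypothetical workers: the minute each one started and a constant heap size.
const workers = [
  { startMinute: 0, heapMB: 100 },
  { startMinute: 0, heapMB: 120 },
  { startMinute: 1, heapMB: 300 },
  { startMinute: 2, heapMB: 150 },
  { startMinute: 3, heapMB: 200 },
  { startMinute: 4, heapMB: 250 },
];

// Mean of the samples that land in the flush window for the given minute.
// Only workers whose 5-minute timer fires in that minute contribute.
function flushedMean(minute) {
  const samples = workers
    .filter((w) => minute >= w.startMinute &&
                   (minute - w.startMinute) % REPORT_EVERY_MIN === 0)
    .map((w) => w.heapMB);
  if (samples.length === 0) return null;
  return samples.reduce((a, b) => a + b, 0) / samples.length;
}

// One flushed value per minute.
function series(minutes) {
  return Array.from({ length: minutes }, (_, m) => flushedMean(m));
}

console.log(series(10));
```

With these numbers the flushed values repeat as 110, 300, 150, 200, 250 every 5 minutes, even though no worker's heap changes and the mean over all six workers stays near 187 MB.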

This is presumably not the way it should be done. An alternative would be to have workers send performance metrics as messages to the parent, and have the parent aggregate and send them, with one statsd metric name per server. That way, statsd will have a stable per-server metric which can be aggregated in graphite.

While we're at it, it would be nice to have per-server metrics for connection count (server.getConnections()).
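A minimal sketch of the parent-side aggregation proposed above, covering both heap usage and connection counts. The class name, message shape, and metric suffixes are all hypothetical; `send` stands in for whatever statsd client call the parent would use.

```javascript
// Parent-side aggregation of per-worker stats. All names are illustrative.
class WorkerStatsAggregator {
  constructor(prefix) {
    this.prefix = prefix;    // per-server statsd prefix chosen by the caller
    this.latest = new Map(); // worker id -> most recent report
  }

  // Intended to be called from the parent's message handler, e.g.
  //   worker.on('message', (msg) => agg.record(worker.id, msg));
  // with each worker periodically sending something like
  //   process.send({ type: 'stats',
  //                  heapUsed: process.memoryUsage().heapUsed,
  //                  connections: count /* from server.getConnections() */ });
  record(workerId, msg) {
    if (!msg || msg.type !== 'stats') return; // ignore unrelated messages
    this.latest.set(workerId, {
      heapUsed: msg.heapUsed,
      connections: msg.connections,
    });
  }

  // Drop a worker that has exited so stale values are not reported.
  forget(workerId) {
    this.latest.delete(workerId);
  }

  // Emit one set of per-server values; send(key, value) is supplied by the caller.
  flush(send) {
    if (this.latest.size === 0) return;
    let heapTotal = 0;
    let connections = 0;
    for (const s of this.latest.values()) {
      heapTotal += s.heapUsed;
      connections += s.connections;
    }
    send(`${this.prefix}.heap.mean`, heapTotal / this.latest.size);
    send(`${this.prefix}.heap.total`, heapTotal);
    send(`${this.prefix}.connections`, connections);
  }
}
```

Because the parent always holds the latest value from every worker, each flush covers the whole server regardless of when the individual workers last reported.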

Event Timeline

tstarling created this task. Oct 7 2015, 5:48 AM
tstarling raised the priority of this task to Needs Triage.
tstarling updated the task description.
tstarling added a project: Parsoid.
tstarling added a subscriber: tstarling.
Restricted Application added a subscriber: Aklapper. Oct 7 2015, 5:48 AM
ssastry set Security to None. Oct 7 2015, 5:54 AM
ssastry added a subscriber: Arlolra.

Is it done right in service-runner? Maybe another reason to get T90668 done.

> Is it done right in service-runner? Maybe another reason to get T90668 done.

There's no heap usage collection in service-runner. All service-runner does is construct a StatsD object and pass it through to the worker. The StatsD object has methods for timing and counting, not for this kind of per-server aggregation.

GWicke added a subscriber: GWicke. Edited Oct 8 2015, 2:36 AM

> There's no heap usage collection in service-runner.

There is actually built-in heap reporting and -limiting, but the reporting aggregation is per-cluster by default. Implementing per-server aggregation too would be a matter of additionally reporting the heap usage at a statsd key including the host name.

We did not have a use case for per-host heap metrics so far, but would be happy to include such functionality in service-runner.
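As a rough illustration of reporting at an additional key that includes the host name, the helper below returns both keys for one heap sample. The function and key layout are invented for this sketch and are not service-runner's actual metric names.

```javascript
const os = require('os');

// Hypothetical helper: the statsd keys one heap sample would be reported under.
function heapMetricKeys(service, hostname = os.hostname()) {
  // Graphite treats '.' as a path separator, so flatten the host name.
  const host = hostname.replace(/\./g, '_');
  return [
    `${service}.heap.used`,                 // shared key, aggregated per cluster
    `${service}.heap.by_host.${host}.used`, // additional per-host key
  ];
}

console.log(heapMetricKeys('parsoid'));
```

Reporting the same sample under both keys keeps the existing per-cluster graph intact while making a per-host series available.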

> There's no heap usage collection in service-runner.

> There is actually built-in heap reporting and -limiting, but the reporting aggregation is per-cluster by default. Implementing per-server aggregation too would be a matter of additionally reporting the heap usage at a statsd key including the host name.

Why is it in GitHub? I was looking at a copy in Gerrit which was apparently last edited in February.

> Why is it in GitHub?

In general, the main motivation for using GitHub is the ability to test against different node versions & services like Cassandra with Travis. See also T78410 for some background on Jenkins vs. Travis for RESTBase.

Here's a dashboard for what service-runner's heapwatch.js is reporting.

https://grafana.wikimedia.org/dashboard/db/parsoid-heap-usage

Arlolra closed this task as Resolved. Mar 30 2017, 1:48 PM
Arlolra claimed this task.