(nodetool) cleanup needed on restbase1006
Closed, ResolvedPublic

Description

Bootstrapping a new node into the cluster causes some portion of the dataset to be relocated to the joining node. Once the bootstrap completes, the relocated data remains in place on the source nodes, though unreachable, until it is gradually discarded by routine compaction or explicitly purged by performing a cleanup.

Normally the number of nodes potentially requiring cleanup would be large (possibly all of them), and as a result, the portion of data on each correspondingly small. However, because we have 3 replicas spread across 3 racks, and restbase1006 was recently bootstrapped into a rack with only one machine (restbase1005), all of the data now on restbase1006 came exclusively from restbase1005. You can see this by comparing the load values:

Datacenter: eqiad
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns    Host ID                               Rack
UN  10.64.48.99   547.26 GB  256     ?       325e01e8-debe-45f0-a8c2-93b3baa58968  d
UN  10.64.32.159  267.42 GB  256     ?       88d9ef9f-d81b-466e-babf-6a283b13f648  b
UN  10.64.0.221   286.62 GB  256     ?       fc041cc8-cd28-4030-b29a-05b9a632cafc  a
UN  10.64.48.100  281.9 GB   256     ?       2abf437d-a16d-406b-a6de-8d28b7dda808  d
UN  10.64.0.220   273.84 GB  256     ?       c021a198-b7f1-4dc2-94d7-9cb8b8a8df28  a
UN  10.64.32.160  292.82 GB  256     ?       798ff758-8c91-46e0-b85e-dad356c46f20  b
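The comparison of load values can be made explicit with a small script. This is only a sketch: the function names `parse_loads` and `excess_over_median` are illustrative, the parser assumes every load is reported in GB as in the output above, and "load minus the cluster median" is a rough proxy for data pending cleanup, not something Cassandra reports.

```python
import re
from statistics import median

# Node lines copied from the nodetool status output above.
STATUS = """\
UN  10.64.48.99   547.26 GB  256     ?       325e01e8-debe-45f0-a8c2-93b3baa58968  d
UN  10.64.32.159  267.42 GB  256     ?       88d9ef9f-d81b-466e-babf-6a283b13f648  b
UN  10.64.0.221   286.62 GB  256     ?       fc041cc8-cd28-4030-b29a-05b9a632cafc  a
UN  10.64.48.100  281.9 GB   256     ?       2abf437d-a16d-406b-a6de-8d28b7dda808  d
UN  10.64.0.220   273.84 GB  256     ?       c021a198-b7f1-4dc2-94d7-9cb8b8a8df28  a
UN  10.64.32.160  292.82 GB  256     ?       798ff758-8c91-46e0-b85e-dad356c46f20  b
"""


def parse_loads(text):
    """Return {address: (load_gb, rack)} for every node line in the text."""
    nodes = {}
    for line in text.splitlines():
        fields = line.split()
        # Node lines have 8 whitespace-separated fields and start with a
        # two-letter status/state code such as UN (Up, Normal).
        if len(fields) != 8 or not re.fullmatch(r"[UD][NLJM]", fields[0]):
            continue
        _, address, load, unit, _, _, _, rack = fields
        if unit != "GB":
            # Assumption: this sketch only handles loads reported in GB.
            raise ValueError(f"unexpected load unit: {unit}")
        nodes[address] = (float(load), rack)
    return nodes


def excess_over_median(nodes):
    """Return {address: load minus the median load across all nodes}, in GB."""
    typical = median(load for load, _ in nodes.values())
    return {addr: round(load - typical, 2) for addr, (load, _) in nodes.items()}


nodes = parse_loads(STATUS)
for address, extra in sorted(excess_over_median(nodes).items(), key=lambda kv: -kv[1]):
    load, rack = nodes[address]
    print(f"{address:<14} rack {rack}  {load:>7.2f} GB  {extra:+8.2f} GB vs median")
```

One node in rack d stands out at roughly twice the load of every other node, including the other node in its own rack; the rest are within about 20 GB of the median.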

In addition to obscuring the actual disk usage, this unreachable data also degrades read performance (the page cache won't go as far, for example).

TL;DR

We should run nodetool cleanup on restbase1005. I expect this to take some time and to generate some additional disk I/O. Given the relatively low load on these hosts, I don't believe there is any danger of impacting the services, but out of an abundance of caution, we might do this at an off-peak time.
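For reference, a minimal sketch of scripting the invocation, assuming it runs on the node being cleaned and that `nodetool` is on the PATH. The helper names `cleanup_argv` and `run_cleanup` are illustrative; the optional keyspace argument limits the cleanup to one keyspace, and omitting it cleans all of them.

```python
import subprocess


def cleanup_argv(keyspace=None):
    """Build the argument list for `nodetool cleanup [keyspace]`."""
    argv = ["nodetool", "cleanup"]
    if keyspace:
        # Restrict the cleanup to a single keyspace.
        argv.append(keyspace)
    return argv


def run_cleanup(keyspace=None):
    """Run the cleanup on the local node; raises CalledProcessError on failure."""
    return subprocess.run(cleanup_argv(keyspace), check=True)
```

Calling `run_cleanup()` blocks until the cleanup finishes, so for a long-running cleanup it is probably best started from a detached session.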

Event Timeline

Eevans claimed this task.
Eevans raised the priority of this task from to Medium.
Eevans updated the task description.
Eevans added a project: RESTBase.
Eevans added subscribers: Eevans, fgiunchedi.

LGTM, are there mechanisms to alert us when a manual cleanup is needed? or perhaps how much data is pending cleanup so we can track it?

Not really, no. It is limited to range movement though, so it only becomes necessary after an administrative event like this one (adding a new node).

And cleanup is something of a soft requirement, one that might require a bit of case-by-case judgment. For example, the more nodes that are involved in a range movement, the less data per node is affected, and so the less (relative) impact. You might easily decide that you have the overhead to ride it out. In this case though, the topology conspired to create a scenario where a significant fraction of the dataset moved from a single node, so it seems warranted.
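A back-of-envelope sketch of that reasoning, under idealized assumptions that are not stated in the task: each rack holds exactly one full replica of the data, and token ownership within a rack is perfectly balanced. The function name `fraction_moved_per_source` is illustrative.

```python
from fractions import Fraction


def fraction_moved_per_source(nodes_in_rack_before):
    """Idealized fraction of each existing node's data that moves away
    when one new node joins a rack that had `nodes_in_rack_before` nodes.

    Before the join each node holds 1/n of the rack's replica, afterwards
    1/(n+1), so each source gives up 1/n - 1/(n+1) = 1/(n*(n+1)) of the
    replica, which is 1/(n+1) of what it held.
    """
    n = nodes_in_rack_before
    before = Fraction(1, n)
    after = Fraction(1, n + 1)
    return (before - after) / before


for n in (1, 2, 5, 10):
    print(f"{n:>2} node(s) in rack before join: "
          f"each source sheds {fraction_moved_per_source(n)} of its data")

# Rough sanity check against the two rack d loads in the description:
# the newer node's load should be about half of the load on the node
# that still carries the stale copy.
observed_ratio = 281.9 / 547.26
print(f"observed ratio in rack d: {observed_ratio:.2f}")
```

With a single source node, half of its data becomes stale at once, which is in line with the rack d loads in the description; with ten source nodes, each would shed only about a tenth.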

I've started a nodetool cleanup and will monitor its progress.

A late update: This cleanup was aborted after it failed to make any headway, and T93140 was discovered. We can try again once compaction has caught up.

Aklapper renamed this task from (nodetool )cleanup needed on restbase1006 to (nodetool) cleanup needed on restbase1006. Mar 20 2015, 2:40 PM
GWicke subscribed.

I ran this cleanup successfully over the weekend. Resolving.