
Consider running presto with disaggregated coordinators to facilitate routine maintenance
Open, High, Public

Description

We currently run presto in a mode where the coordinator role is a single point of failure.
Although we run two instances of the presto coordinator process, each is unaware of the other and believes that it alone knows the true state of the presto cluster.

All worker nodes register with a single coordinator (or discovery server), which we set to analytics-presto.eqiad.wmnet.
This is in fact a DNS CNAME that points to either an-coord1003 or an-coord1004.

When we wish to take down the active coordinator for maintenance, we have to change the DNS alias and then issue a full cluster restart to force the workers to re-register with the replacement coordinator. This causes downtime for the cluster.

A more sophisticated configuration is to use disaggregated coordinators, which share a common view of the cluster, so any of them may be used.

However, deploying this configuration requires the use of a new presto component called the resource manager.
We have not yet decided how and where these resource manager instances should run.
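
To make the shape of such a deployment concrete, here is a rough sketch of the per-role settings that would go into each node's etc/config.properties. The property names follow my reading of the upstream Presto deployment documentation for disaggregated coordinators and should be checked against the version we actually run. The resource manager hostname and port are hypothetical placeholders, not a proposal for where it should live.

```python
# Sketch only: renders per-role config.properties fragments for a
# disaggregated-coordinator layout. The URI below is a hypothetical
# placeholder; property names are assumed from the upstream Presto docs.
RESOURCE_MANAGER_URI = "http://presto-rm.example.net:8080"


def config_for(role: str, discovery_uri: str = RESOURCE_MANAGER_URI) -> dict[str, str]:
    """Return the role-specific properties for one presto node."""
    # Every node opts in to the resource manager and points discovery at it.
    common = {
        "resource-manager-enabled": "true",
        "discovery.uri": discovery_uri,
    }
    if role == "resource-manager":
        # The resource manager hosts discovery and holds the shared cluster view.
        return {
            "resource-manager": "true",
            "coordinator": "false",
            "discovery-server.enabled": "true",
            **common,
        }
    if role == "coordinator":
        # Any number of coordinators may run with this same fragment.
        return {
            "coordinator": "true",
            "node-scheduler.include-coordinator": "false",
            **common,
        }
    if role == "worker":
        return {"coordinator": "false", **common}
    raise ValueError(f"unknown role: {role}")


def render(props: dict[str, str]) -> str:
    """Format properties as key=value lines, sorted for stable diffs."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))


if __name__ == "__main__":
    for role in ("resource-manager", "coordinator", "worker"):
        print(f"# {role}\n{render(config_for(role))}\n")
```

The point of the sketch is that workers and coordinators would no longer register with a single coordinator behind the DNS alias. They would all point at the resource manager instead, so a coordinator could be taken down without a full cluster restart, provided the resource manager itself stays available.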

Acceptance criteria

  • Evaluate whether or not the disaggregated coordinator setup is likely to be valuable for us