
Investigate the use of local_quorum for AQS
Closed, Resolved | Public | 5 Estimated Story Points

Description

We set LOCAL_ONE for AQS in https://gerrit.wikimedia.org/r/#/c/267924/ as a mitigation step to reduce latencies on the old cluster (no SSDs at the time). While this was a good idea at the time, it might be better now to restore LOCAL_QUORUM to benefit from the consistency that read repairs provide.

We'll pay some latency of course, but I don't expect it to be that much. Rolling back would be a matter of a puppet deploy if the performance hit turns out to be unacceptable.
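
To make the latency trade-off concrete, here is a minimal sketch of how many local replicas have to answer a read under each consistency level. It relies only on the standard Cassandra quorum formula (floor(RF / 2) + 1). The replication factor of 3 in the usage lines is an assumption for illustration; this task does not state the value used by the AQS keyspaces. The function names are hypothetical as well.

```python
def quorum(replication_factor: int) -> int:
    """Standard Cassandra quorum size: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def replicas_required(consistency: str, local_rf: int) -> int:
    """Replicas in the local datacenter that must respond to a read.

    Accepts both the Cassandra spelling (LOCAL_QUORUM) and the
    camelCase spelling from the puppet change title (localQuorum).
    """
    level = consistency.upper().replace("_", "")
    if level == "LOCALONE":
        return 1
    if level == "LOCALQUORUM":
        return quorum(local_rf)
    raise ValueError(f"unsupported consistency level: {consistency}")


# Assumed replication factor of 3 per datacenter (not stated in this task).
ASSUMED_LOCAL_RF = 3
print(replicas_required("LOCAL_ONE", ASSUMED_LOCAL_RF))
print(replicas_required("localQuorum", ASSUMED_LOCAL_RF))
```

Under that assumption each read waits for two replicas instead of one, so its latency follows the slower of the two responses. That is where the extra cost comes from, and it is also what lets the coordinator notice replicas that disagree.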

Event Timeline

Nuria subscribed.

We need to do a puppet change, check latencies, and then roll back or proceed depending on the results.

Nuria set the point value for this task to 5.

Change 391765 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::aqs: restore hyperswitch's consistency to localQuorum

https://gerrit.wikimedia.org/r/391765

Change 391765 merged by Elukey:
[operations/puppet@production] role::aqs: restore hyperswitch's consistency to localQuorum

https://gerrit.wikimedia.org/r/391765

Mentioned in SAL (#wikimedia-operations) [2017-11-16T13:07:45Z] <elukey> restart aqs on aqs100[5-9] to apply localQuorum (https://gerrit.wikimedia.org/r/391765) - T164348