
Investigate the use of local_quorum for AQS
Closed, Resolved · Public · 5 Estimated Story Points

Description

We set LOCAL_ONE for AQS in https://gerrit.wikimedia.org/r/#/c/267924/ as a mitigation step to reduce latencies on the old cluster (no SSDs at the time). While this was a good idea at the time, it might be better now to restore LOCAL_QUORUM to take advantage of read repair and the consistency it provides.

We'll pay some latency, of course, but I don't expect it to be much. If the performance hit proves unacceptable, rolling back is just a matter of a puppet deploy.
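
For a rough sense of the extra work per read: in Cassandra, LOCAL_QUORUM waits for a majority of the replicas in the local datacenter (floor(RF / 2) + 1), while LOCAL_ONE waits for a single local replica. A minimal sketch of that arithmetic follows. The replication factors shown are illustrative assumptions, not the actual AQS keyspace settings, and the helper names are hypothetical.

```python
def local_quorum(replication_factor: int) -> int:
    """Replicas in the local datacenter that must answer at LOCAL_QUORUM."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    # Majority of the local replicas: floor(RF / 2) + 1.
    return replication_factor // 2 + 1


def replicas_required(consistency: str, replication_factor: int) -> int:
    """Replicas a read waits for under the two levels discussed in this task."""
    if consistency == "LOCAL_ONE":
        return 1
    if consistency == "LOCAL_QUORUM":
        return local_quorum(replication_factor)
    raise ValueError(f"unsupported consistency level: {consistency}")


if __name__ == "__main__":
    # Replication factors below are assumed values for illustration only.
    for rf in (1, 3, 5):
        one = replicas_required("LOCAL_ONE", rf)
        quorum = replicas_required("LOCAL_QUORUM", rf)
        print(f"RF={rf}: LOCAL_ONE waits for {one}, LOCAL_QUORUM waits for {quorum}")
```

Because a LOCAL_QUORUM read involves more than one replica whenever the local replication factor is above 1, mismatches between replicas can be noticed and repaired as part of the read, which is where the added latency comes from.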

Event Timeline

elukey created this task. May 3 2017, 8:56 AM
Restricted Application added a subscriber: Aklapper. May 3 2017, 8:56 AM
elukey updated the task description. May 3 2017, 9:00 AM
Nuria added a subscriber: Nuria.

We need to make a puppet change, check latencies, and then roll back or proceed as appropriate.

Nuria edited projects, added Analytics-Kanban; removed Analytics. May 4 2017, 4:23 PM
Nuria set the point value for this task to 5.
Nuria edited projects, added Analytics; removed Analytics-Kanban. May 25 2017, 4:11 PM
Nuria moved this task from Operational Excellence Future to Dashiki on the Analytics board.
fdans moved this task from Backlog (Later) to Wikistats on the Analytics board. Oct 2 2017, 4:24 PM

Change 391765 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::aqs: restore hyperswitch's consistency to localQuorum

https://gerrit.wikimedia.org/r/391765

Change 391765 merged by Elukey:
[operations/puppet@production] role::aqs: restore hyperswitch's consistency to localQuorum

https://gerrit.wikimedia.org/r/391765

Mentioned in SAL (#wikimedia-operations) [2017-11-16T09:44:42Z] <elukey> restart aqs on aqs1004 to apply localQuorum (https://gerrit.wikimedia.org/r/391765) - T164348

elukey edited projects, added Analytics-Kanban; removed Analytics. Nov 16 2017, 10:01 AM
elukey moved this task from Next Up to Ready to Deploy on the Analytics-Kanban board.

Mentioned in SAL (#wikimedia-operations) [2017-11-16T13:07:45Z] <elukey> restart aqs on aqs100[5-9] to apply localQuorum (https://gerrit.wikimedia.org/r/391765) - T164348

elukey moved this task from Ready to Deploy to Done on the Analytics-Kanban board. Nov 16 2017, 2:05 PM
Nuria closed this task as Resolved. Nov 27 2017, 9:28 PM