etcd cluster has Raft Internal errors sporadically
Open, Medium · Public

Description

We've been getting these errors on and off for a while when executing changes to the pool state of etcd objects via confctl...

ERROR:conftool:Error when trying to set/pooled=no on name=cp1064.eqiad.wmnet,service=varnish-be-rand
ERROR:conftool:Failure writing to the kvstore: Backend error: Raft Internal Error : etcdserver: request timed out, possibly due to previous leader failure

Event Timeline

BBlack created this task. Oct 3 2016, 5:09 PM
Restricted Application added a project: Operations. Oct 3 2016, 5:09 PM
Restricted Application added a subscriber: Aklapper.
elukey added a subscriber: elukey. Oct 3 2016, 5:09 PM
ema moved this task from Triage to etcd on the Traffic board. Oct 4 2016, 11:27 AM

Mentioned in SAL (#wikimedia-operations) [2016-10-05T09:11:46Z] <ema> repooling varnish-be-rand on cp2014 and cp1073 T147209