pybal fails to reconnect cleanly to etcd when etcd is restarted
Open, High, Public


When etcd is restarted, pybal tries to reconnect to it, passing its last waitIndex:

- - [13/Dec/2019:09:52:46 +0000] "GET /v2/keys/conftool/v1/pools/esams/cache_text/varnish-fe/?waitIndex=277743&recursive=true&wait=true HTTP/1.0" 400 163 "-" "PyBal/1.6"

Looking at the response:

{"errorCode":401,"message":"The event in requested index is outdated and cleared","cause":"the requested history has been cleared [278460/277892]","index":279459}

So we should just reset waitIndex when the response is a 400.
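
A minimal sketch of that check, assuming the error body always has the JSON shape shown above. The function name `next_wait_index` and the convention that `None` means "drop waitIndex and re-read the current state" are hypothetical and are not taken from PyBal's code:

```python
import json

# etcd v2 errorCode seen in the response above:
# "The event in requested index is outdated and cleared".
EVENT_INDEX_CLEARED = 401


def next_wait_index(http_status, body, current_wait_index):
    """Return the waitIndex to send with the next request.

    None is a hypothetical convention meaning "send no waitIndex,
    so the current state is fetched again before watching".
    """
    if http_status != 400:
        # Not the failure discussed here; keep the index as it is.
        return current_wait_index
    try:
        error = json.loads(body)
    except ValueError:
        # Body is not a JSON error document; leave the index untouched.
        return current_wait_index
    if isinstance(error, dict) and error.get("errorCode") == EVENT_INDEX_CLEARED:
        # The requested history is gone, so the old index is useless.
        return None
    return current_wait_index


# Usage with the response body quoted in this task.
body = ('{"errorCode":401,'
        '"message":"The event in requested index is outdated and cleared",'
        '"cause":"the requested history has been cleared [278460/277892]",'
        '"index":279459}')
new_index = next_wait_index(400, body, 277743)
```

Checking the errorCode in the body, rather than the HTTP status alone, keeps unrelated 400 responses from silently discarding the index.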

Event Timeline

Should I merge this into T169765 @Joe as per ema comment?

I'd rather do the opposite, given we apparently never released that change.

Sure :-D. Parenting, as both have activity and are technically not duplicates, but they should be solved at the same time.

jcrespo triaged this task as High priority. Dec 13 2019, 2:31 PM
jcrespo moved this task from Backlog to Acknowledged on the SRE board.
BBlack added a subscriber: BBlack.

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all such tickets that haven't been updated in 6 months or more. This does not imply any human judgement about the validity or importance of the task, and is simply the first step in a larger task cleanup effort. Further manual triage and/or requests for updates will happen this month for all such tickets. For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!