pybal fails to reconnect cleanly to etcd when etcd is restarted
Open, High, Public

Description

When etcd is restarted, pybal tries to reconnect to it, passing its last known waitIndex:

10.20.0.17 - - [13/Dec/2019:09:52:46 +0000] "GET /v2/keys/conftool/v1/pools/esams/cache_text/varnish-fe/?waitIndex=277743&recursive=true&wait=true HTTP/1.0" 400 163 "-" "PyBal/1.6"

Looking at the response:

{"errorCode":401,"message":"The event in requested index is outdated and cleared","cause":"the requested history has been cleared [278460/277892]","index":279459}

So we should just reset waitIndex when the response is a 400.
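
A minimal sketch of that reset logic, not pybal's actual code. The function name `next_wait_index` and its arguments are hypothetical. It assumes the `index` field in the error body reflects etcd's current index, so the next watch can start from `index + 1`; if the body cannot be parsed, waitIndex is simply dropped.

```python
import json

# etcd v2 errorCode from the response above:
# "The event in requested index is outdated and cleared"
ETCD_EVENT_INDEX_CLEARED = 401


def next_wait_index(http_status, body, current_wait_index):
    """Return the waitIndex for the next watch, or None to watch without one.

    Hypothetical helper: on any HTTP 400 the stale waitIndex is discarded.
    """
    if http_status != 400:
        # Nothing wrong with the watch, keep the index we already have.
        return current_wait_index
    try:
        error = json.loads(body)
    except ValueError:
        # Unparseable error body: plain reset.
        return None
    if not isinstance(error, dict):
        return None
    index = error.get("index")
    if error.get("errorCode") == ETCD_EVENT_INDEX_CLEARED and isinstance(index, int):
        # Assumption: "index" is etcd's current index, so resume just after it.
        return index + 1
    return None
```

With the response quoted above, `next_wait_index(400, body, 277743)` would yield 279460. Since events between the old and new index may have been missed, the caller would probably also need to re-read the current state of the key before resuming the watch.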

Event Timeline

Should I merge this into T169765 @Joe as per ema comment?

I'd rather do the opposite, given we apparently never released that change.

Sure :-D. Parenting them, since both have activity and are technically not duplicates, but they should be solved at the same time.

jcrespo triaged this task as High priority. Dec 13 2019, 2:31 PM
jcrespo moved this task from Backlog to Acknowledged on the SRE board.
BBlack added a subscriber: BBlack.

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all such tickets that haven't been updated in 6 months or more. This does not imply any human judgement about the validity or importance of the task, and is simply the first step in a larger task cleanup effort. Further manual triage and/or requests for updates will happen this month for all such tickets. For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!