_health says "down" for all entries. It seems like they're being periodically restarted: every few seconds, a couple of entries read "pre-start" and then go down again.
Event Timeline
I get this in the journal:
Feb 22 16:41:32 codesearch6 docker[11732]: /usr/bin/docker: Error response from daemon: driver failed programming external connectivity on endpoint hound-search (138bd3676b62adf0adf815df25eff3439ef453a24adef983b
Feb 22 16:41:32 codesearch6 docker[11732]: (exit status 1)).
Feb 22 16:41:32 codesearch6 docker[11732]: time="2020-02-22T16:41:32Z" level=error msg="error waiting for container: context canceled"
Debugging atm
And this in the Docker daemon journal:
Feb 22 16:52:37 codesearch6 dockerd[624]: time="2020-02-22T16:52:37.347262611Z" level=warning msg="Failed to allocate and map port 6092-6092: (iptables failed: iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 6092 -j DNAT --to-destination 172.17.0.2:6080 ! -i docker0: iptables: No chain/target/match by that name.\n (exit status 1))"
Mentioned in SAL (#wikimedia-cloud) [2020-02-22T16:54:57Z] <Amir1> hard reboot of codesearch6 (T245920)
A hard reboot fixed it. Things are coming back online slowly, but skins is already up. If this is a recurring issue, it will go down again soon; in that case, feel free to re-open this task and we will investigate it in depth.
Thanks @Ladsgroup.
Looking around, it seems like this might just be a general bug in docker that we somehow triggered: https://github.com/moby/moby/issues/16816
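For anyone hitting this again: the dockerd warning quoted earlier fails on `iptables ... -t nat -A DOCKER ...` with "No chain/target/match by that name". One plausible reading (an assumption, not something confirmed in this task) is that the `DOCKER` chain was missing from the `nat` table at that moment. Below is a minimal Python sketch that checks for that condition in text produced by `iptables -t nat -S`. The function name `missing_docker_chain` and the sample rule dumps are hypothetical and only for illustration.

```python
def missing_docker_chain(rules_text: str, chain: str = "DOCKER") -> bool:
    """Return True if `chain` is not declared in `iptables -t nat -S` style output.

    In that output, built-in chains show up as "-P <name> <policy>" and
    user-defined chains (such as DOCKER) as "-N <name>".
    """
    declared = set()
    for line in rules_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in ("-N", "-P"):
            declared.add(parts[1])
    return chain not in declared


# Hypothetical sample dumps, not taken from codesearch6.
HEALTHY = """\
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N DOCKER
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
"""

BROKEN = """\
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
"""

if __name__ == "__main__":
    print("healthy dump missing DOCKER chain:", missing_docker_chain(HEALTHY))
    print("broken dump missing DOCKER chain:", missing_docker_chain(BROKEN))
```

On a live host the input text would come from running `iptables -t nat -S` as root. If the chain turns out to be present, the failure has some other cause and this check does not explain it.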