Codesearch is down
Closed, Resolved · Public

Authored By

	Daimona
	Feb 22 2020, 4:43 PM

Description

_health says "down" for all entries. It seems like they're being periodically restarted: every few seconds, a couple of entries read "pre-start" and then go down again.

Related Objects

Mentioned In: T246017: CodeSearch Docker containers not starting or failing after start because of iptables network overlay issues

Event Timeline

Daimona created this task. · Feb 22 2020, 4:43 PM

Restricted Application added a subscriber: Aklapper. · Feb 22 2020, 4:43 PM

UBN (Unbreak Now!) for visibility

Restricted Application added a subscriber: Liuxinyu970226. · Feb 22 2020, 4:44 PM

I get this in the journal:

Feb 22 16:41:32 codesearch6 docker[11732]: /usr/bin/docker: Error response from daemon: driver failed programming external connectivity on endpoint hound-search (138bd3676b62adf0adf815df25eff3439ef453a24adef983b
Feb 22 16:41:32 codesearch6 docker[11732]:  (exit status 1)).
Feb 22 16:41:32 codesearch6 docker[11732]: time="2020-02-22T16:41:32Z" level=error msg="error waiting for container: context canceled"

Debugging atm

Restricted Application added a project: User-Ladsgroup. · Feb 22 2020, 4:49 PM

And this in the Docker journal:

Feb 22 16:52:37 codesearch6 dockerd[624]: time="2020-02-22T16:52:37.347262611Z" level=warning msg="Failed to allocate and map port 6092-6092:  (iptables failed: iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 6092 -j DNAT --to-destination 172.17.0.2:6080 ! -i docker0: iptables: No chain/target/match by that name.\n (exit status 1))"
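For triage, the warning above can be picked apart mechanically. The following is a minimal sketch, not anything that was run on codesearch6: the function name `parse_port_map_failure` and the regular expression are illustrative assumptions, and the only input it has been written against is the single line quoted above.

```python
import re

# Sample copied from the dockerd journal line above. The raw string keeps
# the literal backslash-n that appears in the journal output.
SAMPLE = r'Feb 22 16:52:37 codesearch6 dockerd[624]: time="2020-02-22T16:52:37.347262611Z" level=warning msg="Failed to allocate and map port 6092-6092:  (iptables failed: iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 6092 -j DNAT --to-destination 172.17.0.2:6080 ! -i docker0: iptables: No chain/target/match by that name.\n (exit status 1))"'

# Hypothetical pattern, derived only from the sample line above.
PORT_MAP_FAILURE = re.compile(
    r"Failed to allocate and map port (?P<first>\d+)-(?P<last>\d+):\s+"
    r"\(iptables failed: (?P<command>.+?): iptables: (?P<reason>.+?)(?:\\n|$)"
)


def parse_port_map_failure(line):
    """Return details of a port-mapping failure, or None if the line does not match."""
    match = PORT_MAP_FAILURE.search(line)
    if match is None:
        return None
    tokens = match.group("command").split()
    # The chain name is the argument that follows "-A" (append rule).
    chain = tokens[tokens.index("-A") + 1] if "-A" in tokens[:-1] else None
    return {
        "first_port": int(match.group("first")),
        "last_port": int(match.group("last")),
        "command": match.group("command"),
        "chain": chain,
        "reason": match.group("reason"),
    }


if __name__ == "__main__":
    print(parse_port_map_failure(SAMPLE))
```

On the sample line this reports host port 6092, the `DOCKER` chain, and the reason string from iptables, which points at a missing chain in the `nat` table rather than at the container itself.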

Mentioned in SAL (#wikimedia-cloud) [2020-02-22T16:54:57Z] <Amir1> hard reboot of codesearch6 (T245920)

A hard reboot fixed it. Things are coming back online slowly, but skins is already up. If this is a recurring issue, it will go down again soon; in that case, feel free to reopen this task and we will investigate in depth.

Maintenance_bot moved this task from Incoming to Done on the User-Ladsgroup board. · Feb 22 2020, 5:15 PM

Thanks @Ladsgroup.

Looking around, it seems like this might just be a general bug in docker that we somehow triggered: https://github.com/moby/moby/issues/16816
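If this recurs, one quick check before rebooting is whether the `DOCKER` chain is still present in the `nat` table, since the iptables error in the journal says the chain could not be found. This is a rough sketch under that assumption: the helper name `docker_nat_chain_exists` is made up for illustration, it must run as root on the affected host, and it only reports the chain's presence without repairing anything.

```python
import subprocess


def docker_nat_chain_exists(run=subprocess.run):
    """Return True if iptables can list the DOCKER chain in the nat table.

    `run` is injectable so the check can be exercised without root;
    by default it calls the real iptables binary.
    """
    result = run(
        ["iptables", "--wait", "-t", "nat", "-S", "DOCKER"],
        capture_output=True,
        text=True,
    )
    # iptables exits non-zero when the named chain does not exist.
    return result.returncode == 0
```

A `False` result would be consistent with the "No chain/target/match by that name" error above; it does not by itself confirm that the linked upstream bug is the cause.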

bd808 mentioned this in T246017: CodeSearch Docker containers not starting or failing after start because of iptables network overlay issues. · Feb 24 2020, 5:41 PM
