Have PyBal monitor Istio-Ingressgateway health
Closed, Declined, Public

Assigned To

None

Authored By

	JMeybohm
	Feb 7 2022, 3:27 PM

Description

Right now we run LVS services for istio-ingressgateway with:

monitors:
  IdleConnection:
    max-delay: 300
    timeout-clean-reconnect: 3
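For readers unfamiliar with the two IdleConnection parameters, a toy model may help. It assumes that `max-delay` caps the exponential backoff between reconnection attempts after a failed connection, and that `timeout-clean-reconnect` is the fixed wait before reconnecting after the backend closed the idle connection cleanly. This is not PyBal's code; `next_reconnect_delay`, `INITIAL_DELAY` and `BACKOFF_FACTOR` are hypothetical names and values used only for illustration.

```python
"""Toy model of the IdleConnection parameters max-delay and
timeout-clean-reconnect.

Assumption: this mirrors the intent of the two settings, not PyBal's
actual implementation. INITIAL_DELAY and BACKOFF_FACTOR are
hypothetical values chosen for illustration.
"""

INITIAL_DELAY = 1.0   # hypothetical delay before the first retry, seconds
BACKOFF_FACTOR = 2.0  # hypothetical growth factor per failed attempt
MAX_EXPONENT = 64     # keeps the power finite for very long outages


def next_reconnect_delay(failed_attempts: int, clean_close: bool,
                         max_delay: float = 300.0,
                         timeout_clean_reconnect: float = 3.0) -> float:
    """Seconds to wait before reconnecting to a backend.

    clean_close=True: the backend closed the idle connection gracefully,
    so reconnect after the short fixed delay (timeout-clean-reconnect).
    Otherwise (refused, reset, timed out) back off exponentially,
    capped at max_delay (max-delay).
    """
    if clean_close:
        return timeout_clean_reconnect
    exponent = min(failed_attempts, MAX_EXPONENT)
    return min(INITIAL_DELAY * BACKOFF_FACTOR ** exponent, max_delay)


if __name__ == "__main__":
    # Values taken from the LVS service configuration above.
    for attempt in range(10):
        print(attempt, next_reconnect_delay(attempt, clean_close=False))
    print("after clean close:", next_reconnect_delay(0, clean_close=True))
```

With these assumed values, a node whose traffic port keeps refusing connections is retried less and less often until the wait settles at 300 seconds.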

The downside is that PyBal shows all nodes of a cluster as down when no ingress route/backend is configured, because the ingressgateway's Envoy does not accept connections on its traffic port (tcp/30443) in that case.

In addition, this might not catch errors reported by the ingressgateway via its internal health check (tcp/30021). It is, however, currently unclear whether there are errors that would make health checks fail while connections are still possible.

The ingressgateway serves health checks only on a port different from the traffic port (tcp/30021). To allow checking those as well, PyBal's ProxyFetch monitor would need to be extended to support checking a different port. A proposed CR exists at https://gerrit.wikimedia.org/r/c/operations/debs/pybal/+/759749
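A ProxyFetch-style check against a separate port boils down to an HTTP GET on the health check port instead of the traffic port. The sketch below assumes that the health endpoint is reachable at `/healthz/ready` on tcp/30021 and that only HTTP 200 counts as healthy; both are assumptions, and `http_health_ok` is a hypothetical helper, not the interface proposed in the CR.

```python
import http.client
import urllib.request


def http_health_ok(host: str, port: int = 30021,
                   path: str = "/healthz/ready",
                   timeout: float = 3.0) -> bool:
    """Return True if GET http://host:port/path answers HTTP 200 in time.

    Assumptions (not from the task): the health endpoint path and the
    "only 200 is healthy" rule.
    """
    url = f"http://{host}:{port}{path}"
    # An empty ProxyHandler disables proxies taken from the environment,
    # so the request goes straight to the node being checked.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # HTTPError (non-2xx), URLError (refused, DNS) and timeouts are all
        # OSError subclasses; HTTPException covers malformed responses.
        return False
```

The health port is a parameter rather than being derived from the traffic port, which is the essence of the proposed extension: the monitored service stays on tcp/30443 while the check targets tcp/30021.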

However, the above would not help in this particular case.
Kubernetes internally health-checks the dedicated health check port (tcp/30021). If that check fails, Kubernetes stops sending traffic to that ingressgateway instance. In our setup (one ingressgateway per node), this means the node drops connections to both the ingressgateway traffic port (tcp/30443) and the health check port (tcp/30021), as they are handled the same way.
Because of that, plain TCP connection monitoring appears to be sufficient (for PyBal as well as for monitoring/probes).
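Plain TCP connection monitoring amounts to checking whether a TCP handshake to the traffic port completes. A minimal sketch, assuming a fixed timeout and treating any connection error (refused, reset, timeout) as down; `tcp_connect_ok` is a hypothetical helper, not PyBal's IdleConnection monitor, which additionally keeps the connection open and reconnects.

```python
import socket


def tcp_connect_ok(host: str, port: int = 30443, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout.

    Any OSError (connection refused, reset, unreachable, timeout) is
    treated as "down". The socket is closed immediately after connecting.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Following the reasoning above, a failed Kubernetes health check makes the node drop tcp/30443 as well, so this check on the traffic port would also catch that case.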

Details

	Subject	Repo	Branch	Lines +/-
	Allow to configure a different port for ProxyFetch monitor	operations/debs/pybal	master	+23 -1


Related Objects

Status	Assigned	Task
Resolved	Joe	T252745 Sandbox/limit child processes within a container runtime
Open	None	T261277 Create a gateway in kubernetes for the execution of our "lambdas"
Resolved	JMeybohm	T290966 Implement POC for istio ingress
Declined	None	T301137 Have PyBal monitor Istio-Ingressgateway health