
Simplify network stack
Closed, Duplicate (Public)

Description

PAWS currently has a network entry that looks like:
paws.wmcloud.org -> 185.15.56.57 -> paws-k8s-haproxy-[12] -> paws-k8s-ingress-[34] OR paws-prometheus-1 -> k8s cluster, typically a worker
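As a rough illustration, the haproxy layer in that path might look something like the following. This is a hypothetical sketch, not the actual PAWS configuration; the ports, ACL, and backend names are assumptions:

```
frontend paws_https
    bind *:443
    # Route prometheus traffic separately, everything else to the ingress nodes
    acl is_prometheus hdr(host) -i prometheus.paws.wmcloud.org
    use_backend prometheus if is_prometheus
    default_backend k8s_ingress

backend k8s_ingress
    balance roundrobin
    server ingress-3 paws-k8s-ingress-3:30443 check
    server ingress-4 paws-k8s-ingress-4:30443 check

backend prometheus
    server prometheus-1 paws-prometheus-1:9090 check
```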

T319366 gives us some reasons to rethink the parts that are not in the k8s cluster themselves. At first glance both the haproxy and ingress nodes are redundant with the cluster itself, since the cluster ultimately handles routing through its internal networking. It would appear that if we were to absorb prometheus into the cluster itself, both the ingress and the haproxy systems could be dropped.
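One way the "absorb into the cluster" idea could look: expose the in-cluster ingress controller directly via a Service, so the standalone haproxy and ingress VMs are no longer in the path. This is a hypothetical sketch; the namespace, labels, and Service type are assumptions and would depend on what Cloud VPS can provide:

```
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  # LoadBalancer assumes the cloud can allocate an external IP;
  # otherwise NodePort plus a floating IP would be the fallback.
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: https
      port: 443
      targetPort: 443
```

Prometheus would similarly move in-cluster (e.g. via the usual Prometheus deployment patterns) so it no longer needs a dedicated VM behind haproxy.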

Event Timeline

rook updated the task description.

Hmm. Octavia (OpenStack's native load balancing system, which we don't currently have installed) uses HAProxy internally, so I'm not sure this is worth spending much time on at the moment. Not counting the recent NFS config issue (which wasn't really HAProxy specific), the HAProxy+Keepalived setup has been a really stable solution for doing load balancing on toolforge and paws.
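For context, the HAProxy+Keepalived pattern mentioned here is two haproxy nodes sharing a floating IP via VRRP, with keepalived handling failover. A minimal sketch of what that looks like on the primary node; the interface name, router ID, and priority are assumptions, and only the floating IP comes from the task description:

```
vrrp_instance haproxy_vip {
    state MASTER            # BACKUP on the second haproxy node
    interface eth0
    virtual_router_id 51
    priority 100            # set lower on the backup node
    virtual_ipaddress {
        185.15.56.57        # the floating IP from the path above
    }
}
```

If the primary node fails, the backup's keepalived notices the missing VRRP advertisements and claims the floating IP, so traffic keeps flowing without DNS changes.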

I suspect I'm missing details on what is happening. Keepalived requires Puppet to function (https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Keepalived)?
I would suggest it feels awkward to require Puppet knowledge and use to be able to operate our equivalent of load balancing.
The other detail that comes to mind is that if we route everything through one node with live failover, we could overload that node in a way that a load-balanced application wouldn't be overloaded. This might be more theoretical, as I don't know that any of the Cloud VPS projects see levels of traffic where this would be a problem.