
Simplify network stack
Closed, Duplicate (Public)

Description

PAWS currently has a network entry that looks like:
paws.wmcloud.org -> 185.15.56.57 -> paws-k8s-haproxy-[12] -> paws-k8s-ingress-[34] OR paws-prometheus-1 -> k8s cluster, typically a worker
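As a rough illustration, the haproxy layer in that path might look something like the following. This is a hypothetical sketch, not the actual PAWS configuration; the ports, ACL, and backend names are assumptions:

```
frontend paws_https
    bind *:443
    # Route prometheus traffic separately, everything else to the ingress nodes
    acl is_prometheus hdr(host) -i prometheus.paws.wmcloud.org
    use_backend prometheus if is_prometheus
    default_backend k8s_ingress

backend k8s_ingress
    balance roundrobin
    server ingress-3 paws-k8s-ingress-3:30443 check
    server ingress-4 paws-k8s-ingress-4:30443 check

backend prometheus
    server prometheus-1 paws-prometheus-1:9090 check
```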

T319366 gives us some reasons to rethink the parts that are not in the k8s cluster themselves. At first glance both the haproxy and ingress nodes are redundant with the cluster itself, since the cluster ultimately handles routing through its internal networking. It would appear that if we were to absorb prometheus into the cluster itself, both the ingress and the haproxy systems could be dropped.
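One way the "absorb into the cluster" idea could look: expose the in-cluster ingress controller directly via a Service, so the standalone haproxy and ingress VMs are no longer in the path. This is a hypothetical sketch; the namespace, labels, and Service type are assumptions and would depend on what Cloud VPS can provide:

```
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  # LoadBalancer assumes the cloud can allocate an external IP;
  # otherwise NodePort plus a floating IP would be the fallback.
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: https
      port: 443
      targetPort: 443
```

Prometheus would similarly move in-cluster (e.g. via the usual Prometheus deployment patterns) so it no longer needs a dedicated VM behind haproxy.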

Event Timeline

rook updated the task description.

Hmm. Octavia (OpenStack's native load balancing system, which we don't currently have installed) uses HAProxy internally, so I'm not sure this is worth spending much time on at the moment. Not counting the recent NFS config issue (which wasn't really HAProxy specific), the HAProxy+Keepalived setup has been a really stable solution for doing load balancing on toolforge and paws.
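For context, the HAProxy+Keepalived pattern mentioned here is two haproxy nodes sharing a floating IP via VRRP, with keepalived handling failover. A minimal sketch of what that looks like on the primary node; the interface name, router ID, and priority are assumptions, and only the floating IP comes from the task description:

```
vrrp_instance haproxy_vip {
    state MASTER            # BACKUP on the second haproxy node
    interface eth0
    virtual_router_id 51
    priority 100            # set lower on the backup node
    virtual_ipaddress {
        185.15.56.57        # the floating IP from the path above
    }
}
```

If the primary node fails, the backup's keepalived notices the missing VRRP advertisements and claims the floating IP, so traffic keeps flowing without DNS changes.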

I suspect I'm missing details on what is happening. Keepalived requires Puppet to function (https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Keepalived)?
I would suggest it feels awkward to require Puppet knowledge and use to be able to operate our equivalent of load balancing.
The other detail that comes to mind is that if we route everything through one node with live failover, we could overload that node in a way that a load-balanced application wouldn't be overloaded. This might be more theoretical, as I don't know that any of the Cloud VPS projects see levels of traffic where this would be a problem.