
Document ideas & investigation results from our spike with "Spark on k8s" [SPIKE - 1.5 Sprints]
Closed, Resolved | Public | 5 Estimated Story Points

Description

Problem statement
Spark on Kubernetes (k8s) does not currently match the features we get from Spark on YARN. Specifically, we might run into issues with resource management, network/storage bottlenecks, and security management.
Primary Task
Document our technical explanations and discussions of the challenges we will run into when changing how we interface with Spark, and clarify the tradeoffs we will have to contend with.

Research Areas:

  • Network & Storage Constraints - Decoupling
  • How Scheduling works
  • Security Model

Event Timeline

Should we start with a link dump of sorts, here? Or are you thinking of a more formal document?

It strikes me that there are at least a couple of strands of work, to begin with, such as:

  1. What do we think Spark on Kubernetes might look like at the WMF?
  2. What potential challenges and/or risks might we face if we implement it as currently envisaged?

Once we identify the potential risk areas (the known unknowns), then perhaps we could think about short and long-term approaches to mitigate them.


I can start by sharing some of my current thinking on point 1 above, if you like? More than happy to discuss any of your suggestions and areas of concern too.

EChetty renamed this task from Document ideas on "Spark on k8s" to Document ideas & investigation results from our spike with "Spark on k8s" [SPIKE - 1 Sprint]. Sep 29 2022, 12:13 PM
EChetty updated the task description.
EChetty set the point value for this task to 5.
EChetty renamed this task from Document ideas & investigation results from our spike with "Spark on k8s" [SPIKE - 1 Sprint] to Document ideas & investigation results from our spike with "Spark on k8s" [SPIKE - 1.5 Sprints]. Oct 12 2022, 12:46 PM