[SPIKE] Assess what is required for the enrichment pipeline to run on k8s
Closed, Resolved · Public · Spike

Description

UPDATE: The research phase is complete and this ticket has been closed. The conversation will continue on the MediaWiki discussion page for this topic.

To bridge the gap between dev and prod environments we would like to run jobs on k8s.

Our use case is described in "Use case: compute needs for streaming pipelines".

The goal of this Spike is to determine whether local or WMF Cloud-based k8s instances are suitable environments for learning, experimentation and development.
We would like to collect enough information to make an informed decision about the following:

  • Do we want to invest resources in developing k8s capabilities for development productivity and testing?
  • Do we want to invest resources in improving our release and deployment cycles targeting YARN?

The two are not mutually exclusive. Discarding this work for now is ok too.

Success criteria

MediaWiki Stream Enrichment can run on k8s (minikube) consuming synthetic data.

Event Timeline

gmodena renamed this task from [SPIKE][NEEDS GROOMING] Flink enrichment pipline should run on k8 to [SPIKE][NEEDS GROOMING] Assess what is required for the enrichment pipline to run on k8. Aug 17 2022, 3:09 PM
gmodena moved this task from Next Up to In Progress on the Event-Platform (Sprint 00) board.
gmodena added a project: Spike.
Restricted Application changed the subtype of this task from "Task" to "Spike". Aug 17 2022, 3:10 PM
gmodena renamed this task from [SPIKE][NEEDS GROOMING] Assess what is required for the enrichment pipline to run on k8 to [SPIKE] Assess what is required for the enrichment pipline to run on k8. Aug 29 2022, 12:06 PM

Spike summary

I explored adapting the k8s workshop to Apache Flink. It boils down to running Flink on minikube, which can be done locally without the need for a Cloud VPS VM.

The following are some considerations to bring to the next grooming session.

I'd say that Cloud VPS would not buy us much, other than _potentially_ granting multi-user access to a self-hosted minikube, or exposing a public-facing service. I don't think we want to go down the path of maintaining either (for dev workflows).

Setting up minikube is a well-documented and straightforward process (at least on macOS/Linux).
For running Flink on k8s, I explored two paths:

  1. Adjusting the Search flink-session-cluster helm charts.
  2. Using the recently released Apache Flink Kubernetes Operator.

While for production use cases we should clearly adopt 1), both approaches offer interesting angles for experimentation and local development.

Path 1) requires a Docker image and decoupling the charts from the specific use case and WMF envs (https://github.com/wikimedia/operations-deployment-charts/blob/master/charts/flink-session-cluster/values.yaml). We should consider contributing a sufficiently generic config and making the setup more self-service for developers who want to run things on minikube.

Path 2) was easier to set up "out of the box". Setting up cluster deployments that can accept job submissions, either interactively or programmatically, is well documented in the examples at https://github.com/apache/flink-kubernetes-operator/tree/main/examples. The tutorial at https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/try-flink-kubernetes-operator/quick-start/ gives the basic building blocks for setting up a Flink cluster ready to accept jobs.
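To make Path 2) a bit more concrete, here is a minimal sketch (not the setup used during the spike) of how a FlinkDeployment resource for the operator could be generated from Python. The field layout follows the operator's upstream basic example; the resource name `enrichment-dev-session`, the `flink:1.15` image, the memory/CPU values and the `flink` service account are assumptions for illustration and would need to match the local minikube setup. Leaving out the `job` section is assumed to give a session cluster that waits for job submissions.

```python
import json


def flink_deployment(name, image="flink:1.15", flink_version="v1_15",
                     job_jar_uri=None, parallelism=2):
    """Build a FlinkDeployment manifest as a plain dict.

    With job_jar_uri=None the manifest has no `job` section (session
    cluster). With a jar URI it describes a cluster running that one job.
    All default values are illustrative assumptions, not WMF settings.
    """
    spec = {
        "image": image,
        "flinkVersion": flink_version,
        "flinkConfiguration": {"taskmanager.numberOfTaskSlots": "2"},
        # Assumed service account name; adjust to the local install.
        "serviceAccount": "flink",
        "jobManager": {"resource": {"memory": "2048m", "cpu": 1}},
        "taskManager": {"resource": {"memory": "2048m", "cpu": 1}},
    }
    if job_jar_uri is not None:
        spec["job"] = {
            "jarURI": job_jar_uri,
            "parallelism": parallelism,
            "upgradeMode": "stateless",
        }
    return {
        "apiVersion": "flink.apache.org/v1beta1",
        "kind": "FlinkDeployment",
        "metadata": {"name": name},
        "spec": spec,
    }


def to_manifest(deployment):
    """Serialize the manifest as JSON, which kubectl accepts like YAML."""
    return json.dumps(deployment, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical name for a local development session cluster.
    print(to_manifest(flink_deployment("enrichment-dev-session")))
```

Assuming the operator is already installed on minikube, the output could be piped to `kubectl apply -f -`. Generating the manifest programmatically is only one option; the YAML files in the operator's examples directory serve the same purpose.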

gmodena renamed this task from [SPIKE] Assess what is required for the enrichment pipline to run on k8 to [SPIKE] Assess what is required for the enrichment pipeline to run on k8. Aug 29 2022, 2:27 PM

@gmodena thanks for exploring these k8s deployment options!
To test H/A capabilities (restarts and recovery) I used https://min.io/ with minikube. I might still have some config examples. I remember it was not trivial to set up, but that was most probably because of my lack of k8s knowledge.
Making the current flink-session-cluster helm chart more generic is definitely something that sounds valuable in the short/mid-term.
For the long term I would like us to explore using the Apache flink-kubernetes-operator in production, in the hope that it could solve some of the pain points we have regarding k8s and job management.

akosiaris added a subscriber: JMeybohm.
akosiaris subscribed.
akosiaris renamed this task from [SPIKE] Assess what is required for the enrichment pipeline to run on k8 to [SPIKE] Assess what is required for the enrichment pipeline to run on k8s. Sep 8 2022, 8:18 AM
JArguello-WMF updated the task description.
JArguello-WMF updated the task description.