UPDATE: Research phase complete. This ticket has been closed. The conversation will continue on MediaWiki, on the discussion page for this topic.
To bridge the gap between dev and prod environments we would like to run jobs on k8s.
Our use case is described in "Use case: compute needs for streaming pipelines".
The goal of this spike is to determine whether local or WMF Cloud-based k8s instances are suitable environments for learning, experimentation, and development.
We would like to collect information to make an informed decision on the following questions:
- Do we want to invest resources in developing k8s capabilities for development productivity and testing?
- Do we want to invest resources in improving our release and deployment cycles targeting YARN?
The two are not mutually exclusive. Discarding this work for now is ok too.
Success criteria:
Mediawiki Stream Enrichment can run on k8s (minikube) consuming synthetic data.
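As a rough illustration of what "synthetic data" could mean for the criterion above, here is a minimal sketch of a generator that emits fake events as newline-delimited JSON. Everything in it is an assumption made for illustration: the field names (`page_id`, `rev_id`, `wiki`, `dt`), the wiki names, and the start timestamp are hypothetical and do not reflect the actual event schema consumed by the enrichment job.

```python
import json
import random
from datetime import datetime, timedelta, timezone


def synthetic_events(count, seed=0):
    """Yield `count` fake events as dicts (all field names are hypothetical)."""
    rng = random.Random(seed)  # fixed seed so repeated runs produce the same data
    start = datetime(2022, 1, 1, tzinfo=timezone.utc)  # arbitrary start time
    for i in range(count):
        yield {
            "page_id": rng.randint(1, 1000),
            "rev_id": i + 1,  # strictly increasing, one per event
            "wiki": rng.choice(["enwiki", "dewiki", "testwiki"]),
            "dt": (start + timedelta(seconds=i)).isoformat(),
        }


def to_json_lines(events):
    """Serialize events as newline-delimited JSON, one event per line."""
    return "\n".join(json.dumps(e, sort_keys=True) for e in events)


if __name__ == "__main__":
    print(to_json_lines(synthetic_events(3)))
```

The output could be written to a file or piped into whatever source the job reads from in minikube. Seeding the generator keeps test runs reproducible.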
References:
- https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Flink_On_Kubernetes
- https://wikitech.wikimedia.org/wiki/Kubernetes/Kubernetes_Workshop
- https://docs.google.com/presentation/d/1dWJVhAuWpNY3jAmiP9tizegI_Fk1hDzyoNutwTsdlrA/edit#slide=id.g10396781996_0_14
- https://flink.apache.org/2021/02/10/native-k8s-with-ha.html
- https://github.com/GoogleCloudPlatform/flink-on-k8s-operator
- https://github.com/spotify/flink-on-k8s-operator