Flink docs on the different Kubernetes deployment modes -> https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/resource-providers/standalone/kubernetes.html#deployment-modes
Currently the plan is to deploy Flink as an Application Cluster (see "Deploy Application Cluster" in the docs above), which runs a single application. This means the Streaming Updater jar is bundled in the Flink image, and stopping the Streaming Updater job stops the entire Flink application. This makes it difficult to stop the job with a savepoint, because the Flink application cannot stay alive without the running Streaming Updater job.
It was suggested to use a Session Cluster instead, which would let us stop the job cleanly without stopping the entire Flink application. Any job could then be stopped with a savepoint and started again from a savepoint, via either the API or the UI. It would also allow multiple jobs to be uploaded to one session cluster, so the WCQS (Commons Query Service) updater could run on the same Flink session cluster. Jars are uploaded to the session cluster, and in Flink HA mode they are saved in Swift. If SRE shuts down the session cluster, any running jobs will still be resumed using the state stored in the HA ConfigMaps.
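As a rough illustration of the stop-and-restart workflow described above, here is a minimal Python sketch against the Flink REST API. It assumes the Flink 1.12 endpoints `POST /jobs/:jobid/stop` (stop with savepoint), `GET /jobs/:jobid/savepoints/:triggerid` (poll the savepoint) and `POST /jars/:jarid/run` (start an uploaded jar, optionally from a savepoint). The helper names, base URL, job id, jar id and savepoint directory are hypothetical placeholders, not part of the plan. Requests are built separately from being sent so the flow can be checked without a live cluster.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional


def build_request(base_url: str, method: str, path: str,
                  body: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request against the Flink REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def stop_with_savepoint_request(base_url: str, job_id: str,
                                target_dir: str) -> urllib.request.Request:
    """POST /jobs/:jobid/stop: take a savepoint into target_dir, then stop the job."""
    return build_request(base_url, "POST", f"/jobs/{job_id}/stop",
                         {"targetDirectory": target_dir, "drain": False})


def run_jar_from_savepoint_request(base_url: str, jar_id: str,
                                   savepoint_path: str) -> urllib.request.Request:
    """POST /jars/:jarid/run: start an already-uploaded jar from a savepoint."""
    return build_request(base_url, "POST", f"/jars/{jar_id}/run",
                         {"savepointPath": savepoint_path,
                          "allowNonRestoredState": False})


def send(req: urllib.request.Request) -> Dict[str, Any]:
    """Send a request to the cluster and decode the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_savepoint(base_url: str, job_id: str, trigger_id: str,
                       fetch: Callable[[urllib.request.Request], Dict[str, Any]] = send,
                       poll_seconds: float = 2.0, max_polls: int = 60) -> str:
    """Poll the savepoint status until it completes; return its location."""
    path = f"/jobs/{job_id}/savepoints/{trigger_id}"
    for _ in range(max_polls):
        result = fetch(build_request(base_url, "GET", path))
        if result["status"]["id"] == "COMPLETED":
            operation = result["operation"]
            # A completed operation carries either a location or a failure cause.
            if "failure-cause" in operation:
                raise RuntimeError(f"savepoint failed: {operation['failure-cause']}")
            return operation["location"]
        time.sleep(poll_seconds)
    raise TimeoutError("savepoint did not complete in time")
```

A typical flow would be: send `stop_with_savepoint_request(...)`, read `request-id` from the response and pass it as `trigger_id` to `wait_for_savepoint`, then send `run_jar_from_savepoint_request(...)` with the returned location. The jar id would come from `GET /jars` after uploading the jar (via `POST /jars/upload` or the UI).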
- documentation on what Session mode is (high-level overview, use cases, links to the official Flink docs)
- common understanding between SRE and Search Platform about using Session mode
- configuration change to enable Session mode