
Deploy Flink (rdf-streaming-updater) to kubernetes (k8s)
Closed, Resolved · Public · 21 Estimated Story Points

Description

As an operator of WDQS / WCQS, I want to deploy Flink in a robust way so that we can deploy the WDQS Streaming Updater on top of it.

The strategy, as partly discussed in T247058, is to use k8s to provide compute resources and Swift as storage. Communication with the teams in charge of k8s and Flink has already started, but our needs still have to be formalized and the concrete implementation strategy defined.
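A minimal sketch of the storage half of that strategy, assuming Swift is reached through its S3-compatible API so that Flink's `s3://` filesystem can use it. The keys (`state.backend`, `state.checkpoints.dir`, `state.savepoints.dir`, `s3.endpoint`, `s3.path.style.access`) are standard Flink options; the bucket name and endpoint are hypothetical placeholders, and credentials (`s3.access-key`, `s3.secret-key`) are left out.

```python
"""Sketch: flink-conf.yaml entries that keep Flink state in Swift.

Assumes Swift exposes an S3-compatible API. BUCKET and SWIFT_S3_ENDPOINT
are hypothetical placeholders, not values taken from this task.
"""

BUCKET = "flink-state"  # hypothetical bucket name
SWIFT_S3_ENDPOINT = "https://swift.example.org"  # hypothetical endpoint

STATE_CONF = {
    # Working state lives in RocksDB on the pod's local disk...
    "state.backend": "rocksdb",
    # ...while checkpoints and savepoints go to object storage, so a
    # rescheduled pod can restore from them.
    "state.checkpoints.dir": f"s3://{BUCKET}/checkpoints",
    "state.savepoints.dir": f"s3://{BUCKET}/savepoints",
    # Point Flink's S3 filesystem at Swift rather than AWS.
    "s3.endpoint": SWIFT_S3_ENDPOINT,
    # Non-AWS endpoints often need path-style bucket addressing.
    "s3.path.style.access": "true",
}


def render_flink_conf(conf: dict) -> str:
    """Render a flat dict as flink-conf.yaml 'key: value' lines."""
    return "\n".join(f"{key}: {value}" for key, value in conf.items())


if __name__ == "__main__":
    print(render_flink_conf(STATE_CONF))
```

Keeping only checkpoints and savepoints in Swift means pods stay disposable: k8s can reschedule them freely, and the job resumes from the last completed checkpoint.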

The Wikidata Query Service streaming updater is currently deployed to YARN and is in beta. The current production plan is to deploy the updater on Kubernetes, to the staging cluster and to the eqiad cluster. There are no plans right now for a multi-cluster deployment.

AC:

  • Flink is deployed on a production k8s cluster
  • a stream of TTL updates is available in Kafka and ready to be consumed


Event Timeline

Restricted Application added a subscriber: Aklapper.
CBogen set the point value for this task to 21. Oct 5 2020, 5:09 PM

We're having an in-person meeting soon, but I wanted to outline, at a high level, some things I think will need to happen. This is based on reading https://wikitech.wikimedia.org/wiki/Deployment_pipeline/Components.

  1. Create Blubberfiles for the job manager
  2. Set up the deployment pipeline, with help from Release Engineering, to accommodate Java
  3. Create Helm charts

Other concerns include logging and monitoring.
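For monitoring, Flink can expose its metrics for Prometheus to scrape through its `PrometheusReporter`. A minimal sketch of the flink-conf.yaml entries, assuming Prometheus scraping is the monitoring route (this task does not say so) and using a hypothetical port; the `flink-metrics-prometheus` reporter jar must be available to Flink.

```python
"""Sketch: flink-conf.yaml entries exposing Flink metrics to Prometheus.

Assumes Prometheus scraping; the port is a hypothetical choice.
"""

PROM_REPORTER_CLASS = "org.apache.flink.metrics.prometheus.PrometheusReporter"


def prometheus_reporter_conf(port: int, name: str = "prom") -> dict:
    """Return the settings for one named Prometheus metrics reporter."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    prefix = f"metrics.reporter.{name}"
    return {
        f"{prefix}.class": PROM_REPORTER_CLASS,
        # Each JobManager/TaskManager process serves its metrics over HTTP here.
        f"{prefix}.port": str(port),
    }


if __name__ == "__main__":
    for key, value in prometheus_reporter_conf(9250).items():
        print(f"{key}: {value}")
```

Since every pod has its own IP on k8s, a single fixed port is enough; a port range is only needed when several Flink processes share one host.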

Gehel triaged this task as High priority. Oct 28 2020, 1:29 PM

After the Helm chart is merged and published (both should happen automatically on a +2; I've already +1ed), the final two items for deployment are:

Change 650309 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[wikidata/query/flink-rdf-streaming-updater@master] add helm test and helm chart

https://gerrit.wikimedia.org/r/650309

Change 654723 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[operations/deployment-charts@master] update flink logging

https://gerrit.wikimedia.org/r/654723

Change 654723 merged by jenkins-bot:
[operations/deployment-charts@master] update flink logging

https://gerrit.wikimedia.org/r/654723

Change 666713 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[wikidata/query/rdf@master] update flink version to 1.12.1

https://gerrit.wikimedia.org/r/666713

Change 666713 merged by jenkins-bot:
[wikidata/query/rdf@master] update flink version to 1.12.1

https://gerrit.wikimedia.org/r/666713

Change 650309 merged by jenkins-bot:
[wikidata/query/flink-rdf-streaming-updater@master] add pipeline test and chart and update to Flink 1.12.1

https://gerrit.wikimedia.org/r/650309

Change 671204 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[operations/deployment-charts@master] create helmfile.d structure

https://gerrit.wikimedia.org/r/671204
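A helmfile.d layout commonly pairs shared defaults with one values file per environment (here staging and eqiad, the two targets named in the description). Helm applies values files in order, deep-merging maps and letting later files win, while lists and scalars are replaced outright. A minimal sketch of that merge rule; the keys and numbers are hypothetical, not the chart's real values, and Helm's `null`-deletes-a-key behavior is not modeled.

```python
"""Sketch: how per-environment values files layer over shared defaults."""
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return base overlaid with override, without mutating either.

    Nested dicts merge key by key; any other value (scalar or list)
    in override replaces the one in base, as Helm does for values files.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical values, for illustration only.
DEFAULTS = {"taskmanager": {"replicas": 1, "memory": "2Gi"}, "tags": ["flink"]}
PER_ENV = {
    "staging": {"taskmanager": {"replicas": 1}},
    "eqiad": {"taskmanager": {"replicas": 4}, "tags": ["flink", "wdqs"]},
}

if __name__ == "__main__":
    for env, overrides in PER_ENV.items():
        print(env, deep_merge(DEFAULTS, overrides))
```

Keeping environment files small (only the keys that differ) makes it easy to see what staging and eqiad actually change.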

Gehel closed subtask Restricted Task as Resolved.
jijiki renamed this task from Deploy Flink to kubernetes (k8s) to Deploy Flink (rdf-streaming-updater) to kubernetes (k8s). Jun 22 2021, 9:30 AM
jijiki added a project: serviceops.
jijiki updated the task description.
jijiki added subscribers: Zbyszko, dcausse.

@Zbyszko, @dcausse, we will try to clear up any leftovers this week so you can do a first deployment to staging. Since we will be switching over datacenters next week, we can't deploy Flink to the production clusters anyway until the switchover is complete and everything is OK.

Change 671204 merged by jenkins-bot:

[operations/deployment-charts@master] rdf-streaming-updater: switch to H/A session-cluster

https://gerrit.wikimedia.org/r/671204
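An H/A session cluster lets a replacement JobManager take over leadership and recover running jobs instead of losing them. The task does not say which HA backend the chart uses; the minimal sketch below assumes Flink's native Kubernetes HA services (available since Flink 1.12), which keep leader information and pointers in ConfigMaps and the actual recovery metadata under `high-availability.storageDir`. The cluster id and storage path are hypothetical.

```python
"""Sketch: flink-conf.yaml entries for Flink's Kubernetes HA services.

Assumes native Kubernetes HA (Flink >= 1.12). Cluster id and storage
path are hypothetical placeholders.
"""

K8S_HA_FACTORY = (
    "org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory"
)
REQUIRED_HA_KEYS = (
    "kubernetes.cluster-id",
    "high-availability",
    "high-availability.storageDir",
)


def kubernetes_ha_conf(cluster_id: str, storage_dir: str) -> dict:
    return {
        # Prefix for the ConfigMaps used for leader election and job pointers.
        "kubernetes.cluster-id": cluster_id,
        "high-availability": K8S_HA_FACTORY,
        # Durable location for job graphs and checkpoint metadata
        # that a new JobManager reads on recovery.
        "high-availability.storageDir": storage_dir,
    }


def missing_ha_keys(conf: dict) -> list:
    """List required HA keys that are absent or empty in conf."""
    return [key for key in REQUIRED_HA_KEYS if not conf.get(key)]


if __name__ == "__main__":
    conf = kubernetes_ha_conf("flink-session-cluster", "s3://flink-state/ha")
    print(missing_ha_keys(conf) or "HA config complete")
```

Because this backend writes ConfigMaps at runtime, the Flink pods need a service account allowed to manage ConfigMaps in their namespace.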

Change 704535 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] rdf-streaming-updater: add namespace for service

https://gerrit.wikimedia.org/r/704535

Change 704326 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] admin_ng: Add a new tiller ClusterRole for flink-session-cluster

https://gerrit.wikimedia.org/r/704326

Change 704537 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/puppet@production] Add rdf-streaming-updater user

https://gerrit.wikimedia.org/r/704537

Change 704538 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[labs/private@master] Add tokens for rdf-streaming-updater service

https://gerrit.wikimedia.org/r/704538

Change 704538 merged by Effie Mouzeli:

[labs/private@master] rdf-streaming-updater: Add tokens for service

https://gerrit.wikimedia.org/r/704538

Change 704537 merged by Effie Mouzeli:

[operations/puppet@production] Add rdf-streaming-updater kubernetes user

https://gerrit.wikimedia.org/r/704537

Change 704326 merged by jenkins-bot:

[operations/deployment-charts@master] admin_ng: Add a new tiller ClusterRole for flink-session-cluster

https://gerrit.wikimedia.org/r/704326

Change 704535 merged by jenkins-bot:

[operations/deployment-charts@master] rdf-streaming-updater: add namespace for service

https://gerrit.wikimedia.org/r/704535

Change 704757 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] admin_ng: Fix name of rbac apiGroup in tiller-flink clusterrole

https://gerrit.wikimedia.org/r/704757

Change 704757 merged by jenkins-bot:

[operations/deployment-charts@master] admin_ng: Fix name of rbac apiGroup in tiller-flink clusterrole

https://gerrit.wikimedia.org/r/704757
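Kubernetes RBAC rules name API groups exactly: Role and RoleBinding live in `rbac.authorization.k8s.io`, while core resources such as ConfigMaps use the empty group `""`. Kubernetes also refuses to let a subject create a Role granting permissions it does not itself hold (unless it has the `escalate` verb), so a Tiller that installs a chart with its own Role typically needs both kinds of rule. The manifest below is a hypothetical sketch, not the merged ClusterRole, with a small check that flags group names outside an assumed allowlist.

```python
"""Sketch: a ClusterRole for a Tiller that deploys a Flink chart.

Hypothetical rules for illustration; not the change merged above.
"""

RBAC_GROUP = "rbac.authorization.k8s.io"
CORE_GROUP = ""  # core API group (ConfigMaps, Pods, ...)
# Assumed allowlist for this example; extend it for other resources.
KNOWN_GROUPS = {CORE_GROUP, "apps", RBAC_GROUP}

VERBS = ["get", "list", "watch", "create", "update", "patch", "delete"]


def tiller_flink_clusterrole(name: str = "tiller-flink") -> dict:
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            # Lets Tiller create the chart's Role and RoleBinding.
            {"apiGroups": [RBAC_GROUP], "resources": ["roles", "rolebindings"], "verbs": VERBS},
            # Tiller must hold whatever it grants, e.g. ConfigMap access for Flink HA.
            {"apiGroups": [CORE_GROUP], "resources": ["configmaps"], "verbs": VERBS},
        ],
    }


def unknown_api_groups(role: dict) -> list:
    """Return API group names in role that are not in KNOWN_GROUPS."""
    return [
        group
        for rule in role.get("rules", [])
        for group in rule.get("apiGroups", [])
        if group not in KNOWN_GROUPS
    ]


if __name__ == "__main__":
    print(unknown_api_groups(tiller_flink_clusterrole()))  # []
```

A misspelled group is not rejected by the API server; the rule simply grants nothing, which is why a lint-style check like this can be useful.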

Change 705633 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/flink-rdf-streaming-updater@master] Add wmf-certificates

https://gerrit.wikimedia.org/r/705633

Change 705671 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/deployment-charts@master] rdf-streaming-updater: Do not use envoy for thanos-swift

https://gerrit.wikimedia.org/r/705671

Change 705633 merged by jenkins-bot:

[wikidata/query/flink-rdf-streaming-updater@master] Switch to buster (java 11) and add wmf-certificates

https://gerrit.wikimedia.org/r/705633

Change 705671 merged by jenkins-bot:

[operations/deployment-charts@master] rdf-streaming-updater: Do not use envoy for thanos-swift

https://gerrit.wikimedia.org/r/705671
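These two changes plausibly go together. In the usual envoy service-proxy setup, the application speaks plain HTTP to a local listener and envoy handles TLS to the upstream; once thanos-swift is called directly, the client has to verify the server certificate itself, which requires the internal CA that `wmf-certificates` provides. For the JVM this means the CA must be in the Java truststore. The minimal Python sketch below shows the same idea with `ssl`; the CA bundle path and any hostname are hypothetical.

```python
"""Sketch: verify an internal TLS endpoint against a custom CA bundle.

The CA path is a hypothetical placeholder.
"""
import socket
import ssl
from typing import Optional

# Hypothetical location of the internal CA bundle inside the image.
INTERNAL_CA_BUNDLE = "/usr/local/share/ca-certificates/internal-ca.crt"


def tls_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Build a verifying context; trust only cafile when it is given."""
    # create_default_context loads the system store when cafile is None
    # and otherwise trusts the given bundle; both require valid certs.
    return ssl.create_default_context(cafile=cafile)


def open_verified(host: str, port: int = 443, cafile: Optional[str] = None) -> ssl.SSLSocket:
    """Connect to host and fail unless its certificate verifies."""
    raw = socket.create_connection((host, port), timeout=5.0)
    try:
        return tls_context(cafile).wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()  # do not leak the TCP socket on handshake failure
        raise


if __name__ == "__main__":
    ctx = tls_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

Usage would look like `open_verified("swift.example.org", cafile=INTERNAL_CA_BUNDLE)`; without the internal CA the handshake fails with a certificate verification error rather than silently trusting the server.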

Change 705719 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/deployment-charts@master] rdf-streaming-updater: use image version 2021-07-20-143040-production

https://gerrit.wikimedia.org/r/705719

Change 705719 merged by jenkins-bot:

[operations/deployment-charts@master] rdf-streaming-updater: use image version 2021-07-20-143040-production

https://gerrit.wikimedia.org/r/705719

Change 708098 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/flink-rdf-streaming-updater@master] properly install wmf-certificates on the prod image

https://gerrit.wikimedia.org/r/708098

Change 708098 merged by jenkins-bot:

[wikidata/query/flink-rdf-streaming-updater@master] properly install wmf-certificates on the prod image

https://gerrit.wikimedia.org/r/708098

Change 708111 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/deployment-charts@master] rdf-streaming-updater: use image version 2021-07-26-125114-production

https://gerrit.wikimedia.org/r/708111

Change 708111 merged by jenkins-bot:

[operations/deployment-charts@master] rdf-streaming-updater: use image version 2021-07-26-125114-production

https://gerrit.wikimedia.org/r/708111
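The image tags bumped above follow a `YYYY-MM-DD-HHMMSS-<variant>` pattern. A minimal sketch that parses such tags and picks the newest one for a variant, assuming every tag follows that pattern; the timestamp is treated as naive because its timezone is not stated.

```python
"""Sketch: parse 'YYYY-MM-DD-HHMMSS-<variant>' image tags and pick the newest."""
import re
from datetime import datetime
from typing import Iterable, Tuple

TAG_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}-\d{6})-([a-z0-9-]+)$")


def parse_tag(tag: str) -> Tuple[datetime, str]:
    """Split a tag into its (naive) timestamp and variant name."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unexpected tag format: {tag!r}")
    stamp, variant = match.groups()
    return datetime.strptime(stamp, "%Y-%m-%d-%H%M%S"), variant


def newest_tag(tags: Iterable[str], variant: str = "production") -> str:
    """Return the most recent tag for variant; raise if there is none."""
    dated = []
    for tag in tags:
        stamp, tag_variant = parse_tag(tag)
        if tag_variant == variant:
            dated.append((stamp, tag))
    if not dated:
        raise ValueError(f"no tags for variant {variant!r}")
    return max(dated)[1]


if __name__ == "__main__":
    tags = ["2021-07-20-143040-production", "2021-07-26-125114-production"]
    print(newest_tag(tags))  # 2021-07-26-125114-production
```

Because every field is zero-padded, plain string sorting gives the same order; parsing mainly serves to reject malformed tags.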

Change 708528 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/deployment-charts@master] flink-session-cluster: fix main_app app label...

https://gerrit.wikimedia.org/r/708528

Change 708528 merged by jenkins-bot:

[operations/deployment-charts@master] flink-session-cluster: fix main_app app label...

https://gerrit.wikimedia.org/r/708528
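Label mismatches like the one fixed above fail quietly: a Service or Deployment selector that names the wrong `app` value simply matches no pods. A minimal sketch of Kubernetes' equality-based selector rule (every selector key must be present on the pod with the same value, extra pod labels are ignored); the label values are hypothetical.

```python
"""Sketch: equality-based Kubernetes label selector matching."""
from typing import Dict, List

Labels = Dict[str, str]


def selector_matches(selector: Labels, labels: Labels) -> bool:
    """True when every key/value in selector appears in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def matching_pods(selector: Labels, pods: Dict[str, Labels]) -> List[str]:
    """Names of pods whose labels satisfy selector, in input order."""
    return [name for name, labels in pods.items() if selector_matches(selector, labels)]


# Hypothetical labels, for illustration only.
PODS = {
    "jobmanager-0": {"app": "flink-session-cluster", "release": "main"},
    "taskmanager-0": {"app": "flink-session-cluster", "release": "main"},
}

if __name__ == "__main__":
    print(matching_pods({"app": "flink-session-cluster", "release": "main"}, PODS))
    print(matching_pods({"app": "main", "release": "main"}, PODS))  # []
```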

Gehel claimed this task.
Gehel closed subtask T273098: High Availability Flink as Resolved.