
Create a helm chart for the spark-history service
Closed, Resolved, Public

Description

We need a new chart in operations/deployment-charts for deploying the Spark History Service to Kubernetes.

There is some background information on our use of helm charts here: https://wikitech.wikimedia.org/wiki/Kubernetes/Deployment_Charts

I believe that the README file here is up-to-date in terms of adding a new service:
https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/%2B/refs/heads/master/README.md

We will need to supply a few configuration values and, crucially, a kerberos keytab, so that this service can read from and write to HDFS.

We will be using the spark images from https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/896363 and running the service as a daemon by passing the history option to the entrypoint.sh script.

Event Timeline

BTullis triaged this task as High priority.

Change 977994 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/docker-images/production-images@master] Mention the fact that 'history' is a valid argument in the error message

https://gerrit.wikimedia.org/r/977994

Change 978629 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] Define the spark-history chart

https://gerrit.wikimedia.org/r/978629

Change 977994 merged by Brouberol:

[operations/docker-images/production-images@master] Mention the fact that 'history' is a valid argument in the error message

https://gerrit.wikimedia.org/r/977994

The helm chart is _mostly_ done. @BTullis @Antoine_Quhen whenever possible and convenient, could I get your opinion on the general design and the default values being used? Much of the hadoop/hdfs config (ultimately rendered into core-site.xml and hdfs-site.xml) was cargo-culted, and many of the values might simply be irrelevant. Thanks!
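
For reviewers unfamiliar with the format: Hadoop site files such as core-site.xml and hdfs-site.xml are flat lists of name/value properties inside a `<configuration>` element. The following is a minimal Python sketch of that rendering step, not the chart's actual template. The function name `render_hadoop_site` and the example values are hypothetical and only illustrate how a values mapping might be turned into such a file.

```python
import xml.etree.ElementTree as ET


def render_hadoop_site(properties: dict[str, str]) -> str:
    """Render a mapping of Hadoop properties as a *-site.xml document."""
    root = ET.Element("configuration")
    # Sort by name so the output is stable and easy to diff in review.
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root, space="  ")  # requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, for illustration only; they are not the chart defaults.
core_site = {
    "fs.defaultFS": "hdfs://example-cluster",
    "hadoop.security.authentication": "kerberos",
}

if __name__ == "__main__":
    print(render_hadoop_site(core_site))
```

Keeping the properties as a plain mapping makes it easy to see which keys are set at all, which may help when pruning values that were copied over without a clear need.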

Mentioned in SAL (#wikimedia-operations) [2023-12-12T13:45:33Z] <brouberol> increasing max container memory requests in dse-k8s from 3GB to 8GB - T351722

Change 982762 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] dse-k8s limitrange: ensure pod max memory is higher than container max memory

https://gerrit.wikimedia.org/r/982762

Change 982762 merged by Brouberol:

[operations/deployment-charts@master] dse-k8s limitrange: ensure pod max memory is higher than container max memory

https://gerrit.wikimedia.org/r/982762

Mentioned in SAL (#wikimedia-operations) [2023-12-13T09:24:59Z] <brouberol> increasing pod max requested memory to a higher value than the container max requested memory for dse-k8s-eqiad - T351722
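
The constraint behind the two limitrange changes above is that a pod-level memory maximum should not be lower than the container-level maximum. A rough Python sketch of a check for this follows. It assumes the LimitRange has already been loaded into a dict with the standard `spec.limits[].type` and `max.memory` fields; the helper names and the example values (in particular the pod maximum) are hypothetical, and only a few common quantity suffixes are handled.

```python
import re

# Common Kubernetes quantity suffixes only; this is not a full parser.
_UNITS = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '8Gi' to a number of bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]*)", quantity.strip())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def max_memory(limit_range: dict, kind: str):
    """Return max.memory in bytes for the given limit type, or None if unset."""
    for item in limit_range["spec"]["limits"]:
        if item.get("type") == kind and "memory" in item.get("max", {}):
            return parse_memory(item["max"]["memory"])
    return None


def pod_max_covers_container_max(limit_range: dict) -> bool:
    """True when the Pod memory maximum is at least the Container maximum."""
    pod = max_memory(limit_range, "Pod")
    container = max_memory(limit_range, "Container")
    if pod is None or container is None:
        return True  # nothing to compare
    return pod >= container


# Hypothetical LimitRange contents, for illustration only.
consistent = {"spec": {"limits": [
    {"type": "Container", "max": {"memory": "8Gi"}},
    {"type": "Pod", "max": {"memory": "10Gi"}},
]}}
inconsistent = {"spec": {"limits": [
    {"type": "Container", "max": {"memory": "8Gi"}},
    {"type": "Pod", "max": {"memory": "3Gi"}},
]}}

if __name__ == "__main__":
    print(pod_max_covers_container_max(consistent))
    print(pod_max_covers_container_max(inconsistent))
```

A check of this kind could run in CI against the rendered manifests, so that raising the container maximum without raising the pod maximum is caught before deployment.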

Change 978629 merged by Brouberol:

[operations/deployment-charts@master] Define the spark-history chart

https://gerrit.wikimedia.org/r/978629

Change 983422 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] spark-history: Add missing service template include

https://gerrit.wikimedia.org/r/983422

Change 983422 merged by Brouberol:

[operations/deployment-charts@master] spark-history: Add missing service template include

https://gerrit.wikimedia.org/r/983422