
Enable egress traffic from spark pods to HDFS and HIVE
Closed, Resolved · Public · 1 Estimated Story Points

Description

Enable egress traffic from spark pods to HDFS and HIVE within K8S

This supports user story 1.

Done is:

  • egress rules are configured for Spark pods to HDFS and Hive
  • a Spark test job reading from HDFS works fine

Event Timeline

Change 899630 had a related patch set uploaded (by Nicolas Fraison; author: Nicolas Fraison):

[operations/deployment-charts@master] spark: Allow communication from spark pods to HDFS/Hive

https://gerrit.wikimedia.org/r/899630

Change 899630 merged by Nicolas Fraison:

[operations/deployment-charts@master] spark: Allow communication from spark pods to HDFS/Hive

https://gerrit.wikimedia.org/r/899630

Change 901561 had a related patch set uploaded (by Nicolas Fraison; author: Nicolas Fraison):

[operations/puppet@production] hadoop: Authorize access from dse k8s pods to hdfs and hive-metastore

https://gerrit.wikimedia.org/r/901561

Change 901562 had a related patch set uploaded (by Nicolas Fraison; author: Nicolas Fraison):

[operations/puppet@production] hadoop: Authorize access from dse k8s pods to hdfs and hive-metastore prod

https://gerrit.wikimedia.org/r/901562

Change 901561 merged by Nicolas Fraison:

[operations/puppet@production] hadoop: Authorize access from dse k8s pods to hdfs and hive-metastore test

https://gerrit.wikimedia.org/r/901561

FW access is open. Access works with the job config below:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: hadoop-conf
  namespace: spark
data:
  core-site.xml: |
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://analytics-hadoop/</value>
      </property>
      <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
      </property>
      <property>
        <!-- Emptier interval specifies how long (in minutes) the NameNode waits
             before running a thread to manage checkpoints. -->
        <name>fs.trash.checkpoint.interval</name>
        <value>1440</value>
      </property>
      <!-- Deletion interval specifies how long (in minutes) a checkpoint
           will be expired before it is deleted. -->
      <property>
        <name>fs.trash.interval</name>
        <value>43200</value>
      </property>
      <!-- Script used to map nodes to rack or rows in datacenter. -->
      <property>
          <name>net.topology.script.file.name</name>
          <value>/etc/hadoop/conf.analytics-hadoop/net-topology.sh</value>
      </property>
      <property>
          <name>fs.permissions.umask-mode</name>
          <value>027</value>
      </property>
      <property>
          <name>hadoop.http.staticuser.user</name>
          <value>yarn</value>
      </property>
      <property>
          <name>hadoop.rpc.protection</name>
          <value>privacy</value>
      </property>
      <property>
          <name>hadoop.security.authentication</name>
          <value>kerberos</value>
      </property>
      <property>
          <name>hadoop.ssl.enabled.protocols</name>
          <value>TLSv1.2</value>
      </property>
    </configuration>
  hdfs-site.xml: |
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>dfs.nameservices</name>
        <value>analytics-hadoop</value>
      </property>
      <property>
        <name>dfs.ha.namenodes.analytics-hadoop</name>
        <value>an-test-master1001-eqiad-wmnet,an-test-master1002-eqiad-wmnet</value>
      </property>
      <property>
        <name>dfs.namenode.servicerpc-address.analytics-hadoop.an-test-master1001-eqiad-wmnet</name>
        <value>an-test-master1001.eqiad.wmnet:8040</value>
      </property>
      <property>
        <name>dfs.namenode.servicerpc-address.analytics-hadoop.an-test-master1002-eqiad-wmnet</name>
        <value>an-test-master1002.eqiad.wmnet:8040</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.analytics-hadoop.an-test-master1001-eqiad-wmnet</name>
        <value>an-test-master1001.eqiad.wmnet:8020</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.analytics-hadoop.an-test-master1002-eqiad-wmnet</name>
        <value>an-test-master1002.eqiad.wmnet:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.analytics-hadoop.an-test-master1001-eqiad-wmnet</name>
        <value>an-test-master1001.eqiad.wmnet:50070</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.analytics-hadoop.an-test-master1002-eqiad-wmnet</name>
        <value>an-test-master1002.eqiad.wmnet:50070</value>
      </property>
      <property>
        <name>dfs.client.failover.proxy.provider.analytics-hadoop</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
      <property>
       <!--
        Deprecated in CDH5. Replaced by dfs.blocksize.
        We keep it around for a bit nonetheless, in case some application
        still try to access it directly.
       -->
       <name>dfs.block.size</name>
       <value>268435456</value>
      </property>
      <property>
        <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
        <value>true</value>
      </property>
      <property>
          <name>dfs.block.access.token.enable</name>
          <value>true</value>
      </property>
      <property>
          <name>dfs.cluster.administrators</name>
          <value>hdfs analytics-admins,ops</value>
      </property>
      <property>
          <name>dfs.data.transfer.protection</name>
          <value>privacy</value>
      </property>
      <property>
          <name>dfs.datanode.kerberos.principal</name>
          <value>hdfs/_HOST@WIKIMEDIA</value>
      </property>
      <property>
          <name>dfs.encrypt.data.transfer</name>
          <value>true</value>
      </property>
      <property>
          <name>dfs.encrypt.data.transfer.cipher.key.bitlength</name>
          <value>128</value>
      </property>
      <property>
          <name>dfs.encrypt.data.transfer.cipher.suites</name>
          <value>AES/CTR/NoPadding</value>
      </property>
      <property>
          <name>dfs.http.policy</name>
          <value>HTTPS_ONLY</value>
      </property>
      <property>
          <name>dfs.namenode.kerberos.principal</name>
          <value>hdfs/_HOST@WIKIMEDIA</value>
      </property>
      <property>
          <name>dfs.web.authentication.kerberos.principal</name>
          <value>HTTP/_HOST@WIKIMEDIA</value>
      </property>
    </configuration>

---
apiVersion: v1
kind: Secret
metadata:
  name: hdfs-token-wmf
  namespace: spark
type: HadoopDelegationToken
data:
  hadoop.token: A_TOKEN
---
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi-wmf
  namespace: spark
spec:
  type: Scala
  mode: cluster
  image: "docker-registry.wikimedia.org/spark:3.3.0-2"
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.HdfsTest
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0.jar"
  arguments: ["hdfs:///user/nfraison/test"]
  sparkVersion: "3.3.0"
  restartPolicy:
    type: Never
  volumes:
    - name: "hadoop"
      configMap:
        name: hadoop-conf
  sparkConf:
    spark.driver.port: "12000"
    spark.driver.blockManager.port: "13000"
    spark.ui.port: "4045"
  driver:
    cores: 1
    coreLimit: "1"
    memory: "512m"
    labels:
      version: "3.3.0"
    serviceAccount: spark-driver
    secrets:
      - name: hdfs-token-wmf
        path: /mnt/secrets
        secretType: HadoopDelegationToken
    volumeMounts:
      - name: "hadoop"
        mountPath: "/etc/hadoop/conf"
    envVars:
      HADOOP_CONF_DIR: "/etc/hadoop/conf"
  executor:
    cores: 1
    coreLimit: "1"
    instances: 5
    memory: "512m"
    labels:
      version: "3.3.0"
    secrets:
      - name: hdfs-token-wmf
        path: /mnt/secrets
        secretType: HadoopDelegationToken
    volumeMounts:
      - name: "hadoop"
        mountPath: "/etc/hadoop/conf"
    envVars:
      HADOOP_CONF_DIR: "/etc/hadoop/conf"

But now I'm facing this issue on the Spark driver:

23/03/21 15:57:59 INFO RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: hdfs/an-test-master1001.eqiad.wmnet@ATHENA.MIT.EDU, expecting: hdfs/an-test-master1001.eqiad.wmnet@WIKIMEDIA; Host Details : local host is: "spark-pi-wmf-driver/10.67.26.86"; destination host is: "an-test-master1001.eqiad.wmnet":8020; , while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-test-master1001.eqiad.wmnet/10.64.5.39:8020 after 8 failover attempts. Trying to failover after sleeping for 16700ms. Current retry count: 8.

This is really strange, as it should rely only on the delegation token and not care about Kerberos at all.

Also, dse-k8s-worker1002.eqiad.wmnet shows a strange pattern: the driver gets stuck at

log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Setting spark.hadoop.yarn.resourcemanager.principal to spark

There is one TIME_WAIT socket to NameNode port 8020,

while on dse-k8s-worker1006.eqiad.wmnet it proceeds normally:

23/03/21 17:25:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting spark.hadoop.yarn.resourcemanager.principal to spark
23/03/21 17:25:10 INFO SparkContext: Running Spark version 3.3.0
23/03/21 17:25:10 INFO ResourceUtils: ==============================================================
23/03/21 17:25:10 INFO ResourceUtils: No custom resources configured for spark.driver.
23/03/21 17:25:10 INFO ResourceUtils: ==============================================================
23/03/21 17:25:10 INFO SparkContext: Submitted application: HdfsTest
23/03/21 17:25:11 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 512, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)

Could some FW rules not be getting applied by Calico?

The Kerberos issue was due to my config not being right: I was using a Hadoop delegation token for hdfs analytics-test-hadoop while my config pointed at analytics-hadoop. As a result the HDT was not found and the client fell back to Kerberos...
The config below works fine in the test cluster:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: hadoop-conf
  namespace: spark
data:
  core-site.xml: |
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://analytics-test-hadoop/</value>
      </property>
      <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
      </property>
      <property>
          <name>fs.permissions.umask-mode</name>
          <value>027</value>
      </property>
      <property>
          <name>hadoop.http.staticuser.user</name>
          <value>yarn</value>
      </property>
      <property>
          <name>hadoop.rpc.protection</name>
          <value>privacy</value>
      </property>
      <property>
          <name>hadoop.security.authentication</name>
          <value>kerberos</value>
      </property>
      <property>
          <name>hadoop.ssl.enabled.protocols</name>
          <value>TLSv1.2</value>
      </property>
    </configuration>
  hdfs-site.xml: |
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>dfs.nameservices</name>
        <value>analytics-test-hadoop</value>
      </property>
      <property>
        <name>dfs.ha.namenodes.analytics-test-hadoop</name>
        <value>an-test-master1001-eqiad-wmnet,an-test-master1002-eqiad-wmnet</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.analytics-test-hadoop.an-test-master1001-eqiad-wmnet</name>
        <value>an-test-master1001.eqiad.wmnet:8020</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.analytics-test-hadoop.an-test-master1002-eqiad-wmnet</name>
        <value>an-test-master1002.eqiad.wmnet:8020</value>
      </property>
      <property>
        <name>dfs.client.failover.proxy.provider.analytics-test-hadoop</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
      <property>
       <name>dfs.blocksize</name>
       <value>268435456</value>
      </property>
      <property>
        <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
        <value>true</value>
      </property>
      <property>
          <name>dfs.block.access.token.enable</name>
          <value>true</value>
      </property>
      <property>
          <name>dfs.data.transfer.protection</name>
          <value>privacy</value>
      </property>
      <property>
          <name>dfs.datanode.kerberos.principal</name>
          <value>hdfs/_HOST@WIKIMEDIA</value>
      </property>
      <property>
          <name>dfs.encrypt.data.transfer</name>
          <value>true</value>
      </property>
      <property>
          <name>dfs.encrypt.data.transfer.cipher.key.bitlength</name>
          <value>128</value>
      </property>
      <property>
          <name>dfs.encrypt.data.transfer.cipher.suites</name>
          <value>AES/CTR/NoPadding</value>
      </property>
      <property>
          <name>dfs.http.policy</name>
          <value>HTTPS_ONLY</value>
      </property>
      <property>
          <name>dfs.namenode.kerberos.principal</name>
          <value>hdfs/_HOST@WIKIMEDIA</value>
      </property>
      <property>
          <name>dfs.web.authentication.kerberos.principal</name>
          <value>HTTP/_HOST@WIKIMEDIA</value>
      </property>
    </configuration>

---
apiVersion: v1
kind: Secret
metadata:
  name: hdfs-token-wmf
  namespace: spark
type: HadoopDelegationToken
data:
  hadoop.token: TOKEN
---
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi-wmf
  namespace: spark
spec:
  type: Scala
  mode: cluster
  image: "docker-registry.wikimedia.org/spark:3.3.0-2"
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.HdfsTest
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0.jar"
  arguments: ["hdfs://analytics-test-hadoop/user/nfraison/test"]
  sparkVersion: "3.3.0"
  restartPolicy:
    type: Never
  volumes:
    - name: "hadoop"
      configMap:
        name: hadoop-conf
  sparkConf:
    spark.driver.port: "12000"
    spark.driver.blockManager.port: "13000"
    spark.ui.port: "4045"
  driver:
    cores: 2
    coreLimit: "4"
    memory: "1g"
    labels:
      version: "3.3.0"
    serviceAccount: spark-driver
    secrets:
      - name: hdfs-token-wmf
        path: /mnt/secrets
        secretType: HadoopDelegationToken
    volumeMounts:
      - name: "hadoop"
        mountPath: "/etc/hadoop/conf"
    envVars:
      HADOOP_CONF_DIR: "/etc/hadoop/conf"
  executor:
    cores: 2
    coreLimit: "4"
    instances: 2
    memory: "1g"
    labels:
      version: "3.3.0"
    secrets:
      - name: hdfs-token-wmf
        path: /mnt/secrets
        secretType: HadoopDelegationToken
    volumeMounts:
      - name: "hadoop"
        mountPath: "/etc/hadoop/conf"
    envVars:
      HADOOP_CONF_DIR: "/etc/hadoop/conf"

Reading the dummy test file succeeds:

23/03/21 17:53:39 INFO DAGScheduler: Job 10 is finished. Cancelling potential speculative or zombie tasks for this job
23/03/21 17:53:39 INFO TaskSchedulerImpl: Killing all running tasks in stage 10: Stage finished
23/03/21 17:53:39 INFO DAGScheduler: Job 10 finished: take at HdfsTest.scala:46, took 0.083443 s
File contents: [sqkglhqsk
23/03/21 17:53:39 INFO SparkContext: Starting job: sum at HdfsTest.scala:47
23/03/21 17:53:39 INFO DAGScheduler: Got job 11 (sum at HdfsTest.scala:47) with 1 output partitions
23/03/21 17:53:39 INFO DAGScheduler: Final stage: ResultStage 11 (sum at HdfsTest.scala:47)
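The root cause above (a token issued for analytics-test-hadoop while fs.defaultFS pointed at analytics-hadoop) is easy to hit again. A minimal pre-flight sketch that compares the nameservice in core-site.xml against the token's service; the file path and the token-service string are illustrative stand-ins (in practice the service comes from the "Created HDFS_DELEGATION_TOKEN ... on ha-hdfs:..." log line or `hdfs fetchdt --print`):

```shell
# Throwaway core-site.xml so the sketch is self-contained; in practice this
# is the file mounted into the pod at /etc/hadoop/conf/core-site.xml.
cat > /tmp/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://analytics-test-hadoop/</value>
  </property>
</configuration>
EOF

# Extract the nameservice from fs.defaultFS.
FS=$(python3 - <<'PY'
import xml.etree.ElementTree as ET
root = ET.parse('/tmp/core-site.xml').getroot()
for p in root.findall('property'):
    if p.findtext('name') == 'fs.defaultFS':
        print(p.findtext('value').replace('hdfs://', '').rstrip('/'))
PY
)

# Service the delegation token was issued for (illustrative value).
TOKEN_SERVICE="analytics-test-hadoop"

if [ "$FS" = "$TOKEN_SERVICE" ]; then
  echo "OK: token service matches fs.defaultFS ($FS)"
else
  echo "MISMATCH: token=$TOKEN_SERVICE config=$FS -> client falls back to Kerberos"
fi
```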

2 things that will have to be added to the roadmap:

  • Management of the Hadoop/Hive/Spark config. It is currently pushed as a ConfigMap with each job, but should probably be a common ConfigMap shared by all jobs.
  • Management of jars/dependencies. We currently rely on the local example jars; for real applications we need a solution for dependencies: Ceph S3 (not available for now), Archiva (fine for prod jobs with released artifacts, but not for test builds that are not pushed to Archiva), HTTP hosted on the wrapper submitting the job to serve files to the running app, or something else?

Great work getting this far @nfraison.

  • Management of jars/dependencies. We currently rely on the local example jars; for real applications we need a solution for dependencies: Ceph S3 (not available for now), Archiva (fine for prod jobs with released artifacts, but not for test builds that are not pushed to Archiva), HTTP hosted on the wrapper submitting the job to serve files to the running app, or something else?

We can use hdfs:// based URLs for loading jars, can't we?

Our refinery and refinery-source projects already deploy jars to HDFS as part of the normal deployment train, so I believe that we could extend this mechanism.
https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/Cluster/Deploy/Refinery

I need to test this; I'm not a hundred percent sure, but if it works we should indeed rely on it, and it should be managed by our spark8s CLI.
The CLI should check those dependencies and ensure they are pushed to HDFS, updating the conf to point to the appropriate HDFS path.
This feature could be disabled with a specific flag if needed.
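As a rough illustration of what that CLI step could do (hypothetical flow; the `hdfs dfs` calls are shown as comments, with a local directory standing in for HDFS so the sketch runs anywhere):

```shell
JAR=spark-examples_2.12-3.3.0.jar
FAKE_HDFS=/tmp/fake-hdfs                 # local stand-in for the real HDFS
USER_DIR="$FAKE_HDFS/user/nfraison"

mkdir -p "$USER_DIR"                     # hdfs dfs -mkdir -p /user/nfraison
touch "/tmp/$JAR"                        # pretend this is the built artifact
cp "/tmp/$JAR" "$USER_DIR/"              # hdfs dfs -put -f $JAR /user/nfraison/

# The CLI would then rewrite the SparkApplication spec to reference the pushed jar:
echo "mainApplicationFile: \"hdfs://analytics-test-hadoop/user/nfraison/$JAR\""
```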

Change 901562 merged by Nicolas Fraison:

[operations/puppet@production] hadoop: Authorize access from dse k8s pods to hdfs and hive-metastore prod

https://gerrit.wikimedia.org/r/901562

How to get a token for the test cluster:

:nfraison@pop-os:~$ ssh an-test-client1001.eqiad.wmnet
Linux an-test-client1001 4.19.0-20-amd64 #1 SMP Debian 4.19.235-1 (2022-03-17) x86_64
Debian GNU/Linux 10 (buster)
Netbox Status: active
an-test-client1001 is a Analytics Hadoop test client (analytics_test_cluster::client)
Virtual Machine on Ganeti cluster eqiad and group A
This host is capable of Kerberos authentication in the WIKIMEDIA realm.
For more info: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos/UserGuide
The last Puppet run was at Wed Mar 22 09:44:53 UTC 2023 (23 minutes ago). 
Last Puppet commit: (119b7634d9) Btullis - Allow hive on bullseye to install and use the correct packages
Debian GNU/Linux 10 auto-installed on Wed Oct 21 22:23:14 UTC 2020.
Last login: Tue Mar 21 17:37:12 2023 from 2620:0:862:1:91:198:174:9

You have a valid Kerberos ticket.
Your automatic Kerberos ticket renewal service is also active on this host

nfraison@an-test-client1001:~$ rm token 
nfraison@an-test-client1001:~$ klist
Ticket cache: FILE:/tmp/krb5cc_43343
Default principal: nfraison@WIKIMEDIA

Valid starting       Expires              Service principal
03/22/2023 00:00:06  03/23/2023 23:59:22  krbtgt/WIKIMEDIA@WIKIMEDIA
	renew until 03/28/2023 15:52:59
nfraison@an-test-client1001:~$ fet^C
nfraison@an-test-client1001:~$ hdfs fetchdt token
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
23/03/22 10:08:23 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 840754 for nfraison on ha-hdfs:analytics-test-hadoop
Fetched token for ha-hdfs:analytics-test-hadoop into file:/home/nfraison/token
nfraison@an-test-client1001:~$ ls token 
token
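One detail the transcript glosses over: the hadoop.token value under the Secret's data: key must be base64-encoded, since Kubernetes requires base64 for data fields (stringData accepts raw values instead). A self-contained sketch, with a dummy file standing in for the fetchdt output:

```shell
# Dummy stand-in for the file produced by `hdfs fetchdt token`.
printf 'dummy-token-bytes' > /tmp/token

# Kubernetes Secret `data:` values must be base64; -w0 disables line wrapping.
TOKEN_B64=$(base64 -w0 < /tmp/token)

# This is the value that goes into `hadoop.token:` in the Secret manifest.
echo "hadoop.token: $TOKEN_B64"
```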

Works fine in prod:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: hadoop-conf
  namespace: spark
data:
  core-site.xml: |
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://analytics-hadoop/</value>
      </property>
      <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
      </property>
      <property>
          <name>fs.permissions.umask-mode</name>
          <value>027</value>
      </property>
      <property>
          <name>hadoop.http.staticuser.user</name>
          <value>yarn</value>
      </property>
      <property>
          <name>hadoop.rpc.protection</name>
          <value>privacy</value>
      </property>
      <property>
          <name>hadoop.security.authentication</name>
          <value>kerberos</value>
      </property>
      <property>
          <name>hadoop.ssl.enabled.protocols</name>
          <value>TLSv1.2</value>
      </property>
    </configuration>
  hdfs-site.xml: |
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>dfs.nameservices</name>
        <value>analytics-hadoop</value>
      </property>
      <property>
        <name>dfs.ha.namenodes.analytics-hadoop</name>
        <value>an-master1001-eqiad-wmnet,an-master1002-eqiad-wmnet</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.analytics-hadoop.an-master1001-eqiad-wmnet</name>
        <value>an-master1001.eqiad.wmnet:8020</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.analytics-hadoop.an-master1002-eqiad-wmnet</name>
        <value>an-master1002.eqiad.wmnet:8020</value>
      </property>
      <property>
        <name>dfs.client.failover.proxy.provider.analytics-hadoop</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
      <property>
       <name>dfs.blocksize</name>
       <value>268435456</value>
      </property>
      <property>
        <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
        <value>true</value>
      </property>
      <property>
          <name>dfs.block.access.token.enable</name>
          <value>true</value>
      </property>
      <property>
          <name>dfs.data.transfer.protection</name>
          <value>privacy</value>
      </property>
      <property>
          <name>dfs.datanode.kerberos.principal</name>
          <value>hdfs/_HOST@WIKIMEDIA</value>
      </property>
      <property>
          <name>dfs.encrypt.data.transfer</name>
          <value>true</value>
      </property>
      <property>
          <name>dfs.encrypt.data.transfer.cipher.key.bitlength</name>
          <value>128</value>
      </property>
      <property>
          <name>dfs.encrypt.data.transfer.cipher.suites</name>
          <value>AES/CTR/NoPadding</value>
      </property>
      <property>
          <name>dfs.http.policy</name>
          <value>HTTPS_ONLY</value>
      </property>
      <property>
          <name>dfs.namenode.kerberos.principal</name>
          <value>hdfs/_HOST@WIKIMEDIA</value>
      </property>
      <property>
          <name>dfs.web.authentication.kerberos.principal</name>
          <value>HTTP/_HOST@WIKIMEDIA</value>
      </property>
    </configuration>

---
apiVersion: v1
kind: Secret
metadata:
  name: hdfs-token-wmf
  namespace: spark
type: HadoopDelegationToken
data:
  hadoop.token: TOKEN
---
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi-wmf
  namespace: spark
spec:
  type: Scala
  mode: cluster
  image: "docker-registry.wikimedia.org/spark:3.3.0-2"
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.HdfsTest
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0.jar"
  arguments: ["hdfs://analytics-hadoop/user/nfraison/test"]
  sparkVersion: "3.3.0"
  restartPolicy:
    type: Never
  volumes:
    - name: "hadoop"
      configMap:
        name: hadoop-conf
  sparkConf:
    spark.driver.port: "12000"
    spark.driver.blockManager.port: "13000"
    spark.blockManager.port: "13000"
    spark.ui.port: "4045"
  driver:
    cores: 2
    coreLimit: "4"
    memory: "1g"
    labels:
      version: "3.3.0"
    serviceAccount: spark-driver
    secrets:
      - name: hdfs-token-wmf
        path: /mnt/secrets
        secretType: HadoopDelegationToken
    volumeMounts:
      - name: "hadoop"
        mountPath: "/etc/hadoop/conf"
    envVars:
      HADOOP_CONF_DIR: "/etc/hadoop/conf"
  executor:
    cores: 2
    coreLimit: "4"
    instances: 2
    memory: "1g"
    labels:
      version: "3.3.0"
    secrets:
      - name: hdfs-token-wmf
        path: /mnt/secrets
        secretType: HadoopDelegationToken
    volumeMounts:
      - name: "hadoop"
        mountPath: "/etc/hadoop/conf"
    envVars:
      HADOOP_CONF_DIR: "/etc/hadoop/conf"

FYI, updating mainApplicationFile to "hdfs://analytics-hadoop/user/nfraison/spark-examples_2.12-3.3.0.jar" works fine as well, so there is no specific need to manage this for now.

Change 902409 had a related patch set uploaded (by Nicolas Fraison; author: Nicolas Fraison):

[operations/deployment-charts@master] spark: authorize communication between executors on blockManager port

https://gerrit.wikimedia.org/r/902409

Change #902409 abandoned by Btullis:

[operations/deployment-charts@master] spark: authorize communication between executors on blockManager port

Reason:

Now being implemented in If64c0fd8663c0f72852e2893bc11ac89e77e0554

https://gerrit.wikimedia.org/r/902409