How to submit a PySpark job with a keytab and principal?
Closed, Resolved (Public)

Description

Here is my command:

PYSPARK_PYTHON=./environment/bin/python \
  spark2-submit \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python \
  --master yarn \
  --deploy-mode cluster \
  --principal bmansurov@WIKIMEDIA \
  --keytab /tmp/krb5cc_10570 \
  --archives environment.tar.gz#environment \
  knowledge_gaps/__main__.py \
  20220101 20220131 \
  --projects enwiki \
  --pageviews_table bmansurov.pageview_hourly_2022_01 \
  --wikipedia_pages_table wikipedia_pages_2022_01 \
  --mediawiki_snapshot 2022-01 \
  --wikidata_snapshot 2022-01-24

And here is the error:

Exception in thread "main" org.apache.hadoop.security.KerberosAuthException: Login failure for user: bmansurov@WIKIMEDIA from keytab /tmp/krb5cc_10570 javax.security.auth.login.LoginException: Unable to obtain password from user

I've also tried running the above command as the analytics-privatedata user, but no luck.

What am I doing wrong? Thanks!

Event Timeline

Creating a Spark session directly in the job (as opposed to going through wmfdata) seems to have fixed the issue.
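
For reference, here is a minimal sketch of what building the session directly might look like inside knowledge_gaps/__main__.py. The function name build_spark_session, the default app name, and the example option are assumptions for illustration and do not come from this task. Only SparkSession.builder, appName, config, and getOrCreate are standard PySpark calls. The master and deploy mode are left out on purpose, since spark2-submit already passes them in via --master yarn and --deploy-mode cluster.

```python
def build_spark_session(app_name="knowledge_gaps", options=None):
    """Build (or reuse) a SparkSession without going through a wrapper library.

    app_name and options are hypothetical parameters for this sketch.
    """
    # Imported here so the module can still be loaded where pyspark is absent.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)

    # Apply any extra Spark settings the job needs, one key/value at a time.
    for key, value in (options or {}).items():
        builder = builder.config(key, value)

    # Reuses the session created by spark2-submit if one already exists.
    return builder.getOrCreate()


# Example call (the option shown is an illustrative assumption):
# spark = build_spark_session(options={"spark.sql.shuffle.partitions": "200"})
```

Because getOrCreate() picks up whatever spark2-submit has already configured, the job itself needs no Kerberos-specific code in this sketch; the principal and keytab stay on the command line.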