
Unable to access Hive from notebook1003
Closed, Resolved · Public

Description

From Jupyter on notebook1003, the following commands yield the error below, even though it appears I'm authenticated via Kerberos before running them. Is there an additional step I should be following? From https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos/UserGuide I couldn't tell whether I was missing something else.

Command

import findspark, os
os.environ['SPARK_HOME'] = '/usr/lib/spark2';
findspark.init()
import pyspark
import pyspark.sql
conf = pyspark.SparkConf().setMaster("yarn")  # Use master yarn here if you are going to query large datasets.
conf.set('spark.executor.memory', '8g')
conf.set('spark.yarn.executor.memoryOverhead', '1024')
conf.set('spark.executor.cores', '4')
conf.set('spark.dynamicAllocation.maxExecutors', '32')
conf.set('spark.driver.memory', '4g')
conf.set('spark.driver.maxResultSize', '10g')
conf.set('spark.logConf', True)
sc = pyspark.SparkContext(conf=conf)
spark_hive = pyspark.sql.HiveContext(sc)

%config SQL.conn_name = 'spark_hive'
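As a diagnostic aside: the GSSException below means the Spark driver's JVM found no Kerberos TGT. A minimal sketch for checking, from inside the notebook kernel itself, which credential cache the process would use (assuming the MIT Kerberos default of `KRB5CCNAME`, falling back to `/tmp/krb5cc_<uid>` — the helper name is hypothetical):

```python
import os

def default_krb5_cache():
    # MIT Kerberos resolution order: honour KRB5CCNAME if set,
    # otherwise fall back to the conventional FILE:/tmp/krb5cc_<uid>.
    return os.environ.get('KRB5CCNAME', 'FILE:/tmp/krb5cc_%d' % os.getuid())

cache = default_krb5_cache()
print('Kernel would use credential cache:', cache)

# Strip an optional "FILE:" prefix and check the cache exists on disk.
path = cache.split(':', 1)[-1]
if not os.path.exists(path):
    print('No ticket cache on disk -- run kinit in a notebook terminal first')
```

If the kernel was started before `kinit`, or with a different `KRB5CCNAME`, this can differ from what `klist` shows in a terminal.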

Error

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-13-d01c37e0bc96> in <module>()
     12 conf.set('spark.driver.maxResultSize', '10g')
     13 conf.set('spark.logConf', True)
---> 14 sc = pyspark.SparkContext(conf=conf)
     15 spark_hive = pyspark.sql.HiveContext(sc)
     16 

/usr/lib/spark2/python/pyspark/context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    134         try:
    135             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
--> 136                           conf, jsc, profiler_cls)
    137         except:
    138             # If an error occurs, clean up in order to allow future SparkContext creation:

/usr/lib/spark2/python/pyspark/context.py in _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, jsc, profiler_cls)
    196 
    197         # Create the Java SparkContext through Py4J
--> 198         self._jsc = jsc or self._initialize_context(self._conf._jconf)
    199         # Reset the SparkConf to the one actually used by the SparkContext in JVM.
    200         self._conf = SparkConf(_jconf=self._jsc.sc().conf())

/usr/lib/spark2/python/pyspark/context.py in _initialize_context(self, jconf)
    304         Initialize SparkContext in function to allow subclass specific initialization
    305         """
--> 306         return self._jvm.JavaSparkContext(jconf)
    307 
    308     @classmethod

/usr/lib/spark2/python/py4j/java_gateway.py in __call__(self, *args)
   1523         answer = self._gateway_client.send_command(command)
   1524         return_value = get_return_value(
-> 1525             answer, self._gateway_client, None, self._fqn)
   1526 
   1527         for temp_arg in temp_args:

/usr/lib/spark2/python/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "notebook1003/10.64.21.109"; destination host is: "an-master1001.eqiad.wmnet":8032; 
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
	at org.apache.hadoop.ipc.Client.call(Client.java:1474)
	at org.apache.hadoop.ipc.Client.call(Client.java:1401)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy14.getNewApplication(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:217)
	at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy15.getNewApplication(Unknown Source)
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNewApplication(YarnClientImpl.java:206)
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createApplication(YarnClientImpl.java:214)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:168)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:183)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
	at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:682)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
	at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:645)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:732)
	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1523)
	at org.apache.hadoop.ipc.Client.call(Client.java:1440)
	... 28 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
	at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
	at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:555)
	at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:370)
	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:724)
	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:720)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:719)
	... 31 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
	at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:148)
	at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
	at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:189)
	at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
	... 40 more

Kerberos on notebook1003 from the Jupyter built-in terminal

dr0ptp4kt@notebook1003:~$ hostname
notebook1003
dr0ptp4kt@notebook1003:~$ date
Mon Jan 20 20:20:27 UTC 2020
dr0ptp4kt@notebook1003:~$ klist
Ticket cache: FILE:/tmp/krb5cc_2962
Default principal: dr0ptp4kt@WIKIMEDIA

Valid starting       Expires              Service principal
01/20/2020 20:08:09  01/21/2020 06:08:09  krbtgt/WIKIMEDIA@WIKIMEDIA
        renew until 01/21/2020 20:07:22

Kerberos on stat1007 from an independent SSH session

dr0ptp4kt@stat1007:~$ hostname
stat1007
dr0ptp4kt@stat1007:~$ date
Mon Jan 20 20:20:52 UTC 2020
dr0ptp4kt@stat1007:~$ klist
Ticket cache: FILE:/tmp/krb5cc_2962
Default principal: dr0ptp4kt@WIKIMEDIA

Valid starting       Expires              Service principal
01/20/2020 20:17:32  01/21/2020 06:17:32  krbtgt/WIKIMEDIA@WIKIMEDIA
        renew until 01/21/2020 20:16:56

Event Timeline

@Reedy: a Kerberos ticket needs to be created in the notebook environment, using a notebook terminal page. See https://wikitech.wikimedia.org/wiki/SWAP#Kerberos.

@JAllemandou I noticed you at-mentioned @Reedy here. Does @Reedy need to add some kind of permissions?

For the sake of completeness, I did a kdestroy and a fresh kinit from the Jupyter terminal, but the same error is produced when I run those commands and then execute the Jupyter cell again.

dr0ptp4kt@notebook1003:~$ kdestroy
dr0ptp4kt@notebook1003:~$ kinit
Password for dr0ptp4kt@WIKIMEDIA:
dr0ptp4kt@notebook1003:~$ klist
Ticket cache: FILE:/tmp/krb5cc_2962
Default principal: dr0ptp4kt@WIKIMEDIA

Valid starting       Expires              Service principal
01/21/2020 12:35:50  01/21/2020 22:35:50  krbtgt/WIKIMEDIA@WIKIMEDIA
        renew until 01/22/2020 12:35:42

@dr0ptp4kt Thanks for making me see things correctly :)
I think you need to kdestroy from the SSH session to clean that session, then kdestroy and kinit again in the notebook terminal to start fresh.
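Before retrying the Spark cell after that reset, one could guard against a stale or empty cache with a small check like this (a sketch only — the helper name is hypothetical, and it assumes the default `/tmp/krb5cc_<uid>` cache location seen in the `klist` output above):

```python
import os

def ticket_cache_ok(path=None):
    """Return True if a Kerberos credential cache exists and is non-empty.

    Assumes the MIT Kerberos default location /tmp/krb5cc_<uid> when no
    path is given; after kdestroy the cache is removed or emptied, so
    this returns False until a fresh kinit succeeds.
    """
    path = path or '/tmp/krb5cc_%d' % os.getuid()
    return os.path.isfile(path) and os.path.getsize(path) > 0

# Example usage: only build the SparkContext once the cache looks valid.
# if ticket_cache_ok():
#     sc = pyspark.SparkContext(conf=conf)
# else:
#     raise RuntimeError('Run kinit in a notebook terminal first')
```

This doesn't validate the ticket itself (expiry, principal), only that a cache file is present, which is the failure mode in this task.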

@Reedy I mentioned you by mistake, please excuse me.

elukey assigned this task to JAllemandou.
elukey set Final Story Points to 3.