Spark 2 has been deprecated. Please migrate to Spark 3.
See https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Spark/Migration_to_Spark_3
category=FutureWarning
A conda environment is already packed at conda-2022-10-12T21.46.32_neilpquinn-wmf.tgz. If you have recently installed new packages into your conda env, set force=True in conda_pack_kwargs and it will be repacked for you.
Will ship conda-2022-10-12T21.46.32_neilpquinn-wmf.tgz to remote Spark executors.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/11/17 20:51:02 WARN SparkConf: Note that spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS in YARN).
22/11/17 20:51:03 WARN Utils: Service 'sparkDriver' could not bind on port 12000. Attempting port 12001.
22/11/17 20:51:03 WARN Utils: Service 'sparkDriver' could not bind on port 12001. Attempting port 12002.
22/11/17 20:51:03 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
22/11/17 20:51:03 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
22/11/17 20:51:31 WARN Utils: Service 'org.apache.spark.network.netty.NettyBlockTransferService' could not bind on port 13000. Attempting port 13001.
22/11/17 20:51:31 WARN Utils: Service 'org.apache.spark.network.netty.NettyBlockTransferService' could not bind on port 13001. Attempting port 13002.
Spark 2 has been deprecated. Please migrate to Spark 3.
See https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Spark/Migration_to_Spark_3
category=FutureWarning
A conda environment is already packed at conda-2022-10-12T21.46.32_neilpquinn-wmf.tgz. If you have recently installed new packages into your conda env, set force=True in conda_pack_kwargs and it will be repacked for you.
22/11/17 20:52:11 WARN SparkConf: Note that spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS in YARN).
22/11/17 20:52:11 WARN Utils: Service 'sparkDriver' could not bind on port 12000. Attempting port 12001.
22/11/17 20:52:11 WARN Utils: Service 'sparkDriver' could not bind on port 12001. Attempting port 12002.
22/11/17 20:52:11 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
22/11/17 20:52:11 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
22/11/17 20:52:15 WARN Client: Same path resource file:/srv/home/neilpquinn-wmf/conda-2022-10-12T21.46.32_neilpquinn-wmf.tgz#conda-2022-10-12T21.46.32_neilpquinn-wmf added multiple times to distributed cache.
java.lang.IllegalArgumentException: Attempt to add (conda-2022-10-12T21.46.32_neilpquinn-wmf.tgz#conda-2022-10-12T21.46.32_neilpquinn-wmf) multiple times to the distributed cache.
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$10$$anonfun$apply$6.apply(Client.scala:607)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$10$$anonfun$apply$6.apply(Client.scala:598)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$10.apply(Client.scala:598)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$10.apply(Client.scala:597)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:597)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:865)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:179)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:183)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
22/11/17 20:52:15 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
22/11/17 20:52:15 WARN MetricsSystem: Stopping a MetricsSystem that is not running
/usr/lib/spark2/python/pyspark/sql/utils.py in deco(*a, **kw)
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:

/usr/lib/spark2/python/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalArgumentException: Attempt to add (conda-2022-10-12T21.46.32_neilpquinn-wmf.tgz#conda-2022-10-12T21.46.32_neilpquinn-wmf) multiple times to the distributed cache.
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$10$$anonfun$apply$6.apply(Client.scala:607)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$10$$anonfun$apply$6.apply(Client.scala:598)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$10.apply(Client.scala:598)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$10.apply(Client.scala:597)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:597)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:865)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:179)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:183)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
During handling of the above exception, another exception occurred:
IllegalArgumentException: 'Attempt to add (conda-2022-10-12T21.46.32_neilpquinn-wmf.tgz#conda-2022-10-12T21.46.32_neilpquinn-wmf) multiple times to the distributed cache.'