Paste P5756

pyspark error

Authored by dcausse on Jul 17 2017, 5:21 PM.
Container exited with a non-zero exit code 50
17/07/17 17:13:23 ERROR TaskSetManager: Task 0 in stage 26.0 failed 4 times; aborting job
Traceback (most recent call last):
  File "/home/dcausse/mjolnir/venv/lib/python2.7/site-packages/mjolnir/cli/data_pipeline.py", line 199, in <module>
    main(sc, sqlContext, **args)
  File "/home/dcausse/mjolnir/venv/lib/python2.7/site-packages/mjolnir/cli/data_pipeline.py", line 69, in main
    min_sessions_per_query=min_sessions_per_query)
  File "/home/dcausse/mjolnir/venv/lib/python2.7/site-packages/mjolnir/sampling.py", line 217, in sample
    df_queries_sampled = _sample_queries(df_queries_unique, wikis, samples_desired=queries_per_wiki, seed=seed)
  File "/home/dcausse/mjolnir/venv/lib/python2.7/site-packages/mjolnir/sampling.py", line 163, in _sample_queries
    .toDF(['wikiid', 'norm_query_id']))
  File "/home/dcausse/mjolnir/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/session.py", line 57, in toDF
  File "/home/dcausse/mjolnir/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/session.py", line 520, in createDataFrame
  File "/home/dcausse/mjolnir/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/session.py", line 360, in _createFromRDD
  File "/home/dcausse/mjolnir/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/session.py", line 331, in _inferSchema
  File "/home/dcausse/mjolnir/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 1361, in first
  File "/home/dcausse/mjolnir/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 1343, in take
  File "/home/dcausse/mjolnir/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/context.py", line 965, in runJob
  File "/home/dcausse/mjolnir/spark-2.1.0-bin-hadoop2.6/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/home/dcausse/mjolnir/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/home/dcausse/mjolnir/spark-2.1.0-bin-hadoop2.6/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 26.0 failed 4 times, most recent failure: Lost task 0.3 in stage 26.0 (TID 2538, analytics1064.eqiad.wmnet, executor 214): java.lang.AssertionError: assertion failed
	at scala.Predef$.assert(Predef.scala:156)
	at org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84)
	at org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:66)
	at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:362)
	at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:361)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:361)