Attempting to run the following query using a HiveServer2 client (such as Beeline, Hue, or Impyla) fails with the message "Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask". However, it works perfectly using the hive client.
select ug_user, log_type
from wmf_raw.mediawiki_user_groups ug
join wmf_raw.mediawiki_logging log
  on ug_user = log_user
where log.wiki_db = "guwiki"
  and ug.wiki_db = "guwiki"
  and log.snapshot = "2018-08"
  and ug.snapshot = "2018-08"
The full output when using beeline in verbose mode is:
INFO : Compiling command(queryId=hive_20181005005555_e6e1c3cc-e854-4cf2-8fed-8a5f990189b6): {{query}}
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:ug_user, type:bigint, comment:null), FieldSchema(name:log_type, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20181005005555_e6e1c3cc-e854-4cf2-8fed-8a5f990189b6); Time taken: 0.137 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20181005005555_e6e1c3cc-e854-4cf2-8fed-8a5f990189b6): {{query}}
INFO : Query ID = hive_20181005005555_e6e1c3cc-e854-4cf2-8fed-8a5f990189b6
INFO : Total jobs = 1
INFO : Starting task [Stage-4:MAPREDLOCAL] in serial mode
ERROR : Execution failed with exit status: 1
ERROR : Obtaining error information
ERROR : Task failed!
Task ID: Stage-4
Logs:
ERROR : /var/log/hive/hive-server2.log
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
INFO : Completed executing command(queryId=hive_20181005005555_e6e1c3cc-e854-4cf2-8fed-8a5f990189b6); Time taken: 1.069 seconds
Getting log thread is interrupted, since query is done!
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask (state=08S01,code=1)
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:284)
    at org.apache.hive.beeline.Commands.executeInternal(Commands.java:986)
    at org.apache.hive.beeline.Commands.execute(Commands.java:1158)
    at org.apache.hive.beeline.Commands.sql(Commands.java:1072)
    at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1172)
    at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1003)
    at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:915)
    at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:511)
    at org.apache.hive.beeline.BeeLine.main(BeeLine.java:494)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
As @Ottomata suggested, running SET hive.auto.convert.join=false; (as described here) before the query makes it work properly.
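For reference, the workaround amounts to a session like the sketch below. hive.auto.convert.join controls whether Hive automatically converts a join against a small table into a map join, which runs a local task (the Stage-4 MAPREDLOCAL step seen in the log) before the MapReduce job. Turning it off makes Hive use a regular shuffle join and so skips that local task; it does not by itself explain why the local task fails under HiveServer2. The SET only affects the current session.

```sql
-- Disable automatic map-join conversion for this session only.
SET hive.auto.convert.join=false;

-- Same query as in the report; with the setting above it should be planned
-- as a common (shuffle) join, with no MAPREDLOCAL stage.
select ug_user, log_type
from wmf_raw.mediawiki_user_groups ug
join wmf_raw.mediawiki_logging log
  on ug_user = log_user
where log.wiki_db = "guwiki"
  and ug.wiki_db = "guwiki"
  and log.snapshot = "2018-08"
  and ug.snapshot = "2018-08";
```

Running EXPLAIN on the query with the setting on and then off should show whether the map-join local stage is still part of the plan.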
It would be good to know what exactly went wrong and why that setting fixes it, so we can avoid similar problems in the future.