
Move refinery to hive 2 actions
Open, High | Public | 21 Story Points

Description

In T212259#5288214 we identified a way to make hive actions work in a kerberized Hadoop cluster. As a preliminary step we should, in my opinion, gradually move all refinery actions to hive2 ones (without credentials for the moment) and make sure that they work as expected. This should reduce the number of errors that we could encounter once Kerberos is enabled. Ideally this task is something that people can do while on ops duty (moving a few jobs each week as part of the weekly train deployment).

These are the available schemas for hive2 actions:

https://oozie.apache.org/docs/4.3.0/DG_Hive2ActionExtension.html#Hive_2_Action_Schema_Version_0.1
https://oozie.apache.org/docs/4.3.0/DG_Hive2ActionExtension.html#Hive_2_Action_Schema_Version_0.2

I tested 0.1; 0.2 should be fine as well, but we'll need to verify that. The idea is, for each hive action, to do the following:

  • replace hive with hive2 (no cred field needed for now since we don't have any kerberos credential):
-    <action name="add_partition">
-        <hive xmlns="uri:oozie:hive-action:0.2">
+    <action name="add_partition">
+        <hive2 xmlns="uri:oozie:hive2-action:0.1">
  • add the jdbc-url parameter:
+            <jdbc-url>jdbc:hive2://an-coord1001.eqiad.wmnet:10000/default</jdbc-url>

The latter should be added via a variable and configured in each .properties file.
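Putting the two steps together, a migrated action might look roughly like the sketch below. The property names (hive2_jdbc_url, job_tracker, name_node, table), the script name and the transition targets are illustrative assumptions, not taken from the refinery repository. The matching .properties entry would then be something like hive2_jdbc_url = jdbc:hive2://an-coord1001.eqiad.wmnet:10000/default.

```xml
<!-- Sketch only: property names, script name and transitions are
     hypothetical placeholders, not copied from refinery. -->
<action name="add_partition">
    <hive2 xmlns="uri:oozie:hive2-action:0.1">
        <job-tracker>${job_tracker}</job-tracker>
        <name-node>${name_node}</name-node>
        <!-- Resolved from the job's .properties file instead of being hardcoded -->
        <jdbc-url>${hive2_jdbc_url}</jdbc-url>
        <script>add_partition.hql</script>
        <param>table=${table}</param>
    </hive2>
    <ok to="end"/>
    <error to="kill"/>
</action>
```

Keeping the URL in a variable means that a later change of the HiveServer2 host only touches the .properties files, not every workflow.xml.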

Event Timeline

elukey moved this task from Backlog to Kerberos on the User-Elukey board. Jul 5 2019, 6:57 AM
Milimetric triaged this task as High priority. Jul 8 2019, 3:21 PM
Milimetric moved this task from Incoming to Operational Excellence on the Analytics board.
Milimetric renamed this task from Move refinery to hive 2 actions to Move refinery to hive 2 actions. Jul 8 2019, 3:28 PM
Milimetric moved this task from Operational Excellence to Ops Week on the Analytics board.

Note: might be good to start with an oozie job like projectview_hourly, followed by pageview_hourly and webrequest load. That way we incrementally run bigger jobs and hopefully allow ourselves to debug before we need to rerun lots of data.

Change 522395 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] oozie/projectview/hourly: move to hive2 actions

https://gerrit.wikimedia.org/r/522395

Change 522395 merged by Nuria:
[analytics/refinery@master] projectview: move to oozie workflow to hive2 actions

https://gerrit.wikimedia.org/r/522395

Change 523200 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] pageview: move the oozie coordinator to hive2 actions

https://gerrit.wikimedia.org/r/523200

Change 523212 had a related patch set uploaded (by Elukey; owner: Elukey):
[wikimedia/discovery/analytics@master] Move oozie hive actions to hive2

https://gerrit.wikimedia.org/r/523212

First annoying problems when testing beeline:

  • It seems that beeline -f script.hql --database something does not honor the database option: my tables got created in the default db. With the hive tool it works.
  • Due to the above, the elukey.pageview_hourly table was not there when I ran the coordinator for the first time, and it failed of course. This is what I got from Hue's logs:
<<< Invocation of Beeline command completed <<<

No child hadoop job is executed.
Intercepting System.exit(2)

<<< Invocation of Main class completed <<<

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.Hive2Main], exit code [2]

Oozie Launcher failed, finishing Hadoop job gracefully

Oozie Launcher, uploading action data to HDFS sequence file: hdfs://analytics-hadoop/user/elukey/oozie-oozi/0001878-190715143115257-oozie-oozi-W/aggregate--hive2/action-data.seq
Successfully reset security manager from org.apache.oozie.action.hadoop.LauncherSecurityManager@37fdfb05 to null

Oozie Launcher ends

And this is what I found in /var/log/hive/hive-server.log:

2019-07-17 08:24:16,959 INFO  ql.Driver (Driver.java:compile(692)) - Completed compiling command(queryId=hive_20190717082424_7453d439-dd32-40af-be31-452c87cd4279); Time taken: 0.056 seconds
2019-07-17 08:24:16,960 WARN  security.UserGroupInformation (UserGroupInformation.java:doAs(1927)) - PriviledgedActionException as:elukey (auth:SIMPLE) cause:org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:23 Table not found 'pageview_hourly'
2019-07-17 08:24:16,960 WARN  thrift.ThriftCLIService (ThriftCLIService.java:ExecuteStatement(510)) - Error executing statement:
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:23 Table not found 'pageview_hourly'

This is a problem similar to T136858.

elukey added a comment. Edited Jul 17 2019, 1:38 PM

In https://oozie.apache.org/docs/4.2.0/DG_Hive2ActionExtension.html I found this:

The argument element, if present, contains arguments to be passed as-is to Beeline.

So I added the following:

<argument>--verbose</argument>
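For context, a sketch of where that element might sit inside the hive2 action. As far as I recall the hive2 action schema, argument elements come after the script and param elements. The parameter names mirror the hivevars visible in the pageview_hourly run; the ${...} property names are illustrative assumptions.

```xml
<!-- Sketch only: the ${...} property names are hypothetical placeholders. -->
<hive2 xmlns="uri:oozie:hive2-action:0.1">
    <job-tracker>${job_tracker}</job-tracker>
    <name-node>${name_node}</name-node>
    <jdbc-url>${hive2_jdbc_url}</jdbc-url>
    <script>pageview_hourly.hql</script>
    <!-- Each param is handed to Beeline as a hivevar -->
    <param>source_table=${source_table}</param>
    <param>destination_table=${destination_table}</param>
    <param>year=${year}</param>
    <param>month=${month}</param>
    <param>day=${day}</param>
    <param>hour=${hour}</param>
    <!-- Passed as-is to Beeline, one element per command-line token -->
    <argument>--verbose</argument>
</hive2>
```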

And from the logs of the hive2 action in hue I can see:

Beeline command arguments :
             -u
             jdbc:hive2://an-coord1001.eqiad.wmnet:10000/default
             -n
             elukey
             -p
             DUMMY
             -d
             org.apache.hive.jdbc.HiveDriver
             -f
             pageview_hourly.hql
             --hivevar
             source_table=wmf.webrequest
             --hivevar
             destination_table=elukey.pageview_hourly
             --hivevar
             record_version=0.0.6
             --hivevar
             year=2019
             --hivevar
             month=7
             --hivevar
             day=17
             --hivevar
             hour=0
             -a
             delegationToken

             --verbose  <<===============================

             --hiveconf
             mapreduce.job.tags=oozie-5746b6858b1ccf61e24375c0f9c097a7
             --hiveconf
             oozie.action.id=0002104-190715143115257-oozie-oozi-W@aggregate
             --hiveconf
             oozie.child.mapreduce.job.tags=oozie-5746b6858b1ccf61e24375c0f9c097a7
             --hiveconf
             oozie.action.rootlogger.log.level=INFO
             --hiveconf
             oozie.job.id=0002104-190715143115257-oozie-oozi-W
             --hiveconf
             oozie.HadoopAccessorService.created=true

I tested this with the previous failure case, and now I can see this in stderr:

Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:23 Table not found 'pageview_hourly' (state=42S02,code=10001)
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:23 Table not found 'pageview_hourly'
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:241)
	at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:227)
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:255)
	at org.apache.hive.beeline.Commands.executeInternal(Commands.java:989)
	at org.apache.hive.beeline.Commands.execute(Commands.java:1180)
	at org.apache.hive.beeline.Commands.sql(Commands.java:1094)
	at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1180)
	at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1013)
	at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:987)
	at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:914)
	at org.apache.oozie.action.hadoop.Hive2Main.runBeeline(Hive2Main.java:270)
	at org.apache.oozie.action.hadoop.Hive2Main.run(Hive2Main.java:244)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:81)
	at org.apache.oozie.action.hadoop.Hive2Main.main(Hive2Main.java:63)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:235)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:23 Table not found 'pageview_hourly'
	at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400)
	at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:187)
	at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:271)
	at org.apache.hive.service.cli.operation.Operation.run(Operation.java:337)
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:439)
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:416)
	at sun.reflect.GeneratedMethodAccessor91.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
	at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
	at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
	at com.sun.proxy.$Proxy21.executeStatementAsync(Unknown Source)
	at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:282)
	at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:503)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:23 Table not found 'pageview_hourly'
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1906)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1564)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10174)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10225)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:193)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:223)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:560)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1358)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1345)
	at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:185)
	... 26 more
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:23 Table not found 'pageview_hourly'
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.<init>(BaseSemanticAnalyzer.java:765)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.<init>(BaseSemanticAnalyzer.java:727)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1768)
	... 35 more
Caused by: org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found Table not found pageview_hourly
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1189)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1140)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1127)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.<init>(BaseSemanticAnalyzer.java:762)
	... 37 more
Closing: 0: jdbc:hive2://an-coord1001.eqiad.wmnet:10000/default
Intercepting System.exit(2)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.Hive2Main], exit code [2]

It seems to be working: the underlying Hive error is now visible in the action's stderr!

Change 523934 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] Add verbose argument to pageview/projectview oozie coordinators

https://gerrit.wikimedia.org/r/523934

Change 523212 merged by jenkins-bot:
[wikimedia/discovery/analytics@master] Move oozie hive actions to hive2

https://gerrit.wikimedia.org/r/523212

Change 523200 merged by Milimetric:
[analytics/refinery@master] pageview: move the oozie coordinator to hive2 actions

https://gerrit.wikimedia.org/r/523200

Change 523934 merged by Elukey:
[analytics/refinery@master] Add verbose argument to pageview/projectview oozie coordinators

https://gerrit.wikimedia.org/r/523934

@EBernhardson hi! Can we restart the coordinators listed in https://gerrit.wikimedia.org/r/523212 to pick up the new changes?

Change 525247 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] aqs: move the oozie hourly coordinator to hive2 actions

https://gerrit.wikimedia.org/r/525247

Change 525248 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] banner_activity: move oozie daily coordinator to hive2 actions

https://gerrit.wikimedia.org/r/525248

Change 525247 merged by Elukey:
[analytics/refinery@master] aqs: move the oozie hourly coordinator to hive2 actions

https://gerrit.wikimedia.org/r/525247

Change 525248 merged by Elukey:
[analytics/refinery@master] banner_activity: move oozie daily coordinator to hive2 actions

https://gerrit.wikimedia.org/r/525248

Change 525503 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] browser-general: move oozie coordinator to hive2 actions

https://gerrit.wikimedia.org/r/525503

Change 525507 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] cassandra: move oozie bundle to hive2 actions

https://gerrit.wikimedia.org/r/525507

> @EBernhardson hi! Can we restart the coordinators listed in https://gerrit.wikimedia.org/r/523212 to pick up the new changes?

All three coordinators re-deployed

Change 525503 merged by Elukey:
[analytics/refinery@master] browser-general: move oozie coordinator to hive2 actions

https://gerrit.wikimedia.org/r/525503

Change 525507 merged by Elukey:
[analytics/refinery@master] cassandra: move oozie bundle to hive2 actions

https://gerrit.wikimedia.org/r/525507

Change 525809 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] data_quality: move the oozie coordinator to hive2 actions

https://gerrit.wikimedia.org/r/525809

Change 525810 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] edit-hourly: move oozie coordinator to hive2 actions

https://gerrit.wikimedia.org/r/525810

Change 525811 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] interlanguage: move oozie coordinator to hive2 actions

https://gerrit.wikimedia.org/r/525811

Change 525813 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] mediacounts: move archive and load oozie coord to hive2 actions

https://gerrit.wikimedia.org/r/525813

Change 525809 merged by Nuria:
[analytics/refinery@master] data_quality: move the oozie coordinator to hive2 actions

https://gerrit.wikimedia.org/r/525809

Change 525810 merged by Nuria:
[analytics/refinery@master] edit-hourly: move oozie coordinator to hive2 actions

https://gerrit.wikimedia.org/r/525810

Change 525811 merged by Nuria:
[analytics/refinery@master] interlanguage: move oozie coordinator to hive2 actions

https://gerrit.wikimedia.org/r/525811

Change 525813 merged by Nuria:
[analytics/refinery@master] mediacounts: move archive and load oozie coord to hive2 actions

https://gerrit.wikimedia.org/r/525813

Change 527433 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] cdh::hive: add hive.server2.logging.operation.enabled

https://gerrit.wikimedia.org/r/527433

Change 527433 merged by Elukey:
[operations/puppet@production] cdh::hive: add hive.server2.logging.operation.enabled

https://gerrit.wikimedia.org/r/527433

Change 528167 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] edit: remove hive.auto.convert.join from oozie coord's .hql file

https://gerrit.wikimedia.org/r/528167

elukey added a comment. Aug 6 2019, 8:25 AM

We encountered two issues after the migration to hive2 actions:

Change 528167 merged by Elukey:
[analytics/refinery@master] edit: set hive.exec.submit.local.task.via.child = false in the .hql file

https://gerrit.wikimedia.org/r/528167

Change 528708 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] mobile_apps: move uniques daily/monthly oozie coords to hive2 actions

https://gerrit.wikimedia.org/r/528708

Change 528714 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] pageview: move druid oozie coordinators to hive2 actions

https://gerrit.wikimedia.org/r/528714

Change 528708 merged by Elukey:
[analytics/refinery@master] mobile_apps: move uniques daily/monthly oozie coords to hive2 actions

https://gerrit.wikimedia.org/r/528708

Change 528714 merged by Elukey:
[analytics/refinery@master] pageview: move druid oozie coordinators to hive2 actions

https://gerrit.wikimedia.org/r/528714

Change 529315 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] mediawiki: move oozie coordinators to hive2 actions

https://gerrit.wikimedia.org/r/529315

Change 529377 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] projectview: move oozie coordinator to hive2 actions

https://gerrit.wikimedia.org/r/529377

Change 529381 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] unique_devies: move oozie coordinators to hive2 actions

https://gerrit.wikimedia.org/r/529381

Change 529384 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] Move oozie's Hive utils workflows to hive2 actions

https://gerrit.wikimedia.org/r/529384

Change 529385 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] virtualpageview: move oozie coords to hive2 actions

https://gerrit.wikimedia.org/r/529385

Change 529387 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] webrequest: move oozie coords to hive2 actions

https://gerrit.wikimedia.org/r/529387

Change 529315 merged by Mforns:
[analytics/refinery@master] mediawiki: move oozie coordinators to hive2 actions

https://gerrit.wikimedia.org/r/529315

Change 529377 merged by Mforns:
[analytics/refinery@master] projectview: move oozie coordinator to hive2 actions

https://gerrit.wikimedia.org/r/529377

Change 529381 merged by Mforns:
[analytics/refinery@master] unique_devices: move oozie coordinators to hive2 actions

https://gerrit.wikimedia.org/r/529381

Change 529384 merged by Mforns:
[analytics/refinery@master] Move oozie's Hive utils workflows to hive2 actions

https://gerrit.wikimedia.org/r/529384

Change 529385 merged by Mforns:
[analytics/refinery@master] virtualpageview: move oozie coords to hive2 actions

https://gerrit.wikimedia.org/r/529385

Change 529387 merged by Mforns:
[analytics/refinery@master] webrequest: move oozie coords to hive2 actions

https://gerrit.wikimedia.org/r/529387

Change 529859 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] unique_devices: add missing jdbc_url for hive2 actions

https://gerrit.wikimedia.org/r/529859

Change 529859 merged by Elukey:
[analytics/refinery@master] unique_devices: add missing jdbc_url for hive2 actions

https://gerrit.wikimedia.org/r/529859

Milimetric assigned this task to elukey. Aug 13 2019, 4:03 PM
Milimetric added a project: Analytics-Kanban.
Milimetric moved this task from Next Up to In Progress on the Analytics-Kanban board.
Nuria added a comment. Aug 21 2019, 5:49 PM

@elukey: is the pull request the last job that needs to be moved?

> @elukey: is the pull request the last job that needs to be moved?

Yes, exactly, but it is not directly controlled by us (it runs under Neil's username).

elukey set the point value for this task to 21. Fri, Sep 20, 9:29 AM

To keep archives happy, Joseph also followed up with:

https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/531682/
https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/529859/

Since this task is about refinery, I think that we moved all jobs correctly as needed, so we should close the task.

elukey moved this task from In Progress to Done on the Analytics-Kanban board. Fri, Sep 20, 9:31 AM
elukey moved this task from Kerberos to Done on the User-Elukey board. Fri, Sep 20, 11:35 AM