
WDCM_Sqoop_Clients.R fails from stat1004 (again)
Closed, ResolvedPublic

Description

WDCM_Sqoop_Clients.R is failing on stat1004 again. This is an orchestrating R program that runs hundreds of Apache Sqoop jobs to produce the goransm.wdcm_clients_wb_entity_usage Hive table in the Data Lake.

What I have noticed thus far:

  • From stat1004, in beeline, after kinit authentication as goransm:
select * from goransm.wdcm_clients_wb_entity_usage where wiki_db="enwiki" limit 10;

results in (verbatim output suppressed):

Error: Error while compiling statement: FAILED: SemanticException Unable to determine if hdfs://analytics-hadoop/tmp/wmde/analytics/wdcm/wdcmsqoop/wdcm_clients_wb_entity_usage is encrypted: org.apache.hadoop.security.AccessControlException: Permission denied: user=goransm, access=EXECUTE, inode="/tmp/wmde":analytics-privatedata:hdfs:drwxr-x---

However, working from the R script on stat1004 directly, where analytics-privatedata is used in place of goransm, the following command:

sudo -u analytics-privatedata kerberos-run-command analytics-privatedata /usr/bin/sqoop import --connect jdbc:mysql://s1-analytics-replica.eqiad.wmnet:3311/enwiki --password-file /user/goransm/mysql-analytics-research-client-pw.txt --username research -m 16 --driver org.mariadb.jdbc.Driver --query "select * from wbc_entity_usage where \$CONDITIONS" --split-by eu_row_id --as-avrodatafile --target-dir /tmp/wmde/analytics/wdcm/wdcmsqoop/wdcm_clients_wb_entity_usage/wiki_db=enwiki --delete-target-dir

results in (verbatim output suppressed):

21/04/27 21:24:09 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: org.mariadb.jdbc.Driver
java.lang.RuntimeException: Could not load db driver class: org.mariadb.jdbc.Driver

which I believe is similar to a problem recently addressed in T274866.

Please advise. The WDCM updates are incomplete and the dashboards are currently down.

Event Timeline

GoranSMilovanovic created this task.

Hi Goran!

We moved the hive server nodes to Debian Buster recently (T231067) and we had a problem with Hive and Mariadb; the details are in https://issues.apache.org/jira/browse/HIVE-25020.
As a quick workaround you should be able to unblock your queries just removing --driver org.mariadb.jdbc.Driver, can you try to see if it works?

The first error is unrelated, it is due to the fact that you have created the /tmp/wmde directory as analytics-privatedata (and by default permissions allow only user + group, not others) and your username can't read files in there.
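To make the permission explanation concrete, here is a minimal sketch of the owner/group/other check, applied to the inode string from the error above (`analytics-privatedata:hdfs:drwxr-x---`). It is a simplified model only: it ignores HDFS ACLs, special bits and the superuser, the function name `can_traverse` is made up for illustration, and the group set given for goransm is an assumption (the error only implies that the user is neither the owner nor in the `hdfs` group).

```python
def can_traverse(mode, owner, group, user, user_groups):
    """Return True if `user` has EXECUTE (traverse) access on a directory.

    Simplified POSIX-style check: exactly one permission class applies
    (owner first, then group, then other). ACLs, special bits and the
    HDFS superuser are deliberately ignored in this sketch.
    """
    perms = mode[1:]  # drop the leading file-type character, e.g. 'd'
    if user == owner:
        triad = perms[0:3]
    elif group in user_groups:
        triad = perms[3:6]
    else:
        triad = perms[6:9]
    return triad[2] == "x"


# Values taken from the error message; the group set is an assumption.
print(can_traverse("drwxr-x---", "analytics-privatedata", "hdfs",
                   "goransm", {"goransm"}))
print(can_traverse("drwxr-x---", "analytics-privatedata", "hdfs",
                   "analytics-privatedata", {"analytics-privatedata"}))
```

Under these assumptions goransm falls into the "other" class, which has no bits set, so the beeline query cannot traverse /tmp/wmde, while the same check passes for analytics-privatedata.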

@elukey Thank you for the prompt response, Luca!

The first error is unrelated, it is due to the fact that you have created the /tmp/wmde directory as analytics-privatedata (and by default permissions allow only user + group, not others) and your username can't read files in there.

I thought so.

We moved the hive server nodes to Debian Buster recently (T231067) and we had a problem with Hive and Mariadb...

I suspected so : )

As a quick workaround you should be able to unblock your queries just removing --driver org.mariadb.jdbc.Driver, can you try to see if it works?

Testing now and getting back to you then. Thanks!

@elukey

As a quick workaround you should be able to unblock your queries just removing --driver org.mariadb.jdbc.Driver, can you try to see if it works?

That would do, thank you Luca! I will keep the ticket open just in case, until the update script is done, but the enwiki Sqoop test was already a success.

GoranSMilovanovic lowered the priority of this task from High to Low. Apr 28 2021, 6:37 AM

Nice! Let's keep it open, since I want to understand whether we need to use --driver com.mysql.jdbc.Driver or not; it will also have some impact for Analytics. Thanks a lot for bringing this up and sorry for the trouble!

@elukey No worries. Let me know if you need any external tests performed.

elukey claimed this task.

No issues from our side, going to close, please reopen if necessary!

@elukey Let's take a close look at this, if you agree.

@WMDE-leszek @elukey I would like to learn from this.

The following argument to /usr/bin/sqoop

--driver org.mariadb.jdbc.Driver

seems to have been causing us trouble for some time already in WDCM_Sqoop_Clients.R.

I am now uncertain whether I should deploy the script to the respective Analytics Client (stat1004 in this particular case) with this argument present or not.

It would help me if I could understand the cause of the recent failure in relation to driver org.mariadb.jdbc.Driver, similarly to what @elukey asked for in T281316#7040737.

Thanks.

Change 683801 had a related patch set uploaded (by GoranSMilovanovic; author: GoranSMilovanovic):

[analytics/wmde/WD/WikidataAnalytics@master] T281316

https://gerrit.wikimedia.org/r/683801

Change 683801 merged by GoranSMilovanovic:

[analytics/wmde/WD/WikidataAnalytics@master] T281316

https://gerrit.wikimedia.org/r/683801

@GoranSMilovanovic sure! During the migration of the hosts where Hive Server/Metastore runs to Debian Buster, we encountered a lot of problems with the only available java lib for mysql, namely the one containing the org.mariadb.jdbc.Driver JDBC driver. We have now reverted to the old mysql driver, manually porting the missing debian packages from Stretch to Buster, and now sqoop needs to run without the extra --driver option. So this option caused problems because we were trying to figure out how to upgrade our systems following Debian best practices, but hopefully now we should be good (at least until Debian Bullseye, the new version, is out).

I have opened https://issues.apache.org/jira/browse/HIVE-25020 with Hive upstream to investigate further, but I am afraid it will take longer to solve.

Lemme know if you want to know more.
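For reference, the change described above (running sqoop without the extra --driver option) could be sketched as an argument builder in which the driver is optional and omitted by default. This is only an illustration, not the code in WDCM_Sqoop_Clients.R: the function name `build_sqoop_import` and its parameters are hypothetical, while every flag and value is copied from the command in the task description. Note that with an argument list there is no shell, so `$CONDITIONS` is passed literally, without the backslash.

```python
from typing import List, Optional

# Prefix used in the task description to run as analytics-privatedata.
KERBEROS_PREFIX = [
    "sudo", "-u", "analytics-privatedata",
    "kerberos-run-command", "analytics-privatedata",
]


def build_sqoop_import(wiki_db: str, connect: str,
                       driver: Optional[str] = None) -> List[str]:
    """Build the sqoop import argument list for one wiki.

    `driver` is left out unless explicitly given, which matches the
    workaround of dropping `--driver org.mariadb.jdbc.Driver`.
    """
    args = KERBEROS_PREFIX + [
        "/usr/bin/sqoop", "import",
        "--connect", connect,
        "--password-file", "/user/goransm/mysql-analytics-research-client-pw.txt",
        "--username", "research",
        "-m", "16",
    ]
    if driver is not None:
        args += ["--driver", driver]
    args += [
        "--query", "select * from wbc_entity_usage where $CONDITIONS",
        "--split-by", "eu_row_id",
        "--as-avrodatafile",
        "--target-dir",
        "/tmp/wmde/analytics/wdcm/wdcmsqoop/wdcm_clients_wb_entity_usage/wiki_db=" + wiki_db,
        "--delete-target-dir",
    ]
    return args


cmd = build_sqoop_import(
    "enwiki", "jdbc:mysql://s1-analytics-replica.eqiad.wmnet:3311/enwiki")
print(" ".join(cmd))
```

Keeping the driver as a single optional parameter means that a future driver change only touches one call site.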

@elukey Thank you. I was thinking along the following lines:

  • if due to any updates, upgrades, or other changes, this turns out to be a persistent problem,
  • then is there a way to ask from some Analytics Client if org.mariadb.jdbc.Driver should or should not be specified,
  • e.g. by issuing a query, or checking something, directly from R/Python during the script runtime?

Because if something like that is possible, then I could refactor the script a bit to include one additional decision step to formulate the Sqoop call correctly.
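One possible shape for such a decision step, sketched in Python, is to try the call as configured and retry once without --driver if the output contains the "Could not load db driver class" message seen in the log above. This is not an existing Analytics facility. The helper names are hypothetical, and the sketch assumes the error text shows up in sqoop's combined stdout/stderr exactly as in that log.

```python
import subprocess
from typing import Callable, List, Tuple

# Message copied from the sqoop error quoted in the task description.
DRIVER_ERROR = "Could not load db driver class"


def run_command(args: List[str]) -> Tuple[int, str]:
    """Run a command and return (exit code, combined stdout + stderr)."""
    proc = subprocess.run(args, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def strip_driver(args: List[str]) -> List[str]:
    """Return a copy of args without `--driver` and the value after it."""
    out, skip_next = [], False
    for arg in args:
        if skip_next:
            skip_next = False
        elif arg == "--driver":
            skip_next = True
        else:
            out.append(arg)
    return out


def run_with_driver_fallback(
    args: List[str],
    runner: Callable[[List[str]], Tuple[int, str]] = run_command,
) -> Tuple[int, str]:
    """Run once; on a driver-loading error, retry once without --driver."""
    code, output = runner(args)
    if "--driver" in args and DRIVER_ERROR in output:
        return runner(strip_driver(args))
    return code, output
```

The `runner` parameter exists only so that the logic can be exercised without a cluster. In the real script it would stay at its default and receive the full sqoop argument list.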

I think we should be fine from now on, I wouldn't add more complexity to what we have :)

Ok. In any case the fix to this script is easy if anything similar happens again. I will begin to monitor the Sqoop runs more closely.
Thank you @elukey !