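The monthly sqoop import of MediaWiki databases into Hadoop (the sqoop-mediawiki-monthly-2025-06 run) produced a number of errors at the start of July 2025.
The first problem was that the password for the s53272 database user had been changed, so the import could not start.
After we reset the password and restarted the import, it proceeded but then began failing partway through the run; the first failure in the excerpt below was logged at 2025-07-02T21:21:35 for commonswiki.user.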
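When triaging a run like this, it can help to tally which wiki and table pairs failed and how many tries each one used. The following is only a minimal sketch, not part of refinery: the function name `summarize_failures` and the regular expression are hypothetical, and it assumes every failure is announced by a marker line of the form `ERROR ERROR: <wiki>.<table> (try N)`, as in this journal output.

```python
import re
from collections import defaultdict

# Hypothetical pattern, assumed from marker lines such as:
# "2025-07-02T21:21:35 ERROR ERROR: commonswiki.user (try 1)"
FAILURE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) ERROR ERROR: "
    r"(?P<wiki>\w+)\.(?P<table>\w+) \(try (?P<attempt>\d+)\)"
)


def summarize_failures(lines):
    """Return {(wiki, table): highest try number seen} for the given log lines."""
    attempts = defaultdict(int)
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            key = (match["wiki"], match["table"])
            attempts[key] = max(attempts[key], int(match["attempt"]))
    return dict(attempts)


if __name__ == "__main__":
    # Two marker lines copied from the journal output, plus one non-matching line.
    sample = [
        "kerberos-run-command[1995256]: 2025-07-02T21:21:35 ERROR ERROR: commonswiki.user (try 1)",
        "kerberos-run-command[1995256]: Traceback (most recent call last):",
        "kerberos-run-command[1995256]: 2025-07-02T21:22:49 ERROR ERROR: commonswiki.user (try 2)",
    ]
    for (wiki, table), tries in sorted(summarize_failures(sample).items()):
        print(f"{wiki}.{table}: failed through try {tries}")
```

The same function could be fed the output of `journalctl` for the sqoop unit, read line by line; which unit name to query is left to the reader.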
The errors are as follows:
| 1 | Jul 02 18:09:33 an-launcher1002 systemd[1]: Starting Schedules sqoop to import whole MediaWiki databases into Hadoop monthly.... |
|---|---|
| 2 | Jul 02 18:09:34 an-launcher1002 kerberos-run-command[1995257]: User analytics executes as user analytics the command ['/usr/local/bin/refinery-sqoop-whole-mediawiki'] |
| 3 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-02T21:21:35 ERROR ERROR: commonswiki.user (try 1) |
| 4 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last): |
| 5 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki |
| 6 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL) |
| 7 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/usr/lib/python3.9/subprocess.py", line 373, in check_call |
| 8 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise CalledProcessError(retcode, cmd) |
| 9 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-commonswiki.user'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3314/commonswiki_p?characterEncoding=UTF-8', '--query', '\n select user_id,\n convert(user_name using utf8mb4) user_name,\n user_name user_name_binary,\n convert(user_real_name using utf8mb4) user_real_name,\n convert(user_email using utf8mb4) user_email,\n convert(user_touched using utf8mb4) user_touched,\n convert(user_registration using utf8mb4) user_registration,\n user_editcount,\n convert(user_password_expires using utf8mb4) user_password_expires,\n user_is_temp\n\n from user\n where $CONDITIONS\n ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506/wikidbcommonswiki', '--num-mappers', '32', '--as-avrodatafile', '--boundary-query', 'SELECT MIN(user_id), MAX(user_id) FROM user', '--split-by', 'user_id', '--class-name', 'user', '--jar-file', '/tmp/sqoop-jars/2025-07-02T18:10:08/mediawiki-tables-sqoop-orm.jar', '--map-column-java', '"user_id=Long,user_editcount=Long,user_is_temp=Boolean"']' returned non-zero exit status 1. |
| 10 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-02T21:22:49 ERROR ERROR: commonswiki.user (try 2) |
| 11 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last): |
| 12 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki |
| 13 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL) |
| 14 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/usr/lib/python3.9/subprocess.py", line 373, in check_call |
| 15 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise CalledProcessError(retcode, cmd) |
| 16 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-commonswiki.user'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3314/commonswiki_p?characterEncoding=UTF-8', '--query', '\n select user_id,\n convert(user_name using utf8mb4) user_name,\n user_name user_name_binary,\n convert(user_real_name using utf8mb4) user_real_name,\n convert(user_email using utf8mb4) user_email,\n convert(user_touched using utf8mb4) user_touched,\n convert(user_registration using utf8mb4) user_registration,\n user_editcount,\n convert(user_password_expires using utf8mb4) user_password_expires,\n user_is_temp\n\n from user\n where $CONDITIONS\n ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506/wikidbcommonswiki', '--num-mappers', '32', '--as-avrodatafile', '--boundary-query', 'SELECT MIN(user_id), MAX(user_id) FROM user', '--split-by', 'user_id', '--class-name', 'user', '--jar-file', '/tmp/sqoop-jars/2025-07-02T18:10:08/mediawiki-tables-sqoop-orm.jar', '--map-column-java', '"user_id=Long,user_editcount=Long,user_is_temp=Boolean"', '--delete-target-dir']' returned non-zero exit status 1. |
| 17 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-02T21:23:17 ERROR ERROR: commonswiki.user (try 3) |
| 18 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last): |
| 19 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki |
| 20 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL) |
| 21 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/usr/lib/python3.9/subprocess.py", line 373, in check_call |
| 22 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise CalledProcessError(retcode, cmd) |
| 23 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-commonswiki.user'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3314/commonswiki_p?characterEncoding=UTF-8', '--query', '\n select user_id,\n convert(user_name using utf8mb4) user_name,\n user_name user_name_binary,\n convert(user_real_name using utf8mb4) user_real_name,\n convert(user_email using utf8mb4) user_email,\n convert(user_touched using utf8mb4) user_touched,\n convert(user_registration using utf8mb4) user_registration,\n user_editcount,\n convert(user_password_expires using utf8mb4) user_password_expires,\n user_is_temp\n\n from user\n where $CONDITIONS\n ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506/wikidbcommonswiki', '--num-mappers', '32', '--as-avrodatafile', '--boundary-query', 'SELECT MIN(user_id), MAX(user_id) FROM user', '--split-by', 'user_id', '--class-name', 'user', '--jar-file', '/tmp/sqoop-jars/2025-07-02T18:10:08/mediawiki-tables-sqoop-orm.jar', '--map-column-java', '"user_id=Long,user_editcount=Long,user_is_temp=Boolean"', '--delete-target-dir']' returned non-zero exit status 1. |
| 24 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T01:08:13 ERROR ERROR: testwiki.user (try 1) |
| 25 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last): |
| 26 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki |
| 27 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL) |
| 28 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/usr/lib/python3.9/subprocess.py", line 373, in check_call |
| 29 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise CalledProcessError(retcode, cmd) |
| 30 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-testwiki.user'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3313/testwiki_p?characterEncoding=UTF-8', '--query', '\n select user_id,\n convert(user_name using utf8mb4) user_name,\n user_name user_name_binary,\n convert(user_real_name using utf8mb4) user_real_name,\n convert(user_email using utf8mb4) user_email,\n convert(user_touched using utf8mb4) user_touched,\n convert(user_registration using utf8mb4) user_registration,\n user_editcount,\n convert(user_password_expires using utf8mb4) user_password_expires,\n user_is_temp\n\n from user\n where $CONDITIONS\n ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506/wikidbtestwiki', '--num-mappers', '1', '--as-avrodatafile', '--class-name', 'user', '--jar-file', '/tmp/sqoop-jars/2025-07-02T18:10:08/mediawiki-tables-sqoop-orm.jar', '--map-column-java', '"user_id=Long,user_editcount=Long,user_is_temp=Boolean"']' returned non-zero exit status 1. |
| 31 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T04:16:45 ERROR ERROR: urwikibooks.user (try 1) |
| 32 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last): |
| 33 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki |
| 34 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL) |
| 35 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/usr/lib/python3.9/subprocess.py", line 373, in check_call |
| 36 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise CalledProcessError(retcode, cmd) |
| 37 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-urwikibooks.user'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3313/urwikibooks_p?characterEncoding=UTF-8', '--query', '\n select user_id,\n convert(user_name using utf8mb4) user_name,\n user_name user_name_binary,\n convert(user_real_name using utf8mb4) user_real_name,\n convert(user_email using utf8mb4) user_email,\n convert(user_touched using utf8mb4) user_touched,\n convert(user_registration using utf8mb4) user_registration,\n user_editcount,\n convert(user_password_expires using utf8mb4) user_password_expires,\n user_is_temp\n\n from user\n where $CONDITIONS\n ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506/wikidburwikibooks', '--num-mappers', '1', '--as-avrodatafile', '--class-name', 'user', '--jar-file', '/tmp/sqoop-jars/2025-07-02T18:10:08/mediawiki-tables-sqoop-orm.jar', '--map-column-java', '"user_id=Long,user_editcount=Long,user_is_temp=Boolean"']' returned non-zero exit status 1. |
| 38 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T04:20:36 ERROR ERROR: ttwikibooks.archive (try 1) |
| 39 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last): |
| 40 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki |
| 41 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL) |
| 42 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/usr/lib/python3.9/subprocess.py", line 373, in check_call |
| 43 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise CalledProcessError(retcode, cmd) |
| 44 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-ttwikibooks.archive'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3313/ttwikibooks_p?characterEncoding=UTF-8', '--query', '\n select ar_id,\n ar_namespace,\n convert(ar_title using utf8mb4) ar_title,\n null ar_text,\n null ar_comment,\n null ar_user,\n null ar_user_text,\n convert(ar_timestamp using utf8mb4) ar_timestamp,\n ar_minor_edit,\n null ar_flags,\n ar_rev_id,\n null ar_text_id,\n ar_deleted,\n ar_len,\n ar_page_id,\n ar_parent_id,\n convert(ar_sha1 using utf8mb4) ar_sha1,\n null ar_content_model,\n null ar_content_format,\n ar_actor,\n ar_comment_id\n\n from archive\n where $CONDITIONS\n \n ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/archive/snapshot202506/wikidbttwikibooks', '--num-mappers', '1', '--as-avrodatafile', '--class-name', 'archive', '--jar-file', '/tmp/sqoop-jars/2025-07-02T18:10:08/mediawiki-tables-sqoop-orm.jar', '--map-column-java', '"ar_actor=Long,ar_comment=String,ar_comment_id=Long,ar_content_format=String,ar_content_model=String,ar_deleted=Integer,ar_flags=String,ar_minor_edit=Boolean,ar_text=String,ar_user=Long,ar_user_text=String,ar_text_id=Long"']' returned non-zero exit status 1. |
| 45 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T04:24:04 ERROR ERROR: mrwikibooks.change_tag (try 1) |
| 46 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last): |
| 47 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 179, in sqoop_wiki |
| 48 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Hdfs.mv(tmp_target_directory, target_directory, inParent=False) |
| 49 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/hdfs.py", line 145, in mv |
| 50 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: sh(['hdfs', 'dfs', '-mv', from_paths[i], to_paths[i]]) |
| 51 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/util.py", line 125, in sh |
| 52 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise RuntimeError("Command: {0} failed with error code: {1}" |
| 53 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: RuntimeError: ('Command: hdfs dfs -mv /wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/changetag/snapshot202506/wikidbmrwikibooks /wmf/data/raw/mediawiki/tables/change_tag/snapshot=2025-06/wiki_db=mrwikibooks failed with error code: 1', b'', b'25/07/03 04:21:01 WARN hdfs.DFSUtilClient: Namenode for analytics-hadoop remains unresolved for ID an-master1003-eqiad-wmnet. Check your hdfs-site.xml file to ensure namenodes are configured properly.\n25/07/03 04:21:01 WARN hdfs.DFSUtilClient: Namenode for analytics-hadoop remains unresolved for ID an-master1004-eqiad-wmnet. Check your hdfs-site.xml file to ensure namenodes are configured properly.\n25/07/03 04:21:02 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1003.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet:8020 after 2 failover attempts. Trying to failover after sleeping for 1719ms. Current retry count: 2.\n25/07/03 04:21:03 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 3 failover attempts. Trying to failover after sleeping for 5227ms. 
Current retry count: 3.\n25/07/03 04:21:09 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1003.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet:8020 after 4 failover attempts. Trying to failover after sleeping for 5637ms. Current retry count: 4.\n25/07/03 04:21:14 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 5 failover attempts. Trying to failover after sleeping for 10461ms. Current retry count: 5.\n25/07/03 04:21:25 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1003.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet:8020 after 6 failover attempts. Trying to failover after sleeping for 10941ms. Current retry count: 6.\n25/07/03 04:21:36 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 7 failover attempts. Trying to failover after sleeping for 20074ms. 
Current retry count: 7.\n25/07/03 04:21:56 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1003.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet:8020 after 8 failover attempts. Trying to failover after sleeping for 20951ms. Current retry count: 8.\n25/07/03 04:22:17 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 9 failover attempts. Trying to failover after sleeping for 19562ms. Current retry count: 9.\n25/07/03 04:22:36 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1003.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet:8020 after 10 failover attempts. Trying to failover after sleeping for 21413ms. Current retry count: 10.\n25/07/03 04:22:58 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 11 failover attempts. Trying to failover after sleeping for 14293ms. 
Current retry count: 11.\n25/07/03 04:23:12 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1003.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet:8020 after 12 failover attempts. Trying to failover after sleeping for 21747ms. Current retry count: 12.\n25/07/03 04:23:34 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 13 failover attempts. Trying to failover after sleeping for 19405ms. Current retry count: 13.\n25/07/03 04:23:53 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1003.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet:8020 after 14 failover attempts. Trying to failover after sleeping for 10173ms. Current retry count: 14.\nmv: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost\n') |
| 54 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T04:24:18 ERROR ERROR: mrwikibooks.user_groups (try 1) |
| 55 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last): |
| 56 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki |
| 57 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL) |
| 58 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/usr/lib/python3.9/subprocess.py", line 373, in check_call |
| 59 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise CalledProcessError(retcode, cmd) |
| 60 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-mrwikibooks.user_groups'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3313/mrwikibooks_p?characterEncoding=UTF-8', '--query', '\n select ug_user,\n convert(ug_group using utf8mb4) ug_group\n\n from user_groups\n where $CONDITIONS\n ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/usergroups/snapshot202506/wikidbmrwikibooks', '--num-mappers', '1', '--as-avrodatafile', '--class-name', 'user_groups', '--jar-file', '/tmp/sqoop-jars/2025-07-02T18:10:08/mediawiki-tables-sqoop-orm.jar']' returned non-zero exit status 1. |
| 61 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T04:24:44 ERROR ERROR: mrwikibooks.archive (try 1) |
| 62 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last): |
| 63 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 179, in sqoop_wiki |
| 64 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Hdfs.mv(tmp_target_directory, target_directory, inParent=False) |
| 65 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/hdfs.py", line 145, in mv |
| 66 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: sh(['hdfs', 'dfs', '-mv', from_paths[i], to_paths[i]]) |
| 67 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/util.py", line 125, in sh |
| 68 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise RuntimeError("Command: {0} failed with error code: {1}" |
| 69 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: RuntimeError: ('Command: hdfs dfs -mv /wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/archive/snapshot202506/wikidbmrwikibooks /wmf/data/raw/mediawiki/tables/archive/snapshot=2025-06/wiki_db=mrwikibooks failed with error code: 1', b'', b'25/07/03 04:21:00 WARN hdfs.DFSUtilClient: Namenode for analytics-hadoop remains unresolved for ID an-master1004-eqiad-wmnet. Check your hdfs-site.xml file to ensure namenodes are configured properly.\n25/07/03 04:21:33 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error\n\tat org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2044)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1409)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:2961)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1169)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:880)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:498)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1038)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:931)\n\tat 
java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1938)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2855)\n, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet/10.64.36.15:8020 after 2 failover attempts. Trying to failover after sleeping for 2085ms. Current retry count: 2.\n25/07/03 04:21:35 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 3 failover attempts. Trying to failover after sleeping for 4338ms. Current retry count: 3.\n25/07/03 04:21:39 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. 
Visit https://s.apache.org/sbnn-error\n\tat org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2044)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1409)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:2961)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1169)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:880)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:498)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1038)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:931)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1938)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2855)\n, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet/10.64.36.15:8020 after 4 failover attempts. Trying to failover after sleeping for 6357ms. 
Current retry count: 4.\n25/07/03 04:21:46 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 5 failover attempts. Trying to failover after sleeping for 20831ms. Current retry count: 5.\n25/07/03 04:22:07 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error\n\tat org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2044)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1409)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:2961)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1169)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:880)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:498)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1038)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:931)\n\tat java.security.AccessController.doPrivileged(Native 
Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1938)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2855)\n, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet/10.64.36.15:8020 after 6 failover attempts. Trying to failover after sleeping for 21337ms. Current retry count: 6.\n25/07/03 04:22:28 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 7 failover attempts. Trying to failover after sleeping for 20539ms. Current retry count: 7.\n25/07/03 04:22:49 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. 
Visit https://s.apache.org/sbnn-error\n\tat org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2044)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1409)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:2961)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1169)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:880)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:498)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1038)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:931)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1938)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2855)\n, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet/10.64.36.15:8020 after 8 failover attempts. Trying to failover after sleeping for 20593ms. 
Current retry count: 8.\n25/07/03 04:23:09 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 9 failover attempts. Trying to failover after sleeping for 19268ms. Current retry count: 9.\n25/07/03 04:23:28 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error\n\tat org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2044)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1409)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:2961)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1169)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:880)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:498)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1038)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:931)\n\tat java.security.AccessController.doPrivileged(Native 
Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1938)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2855)\n, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet/10.64.36.15:8020 after 10 failover attempts. Trying to failover after sleeping for 18565ms. Current retry count: 10.\n25/07/03 04:23:47 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 11 failover attempts. Trying to failover after sleeping for 20802ms. Current retry count: 11.\n25/07/03 04:24:08 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. 
Visit https://s.apache.org/sbnn-error\n\tat org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2044)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1409)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:2961)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1169)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:880)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:498)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1038)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:931)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1938)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2855)\n, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet/10.64.36.15:8020 after 12 failover attempts. Trying to failover after sleeping for 14586ms. 
Current retry count: 12.\n25/07/03 04:24:22 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 13 failover attempts. Trying to failover after sleeping for 10787ms. Current retry count: 13.\n25/07/03 04:24:33 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error\n\tat org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2044)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1409)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:2961)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1169)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:880)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:498)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1038)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:931)\n\tat java.security.AccessController.doPrivileged(Native 
Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1938)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2855)\n, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet/10.64.36.15:8020 after 14 failover attempts. Trying to failover after sleeping for 10018ms. Current retry count: 14.\nmv: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost\n') |
| 70 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T06:29:02 ERROR ERROR: testwiki.user (try 2) |
| 71 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last): |
| 72 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki |
| 73 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL) |
| 74 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/usr/lib/python3.9/subprocess.py", line 373, in check_call |
| 75 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise CalledProcessError(retcode, cmd) |
| 76 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-testwiki.user'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3313/testwiki_p?characterEncoding=UTF-8', '--query', '\n select user_id,\n convert(user_name using utf8mb4) user_name,\n user_name user_name_binary,\n convert(user_real_name using utf8mb4) user_real_name,\n convert(user_email using utf8mb4) user_email,\n convert(user_touched using utf8mb4) user_touched,\n convert(user_registration using utf8mb4) user_registration,\n user_editcount,\n convert(user_password_expires using utf8mb4) user_password_expires,\n user_is_temp\n\n from user\n where $CONDITIONS\n ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506/wikidbtestwiki', '--num-mappers', '1', '--as-avrodatafile', '--class-name', 'user', '--jar-file', '/tmp/sqoop-jars/2025-07-02T18:10:08/mediawiki-tables-sqoop-orm.jar', '--map-column-java', '"user_id=Long,user_editcount=Long,user_is_temp=Boolean"', '--delete-target-dir']' returned non-zero exit status 1. |
| 77 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T06:29:46 ERROR ERROR: testwiki.user (try 3) |
| 78 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last): |
| 79 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki |
| 80 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL) |
| 81 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/usr/lib/python3.9/subprocess.py", line 373, in check_call |
| 82 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise CalledProcessError(retcode, cmd) |
| 83 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-testwiki.user'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3313/testwiki_p?characterEncoding=UTF-8', '--query', '\n select user_id,\n convert(user_name using utf8mb4) user_name,\n user_name user_name_binary,\n convert(user_real_name using utf8mb4) user_real_name,\n convert(user_email using utf8mb4) user_email,\n convert(user_touched using utf8mb4) user_touched,\n convert(user_registration using utf8mb4) user_registration,\n user_editcount,\n convert(user_password_expires using utf8mb4) user_password_expires,\n user_is_temp\n\n from user\n where $CONDITIONS\n ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506/wikidbtestwiki', '--num-mappers', '1', '--as-avrodatafile', '--class-name', 'user', '--jar-file', '/tmp/sqoop-jars/2025-07-02T18:10:08/mediawiki-tables-sqoop-orm.jar', '--map-column-java', '"user_id=Long,user_editcount=Long,user_is_temp=Boolean"', '--delete-target-dir']' returned non-zero exit status 1. |
| 84 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T06:30:03 ERROR ************************************************** |
| 85 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T06:30:03 ERROR * Jobs to re-run: |
| 86 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T06:30:03 ERROR * - commonswiki:user |
| 87 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T06:30:03 ERROR * - testwiki:user |
| 88 | Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T06:30:03 ERROR ************************************************** |
| 89 | Jul 03 06:30:03 an-launcher1002 systemd[1]: refinery-sqoop-whole-mediawiki.service: Main process exited, code=exited, status=1/FAILURE |
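All of these entries carry a journal timestamp of 06:30:03 UTC on July 3rd, but the timestamps embedded in the messages show that the failures happened earlier and across several tries: the first commonswiki.user failure is stamped 2025-07-02T21:21:35, the HDFS failover retries above were logged between about 04:21 and 04:24 on July 3rd, and tries 2 and 3 for testwiki.user failed at 06:29:02 and 06:29:46.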
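To separate the time a failure happened from the time it reached the journal, the embedded timestamps can be pulled out of the error lines. The following is a minimal sketch, assuming the lines look exactly like the excerpt above; `ERROR_LINE` and `failed_attempts` are hypothetical names for illustration and are not part of refinery.

```python
import re
from datetime import datetime

# Matches the refinery error lines as they appear in the journal excerpt, e.g.
# "... 2025-07-03T06:29:02 ERROR ERROR: testwiki.user (try 2)"
ERROR_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) ERROR ERROR: "
    r"(?P<wiki>\w+)\.(?P<table>\w+) \(try (?P<attempt>\d+)\)"
)


def failed_attempts(lines):
    """Return (timestamp, wiki, table, attempt) for each failed sqoop attempt."""
    results = []
    for line in lines:
        match = ERROR_LINE.search(line)
        if match:
            results.append((
                datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S"),
                match["wiki"],
                match["table"],
                int(match["attempt"]),
            ))
    return results


# Three lines copied from the journal excerpt in this task.
sample = [
    "Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: "
    "2025-07-02T21:21:35 ERROR ERROR: commonswiki.user (try 1)",
    "Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: "
    "2025-07-03T06:29:02 ERROR ERROR: testwiki.user (try 2)",
    "Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: "
    "2025-07-03T06:29:46 ERROR ERROR: testwiki.user (try 3)",
]

for ts, wiki, table, attempt in failed_attempts(sample):
    print(ts.isoformat(), wiki, table, attempt)
```

Sorting the result by timestamp gives the real order of failures, which the journal prefix alone hides because every line shares the same 06:30:03 stamp.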
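The HDFS service may also be involved. The failover retries alternate between a java.net.UnknownHostException for an-master1004.eqiad.wmnet:8020 and a StandbyException from an-master1003 ("Operation category READ is not supported in state standby"), and the final mv fails with the same invalid host name error for an-master1004.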
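A quick way to check that pattern across the whole log is to count the retry messages per namenode and exception type. This is a rough sketch, assuming the retry messages keep the `INFO retry.RetryInvocationHandler:` prefix and the `while invoking ... over <host>` wording seen above; `tally_retries` is a hypothetical helper, and the sample text is abridged from two of the logged messages.

```python
import re
from collections import Counter

# Each retry message in the log starts with this marker.
RETRY_MARKER = "INFO retry.RetryInvocationHandler: "
# The two exception types that appear in the retry messages of this task.
EXCEPTION = re.compile(r"(UnknownHostException|StandbyException)")
# The namenode that the failed call was sent to.
TARGET = re.compile(r"while invoking \S+ over (?P<host>[\w.-]+)[:/]")


def tally_retries(log_text):
    """Count retry messages per (namenode host, exception type)."""
    counts = Counter()
    for chunk in log_text.split(RETRY_MARKER)[1:]:
        exc = EXCEPTION.search(chunk)
        target = TARGET.search(chunk)
        if exc and target:
            counts[(target["host"], exc.group(1))] += 1
    return counts


# Abridged from two consecutive retry messages in the log above.
sample = (
    "25/07/03 04:22:28 INFO retry.RetryInvocationHandler: "
    "java.net.UnknownHostException: Invalid host name: local host is: (unknown); "
    'destination host is: "an-master1004.eqiad.wmnet":8020; '
    "while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over "
    "an-master1004.eqiad.wmnet:8020 after 7 failover attempts. "
    "25/07/03 04:22:49 INFO retry.RetryInvocationHandler: "
    "org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): "
    "Operation category READ is not supported in state standby. "
    ", while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over "
    "an-master1003.eqiad.wmnet/10.64.36.15:8020 after 8 failover attempts."
)

for (host, exception), count in sorted(tally_retries(sample).items()):
    print(host, exception, count)
```

If every an-master1004 entry is an UnknownHostException and every an-master1003 entry is a StandbyException, that would suggest the client never reached an active namenode during the retry window, though the log alone does not say why.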