
Errors from refinery-sqoop-whole-mediawiki.service - July 2025
Closed, Resolved · Public

Description

We have seen a number of errors from the monthly sqoop process at the start of July 2025.
To begin with, the password for the s53272 user had been changed.
After we reset it and restarted the import, the process proceeded but then began showing errors.
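For context, the traceback shows that refinery's sqoop wrapper retries each failing table import, which is why each table appears with `(try 1)` … `(try 3)` in the log. A minimal sketch of that retry pattern follows; this is hypothetical illustrative code, not the actual `refinery/sqoop.py` implementation, and `run_with_retries` is a made-up name:

```python
import subprocess
from subprocess import DEVNULL

def run_with_retries(cmd, max_tries=3):
    """Run a command, retrying up to max_tries times.

    Sketches the retry behaviour visible in the journal output,
    where each table import is attempted as 'try 1' .. 'try 3'.
    """
    for attempt in range(1, max_tries + 1):
        try:
            # The real wrapper calls check_call(sqoop_arguments, ...)
            # with output suppressed, per the traceback.
            subprocess.check_call(cmd, stdout=DEVNULL, stderr=DEVNULL)
            return attempt  # number of the attempt that succeeded
        except subprocess.CalledProcessError:
            if attempt == max_tries:
                raise  # all tries exhausted; surface the last failure
```

On this run, e.g. commonswiki.user failed on all three tries, so the `CalledProcessError` from the final attempt is what surfaces in the journal.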

The errors are as follows:

Jul 02 18:09:33 an-launcher1002 systemd[1]: Starting Schedules sqoop to import whole MediaWiki databases into Hadoop monthly....
Jul 02 18:09:34 an-launcher1002 kerberos-run-command[1995257]: User analytics executes as user analytics the command ['/usr/local/bin/refinery-sqoop-whole-mediawiki']
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-02T21:21:35 ERROR ERROR: commonswiki.user (try 1)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last):
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise CalledProcessError(retcode, cmd)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-commonswiki.user'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3314/commonswiki_p?characterEncoding=UTF-8', '--query', '\n select user_id,\n convert(user_name using utf8mb4) user_name,\n user_name user_name_binary,\n convert(user_real_name using utf8mb4) user_real_name,\n convert(user_email using utf8mb4) user_email,\n convert(user_touched using utf8mb4) user_touched,\n convert(user_registration using utf8mb4) user_registration,\n user_editcount,\n convert(user_password_expires using utf8mb4) user_password_expires,\n user_is_temp\n\n from user\n where $CONDITIONS\n ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506/wikidbcommonswiki', '--num-mappers', '32', '--as-avrodatafile', '--boundary-query', 'SELECT MIN(user_id), MAX(user_id) FROM user', '--split-by', 'user_id', '--class-name', 'user', '--jar-file', '/tmp/sqoop-jars/2025-07-02T18:10:08/mediawiki-tables-sqoop-orm.jar', '--map-column-java', '"user_id=Long,user_editcount=Long,user_is_temp=Boolean"']' returned non-zero exit status 1.
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-02T21:22:49 ERROR ERROR: commonswiki.user (try 2)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last):
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise CalledProcessError(retcode, cmd)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-commonswiki.user'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3314/commonswiki_p?characterEncoding=UTF-8', '--query', '\n select user_id,\n convert(user_name using utf8mb4) user_name,\n user_name user_name_binary,\n convert(user_real_name using utf8mb4) user_real_name,\n convert(user_email using utf8mb4) user_email,\n convert(user_touched using utf8mb4) user_touched,\n convert(user_registration using utf8mb4) user_registration,\n user_editcount,\n convert(user_password_expires using utf8mb4) user_password_expires,\n user_is_temp\n\n from user\n where $CONDITIONS\n ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506/wikidbcommonswiki', '--num-mappers', '32', '--as-avrodatafile', '--boundary-query', 'SELECT MIN(user_id), MAX(user_id) FROM user', '--split-by', 'user_id', '--class-name', 'user', '--jar-file', '/tmp/sqoop-jars/2025-07-02T18:10:08/mediawiki-tables-sqoop-orm.jar', '--map-column-java', '"user_id=Long,user_editcount=Long,user_is_temp=Boolean"', '--delete-target-dir']' returned non-zero exit status 1.
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-02T21:23:17 ERROR ERROR: commonswiki.user (try 3)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last):
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise CalledProcessError(retcode, cmd)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-commonswiki.user'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3314/commonswiki_p?characterEncoding=UTF-8', '--query', '\n select user_id,\n convert(user_name using utf8mb4) user_name,\n user_name user_name_binary,\n convert(user_real_name using utf8mb4) user_real_name,\n convert(user_email using utf8mb4) user_email,\n convert(user_touched using utf8mb4) user_touched,\n convert(user_registration using utf8mb4) user_registration,\n user_editcount,\n convert(user_password_expires using utf8mb4) user_password_expires,\n user_is_temp\n\n from user\n where $CONDITIONS\n ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506/wikidbcommonswiki', '--num-mappers', '32', '--as-avrodatafile', '--boundary-query', 'SELECT MIN(user_id), MAX(user_id) FROM user', '--split-by', 'user_id', '--class-name', 'user', '--jar-file', '/tmp/sqoop-jars/2025-07-02T18:10:08/mediawiki-tables-sqoop-orm.jar', '--map-column-java', '"user_id=Long,user_editcount=Long,user_is_temp=Boolean"', '--delete-target-dir']' returned non-zero exit status 1.
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T01:08:13 ERROR ERROR: testwiki.user (try 1)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last):
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise CalledProcessError(retcode, cmd)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-testwiki.user'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3313/testwiki_p?characterEncoding=UTF-8', '--query', '\n select user_id,\n convert(user_name using utf8mb4) user_name,\n user_name user_name_binary,\n convert(user_real_name using utf8mb4) user_real_name,\n convert(user_email using utf8mb4) user_email,\n convert(user_touched using utf8mb4) user_touched,\n convert(user_registration using utf8mb4) user_registration,\n user_editcount,\n convert(user_password_expires using utf8mb4) user_password_expires,\n user_is_temp\n\n from user\n where $CONDITIONS\n ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506/wikidbtestwiki', '--num-mappers', '1', '--as-avrodatafile', '--class-name', 'user', '--jar-file', '/tmp/sqoop-jars/2025-07-02T18:10:08/mediawiki-tables-sqoop-orm.jar', '--map-column-java', '"user_id=Long,user_editcount=Long,user_is_temp=Boolean"']' returned non-zero exit status 1.
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T04:16:45 ERROR ERROR: urwikibooks.user (try 1)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last):
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise CalledProcessError(retcode, cmd)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-urwikibooks.user'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3313/urwikibooks_p?characterEncoding=UTF-8', '--query', '\n select user_id,\n convert(user_name using utf8mb4) user_name,\n user_name user_name_binary,\n convert(user_real_name using utf8mb4) user_real_name,\n convert(user_email using utf8mb4) user_email,\n convert(user_touched using utf8mb4) user_touched,\n convert(user_registration using utf8mb4) user_registration,\n user_editcount,\n convert(user_password_expires using utf8mb4) user_password_expires,\n user_is_temp\n\n from user\n where $CONDITIONS\n ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506/wikidburwikibooks', '--num-mappers', '1', '--as-avrodatafile', '--class-name', 'user', '--jar-file', '/tmp/sqoop-jars/2025-07-02T18:10:08/mediawiki-tables-sqoop-orm.jar', '--map-column-java', '"user_id=Long,user_editcount=Long,user_is_temp=Boolean"']' returned non-zero exit status 1.
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T04:20:36 ERROR ERROR: ttwikibooks.archive (try 1)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last):
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise CalledProcessError(retcode, cmd)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-ttwikibooks.archive'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3313/ttwikibooks_p?characterEncoding=UTF-8', '--query', '\n select ar_id,\n ar_namespace,\n convert(ar_title using utf8mb4) ar_title,\n null ar_text,\n null ar_comment,\n null ar_user,\n null ar_user_text,\n convert(ar_timestamp using utf8mb4) ar_timestamp,\n ar_minor_edit,\n null ar_flags,\n ar_rev_id,\n null ar_text_id,\n ar_deleted,\n ar_len,\n ar_page_id,\n ar_parent_id,\n convert(ar_sha1 using utf8mb4) ar_sha1,\n null ar_content_model,\n null ar_content_format,\n ar_actor,\n ar_comment_id\n\n from archive\n where $CONDITIONS\n \n ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/archive/snapshot202506/wikidbttwikibooks', '--num-mappers', '1', '--as-avrodatafile', '--class-name', 'archive', '--jar-file', '/tmp/sqoop-jars/2025-07-02T18:10:08/mediawiki-tables-sqoop-orm.jar', '--map-column-java', '"ar_actor=Long,ar_comment=String,ar_comment_id=Long,ar_content_format=String,ar_content_model=String,ar_deleted=Integer,ar_flags=String,ar_minor_edit=Boolean,ar_text=String,ar_user=Long,ar_user_text=String,ar_text_id=Long"']' returned non-zero exit status 1.
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T04:24:04 ERROR ERROR: mrwikibooks.change_tag (try 1)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last):
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 179, in sqoop_wiki
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Hdfs.mv(tmp_target_directory, target_directory, inParent=False)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/hdfs.py", line 145, in mv
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: sh(['hdfs', 'dfs', '-mv', from_paths[i], to_paths[i]])
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/util.py", line 125, in sh
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise RuntimeError("Command: {0} failed with error code: {1}"
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: RuntimeError: ('Command: hdfs dfs -mv /wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/changetag/snapshot202506/wikidbmrwikibooks /wmf/data/raw/mediawiki/tables/change_tag/snapshot=2025-06/wiki_db=mrwikibooks failed with error code: 1', b'', b'25/07/03 04:21:01 WARN hdfs.DFSUtilClient: Namenode for analytics-hadoop remains unresolved for ID an-master1003-eqiad-wmnet. Check your hdfs-site.xml file to ensure namenodes are configured properly.\n25/07/03 04:21:01 WARN hdfs.DFSUtilClient: Namenode for analytics-hadoop remains unresolved for ID an-master1004-eqiad-wmnet. Check your hdfs-site.xml file to ensure namenodes are configured properly.\n25/07/03 04:21:02 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1003.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet:8020 after 2 failover attempts. Trying to failover after sleeping for 1719ms. Current retry count: 2.\n25/07/03 04:21:03 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 3 failover attempts. Trying to failover after sleeping for 5227ms. 
Current retry count: 3.\n25/07/03 04:21:09 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1003.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet:8020 after 4 failover attempts. Trying to failover after sleeping for 5637ms. Current retry count: 4.\n25/07/03 04:21:14 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 5 failover attempts. Trying to failover after sleeping for 10461ms. Current retry count: 5.\n25/07/03 04:21:25 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1003.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet:8020 after 6 failover attempts. Trying to failover after sleeping for 10941ms. Current retry count: 6.\n25/07/03 04:21:36 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 7 failover attempts. Trying to failover after sleeping for 20074ms. 
Current retry count: 7.\n25/07/03 04:21:56 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1003.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet:8020 after 8 failover attempts. Trying to failover after sleeping for 20951ms. Current retry count: 8.\n25/07/03 04:22:17 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 9 failover attempts. Trying to failover after sleeping for 19562ms. Current retry count: 9.\n25/07/03 04:22:36 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1003.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet:8020 after 10 failover attempts. Trying to failover after sleeping for 21413ms. Current retry count: 10.\n25/07/03 04:22:58 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 11 failover attempts. Trying to failover after sleeping for 14293ms. 
Current retry count: 11.\n25/07/03 04:23:12 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1003.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet:8020 after 12 failover attempts. Trying to failover after sleeping for 21747ms. Current retry count: 12.\n25/07/03 04:23:34 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 13 failover attempts. Trying to failover after sleeping for 19405ms. Current retry count: 13.\n25/07/03 04:23:53 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1003.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet:8020 after 14 failover attempts. Trying to failover after sleeping for 10173ms. Current retry count: 14.\nmv: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost\n')
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T04:24:18 ERROR ERROR: mrwikibooks.user_groups (try 1)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last):
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise CalledProcessError(retcode, cmd)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-mrwikibooks.user_groups'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3313/mrwikibooks_p?characterEncoding=UTF-8', '--query', '\n select ug_user,\n convert(ug_group using utf8mb4) ug_group\n\n from user_groups\n where $CONDITIONS\n ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/usergroups/snapshot202506/wikidbmrwikibooks', '--num-mappers', '1', '--as-avrodatafile', '--class-name', 'user_groups', '--jar-file', '/tmp/sqoop-jars/2025-07-02T18:10:08/mediawiki-tables-sqoop-orm.jar']' returned non-zero exit status 1.
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T04:24:44 ERROR ERROR: mrwikibooks.archive (try 1)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last):
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 179, in sqoop_wiki
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Hdfs.mv(tmp_target_directory, target_directory, inParent=False)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/hdfs.py", line 145, in mv
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: sh(['hdfs', 'dfs', '-mv', from_paths[i], to_paths[i]])
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/util.py", line 125, in sh
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise RuntimeError("Command: {0} failed with error code: {1}"
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: RuntimeError: ('Command: hdfs dfs -mv /wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/archive/snapshot202506/wikidbmrwikibooks /wmf/data/raw/mediawiki/tables/archive/snapshot=2025-06/wiki_db=mrwikibooks failed with error code: 1', b'', b'25/07/03 04:21:00 WARN hdfs.DFSUtilClient: Namenode for analytics-hadoop remains unresolved for ID an-master1004-eqiad-wmnet. Check your hdfs-site.xml file to ensure namenodes are configured properly.\n25/07/03 04:21:33 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error\n\tat org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2044)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1409)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:2961)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1169)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:880)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:498)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1038)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:931)\n\tat 
java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1938)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2855)\n, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet/10.64.36.15:8020 after 2 failover attempts. Trying to failover after sleeping for 2085ms. Current retry count: 2.\n25/07/03 04:21:35 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 3 failover attempts. Trying to failover after sleeping for 4338ms. Current retry count: 3.\n25/07/03 04:21:39 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. 
Visit https://s.apache.org/sbnn-error\n\tat org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2044)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1409)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:2961)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1169)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:880)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:498)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1038)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:931)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1938)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2855)\n, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet/10.64.36.15:8020 after 4 failover attempts. Trying to failover after sleeping for 6357ms. 
Current retry count: 4.
25/07/03 04:21:46 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1004.eqiad.wmnet:8020 after 5 failover attempts. Trying to failover after sleeping for 20831ms. Current retry count: 5.
25/07/03 04:22:07 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108)
	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2044)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1409)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:2961)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1169)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:880)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:498)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1038)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:931)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1938)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2855)
, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over an-master1003.eqiad.wmnet/10.64.36.15:8020 after 6 failover attempts. Trying to failover after sleeping for 21337ms. Current retry count: 6.
[The same two errors then alternate through retry counts 7-14: the UnknownHostException for an-master1004 on odd failover attempts (7, 9, 11, 13) and the identical StandbyException stack trace for an-master1003 on even attempts (8, 10, 12, 14), until the final failure:]
mv: Invalid host name: local host is: (unknown); destination host is: "an-master1004.eqiad.wmnet":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T06:29:02 ERROR ERROR: testwiki.user (try 2)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last):
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise CalledProcessError(retcode, cmd)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-testwiki.user'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3313/testwiki_p?characterEncoding=UTF-8', '--query', '\n select user_id,\n convert(user_name using utf8mb4) user_name,\n user_name user_name_binary,\n convert(user_real_name using utf8mb4) user_real_name,\n convert(user_email using utf8mb4) user_email,\n convert(user_touched using utf8mb4) user_touched,\n convert(user_registration using utf8mb4) user_registration,\n user_editcount,\n convert(user_password_expires using utf8mb4) user_password_expires,\n user_is_temp\n\n from user\n where $CONDITIONS\n ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506/wikidbtestwiki', '--num-mappers', '1', '--as-avrodatafile', '--class-name', 'user', '--jar-file', '/tmp/sqoop-jars/2025-07-02T18:10:08/mediawiki-tables-sqoop-orm.jar', '--map-column-java', '"user_id=Long,user_editcount=Long,user_is_temp=Boolean"', '--delete-target-dir']' returned non-zero exit status 1.
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T06:29:46 ERROR ERROR: testwiki.user (try 3)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: Traceback (most recent call last):
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: raise CalledProcessError(retcode, cmd)
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-testwiki.user'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3313/testwiki_p?characterEncoding=UTF-8', '--query', '\n select user_id,\n convert(user_name using utf8mb4) user_name,\n user_name user_name_binary,\n convert(user_real_name using utf8mb4) user_real_name,\n convert(user_email using utf8mb4) user_email,\n convert(user_touched using utf8mb4) user_touched,\n convert(user_registration using utf8mb4) user_registration,\n user_editcount,\n convert(user_password_expires using utf8mb4) user_password_expires,\n user_is_temp\n\n from user\n where $CONDITIONS\n ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506/wikidbtestwiki', '--num-mappers', '1', '--as-avrodatafile', '--class-name', 'user', '--jar-file', '/tmp/sqoop-jars/2025-07-02T18:10:08/mediawiki-tables-sqoop-orm.jar', '--map-column-java', '"user_id=Long,user_editcount=Long,user_is_temp=Boolean"', '--delete-target-dir']' returned non-zero exit status 1.
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T06:30:03 ERROR **************************************************
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T06:30:03 ERROR * Jobs to re-run:
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T06:30:03 ERROR * - commonswiki:user
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T06:30:03 ERROR * - testwiki:user
Jul 03 06:30:03 an-launcher1002 kerberos-run-command[1995256]: 2025-07-03T06:30:03 ERROR **************************************************
Jul 03 06:30:03 an-launcher1002 systemd[1]: refinery-sqoop-whole-mediawiki.service: Main process exited, code=exited, status=1/FAILURE

These errors were all logged at around 06:30 UTC on July 3rd, although they refer to multiple earlier tries.

The HDFS service is possibly related as well, since some of the runtime errors refer to the an-master100[3-4] NameNodes.
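The retry pattern can be confirmed by pulling the target host and failover-attempt count out of each RetryInvocationHandler line, which shows the client bouncing between an unresolvable an-master1004 and a standby an-master1003. A quick hypothetical helper (not part of refinery), shown here against two abbreviated sample lines:

```python
import re

# Two sample lines in the shape of the RetryInvocationHandler output above
# (message bodies abbreviated with "...").
log = """\
25/07/03 04:21:46 INFO retry.RetryInvocationHandler: java.net.UnknownHostException: Invalid host name ... over an-master1004.eqiad.wmnet:8020 after 5 failover attempts. Trying to failover after sleeping for 20831ms. Current retry count: 5.
25/07/03 04:22:07 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): ... over an-master1003.eqiad.wmnet/10.64.36.15:8020 after 6 failover attempts. Trying to failover after sleeping for 21337ms. Current retry count: 6.
"""

# Capture the NameNode hostname and the failover attempt number.
pattern = re.compile(r'over (an-master\d+)\S* after (\d+) failover attempts')
for host, attempts in pattern.findall(log):
    print(host, attempts)
```

Run over the full journal output, this prints an-master1004 on odd attempts and an-master1003 on even attempts, i.e. a client stuck alternating between the two HA NameNodes.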

Event Timeline

I am struggling to get the re-run working, following the guidelines here: https://wikitech.wikimedia.org/wiki/Data_Platform/Systems/Edit_history_administration#Rerun_what's_needed

I have created a local copy of the refinery-sqoop-mediawiki-history script, with modifications to:

  • the list of tables to be sqooped (only user)
  • the file containing the list of wikis to be sqooped
  • the partition value, adding a -rerun suffix
  • the use of a local log file
  • the addition of a --verbose argument
btullis@an-launcher1002:~$ cat refinery-sqoop-mediawiki-history
#!/bin/bash
# NOTE: This file is managed by puppet
#

export PYTHONPATH=${PYTHONPATH}:/srv/deployment/analytics/refinery/python

/usr/bin/python3 /srv/deployment/analytics/refinery/bin/sqoop-mediawiki-tables \
    --job-name sqoop-mediawiki-monthly-$(/bin/date --date="$(/bin/date +%Y-%m-15) -1 month" +'%Y-%m') \
    --clouddb \
    --output-dir /wmf/data/raw/mediawiki/tables \
    --wiki-file /home/btullis/grouped_wikis.csv \
    --tables user \
    --user s53272 \
    --password-file /user/analytics/mysql-analytics-labsdb-client-pw.txt \
    --partition-name snapshot \
    --partition-value $(/bin/date --date="$(/bin/date +%Y-%m-15) -1 month" +'%Y-%m')-rerun \
    --mappers 64 \
    --processors 10 \
    --yarn-queue production \
    --output-format avrodata \
    --log-file /home/btullis/sqoop-mediawiki.log \
    --verbose
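A side note on the date arithmetic in the script above: anchoring on the 15th before subtracting a month is a deliberate GNU date idiom that avoids end-of-month skew. A minimal sketch (the `prev_month` helper is hypothetical, added here only to make the idiom testable with a fixed date rather than today's):

```shell
#!/bin/bash
# Safe "previous month": pin the day to the 15th first, then subtract a
# month, so the arithmetic never overshoots when run on the 29th-31st.
prev_month() {
    /bin/date --date="$(/bin/date --date="$1" +%Y-%m-15) -1 month" +%Y-%m
}

prev_month 2025-07-03   # 2025-06
prev_month 2025-03-31   # 2025-02 (a naive "2025-03-31 -1 month" would land back in March)
```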

My grouped_wikis.csv file contains this:

btullis@an-launcher1002:~$ cat grouped_wikis.csv 
commonswiki,1,0
testwiki,1,0

Both the grouped_wikis file and the log files are owned by analytics.
I then run the command as follows:

btullis@an-launcher1002:~$ sudo -u analytics kerberos-run-command analytics ./refinery-sqoop-mediawiki-history

Each try then fails like this:

2025-07-03T11:39:16 ERROR  ERROR: testwiki.user (try 1)
Traceback (most recent call last):
  File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki
    check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL)
  File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-testwiki.user'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3313/testwiki_p?characterEncoding=UTF-8', '--query', '\n             select user_id,\n                    convert(user_name using utf8mb4) user_name,\n                    user_name user_name_binary,\n                    convert(user_real_name using utf8mb4) user_real_name,\n                    convert(user_email using utf8mb4) user_email,\n                    convert(user_touched using utf8mb4) user_touched,\n                    convert(user_registration using utf8mb4) user_registration,\n                    user_editcount,\n                    convert(user_password_expires using utf8mb4) user_password_expires,\n                    user_is_temp\n\n               from user\n              where $CONDITIONS\n        ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506rerun/wikidbtestwiki', '--num-mappers', '16', '--as-avrodatafile', '--boundary-query', 'SELECT MIN(user_id), MAX(user_id) FROM user', '--split-by', 'user_id', '--class-name', 'user', '--jar-file', '/tmp/sqoop-jars/2025-07-03T11:39:03/mediawiki-tables-sqoop-orm.jar', '--map-column-java', '"user_id=Long,user_editcount=Long,user_is_temp=Boolean"']' returned non-zero exit status 1.

As the debug log says:

2025-07-03T11:39:10 DEBUG  You can copy the parameters above and execute the sqoop command manually

So I tried that. I had to manually remove the commas between the quoted arguments (everything except the commas inside the two SQL statements).

btullis@an-launcher1002:~$ sudo -u analytics bash

analytics@an-launcher1002:/home/btullis$ 'sqoop' 'import' '-D' "mapred.job.name='sqoop-mediawiki-monthly-2025-06-testwiki.user'" '-D' 'mapreduce.job.queuename=production' '--username' 's53272' '--password-file' '/user/analytics/mysql-analytics-labsdb-client-pw.txt' '--connect' 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3313/testwiki_p?characterEncoding=UTF-8' '--query' '\n             select user_id,\n                    convert(user_name using utf8mb4) user_name,\n                    user_name user_name_binary,\n                    convert(user_real_name using utf8mb4) user_real_name,\n                    convert(user_email using utf8mb4) user_email,\n                    convert(user_touched using utf8mb4) user_touched,\n                    convert(user_registration using utf8mb4) user_registration,\n                    user_editcount,\n                    convert(user_password_expires using utf8mb4) user_password_expires,\n                    user_is_temp\n\n               from user\n              where $CONDITIONS\n        ' '--target-dir' '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506rerun/wikidbtestwiki' '--num-mappers' '16' '--as-avrodatafile' '--boundary-query' 'SELECT MIN(user_id), MAX(user_id) FROM user' '--split-by' 'user_id' '--class-name' 'user' '--jar-file' '/tmp/sqoop-jars/2025-07-03T11:39:03/mediawiki-tables-sqoop-orm.jar' '--map-column-java' '"user_id=Long,user_editcount=Long,user_is_temp=Boolean"'
Warning: /usr/lib/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/lib/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/log4j-slf4j-impl-2.17.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
25/07/03 11:43:55 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
25/07/03 11:43:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
25/07/03 11:43:56 INFO tool.CodeGenTool: Using existing jar: /tmp/sqoop-jars/2025-07-03T11:39:03/mediawiki-tables-sqoop-orm.jar
25/07/03 11:43:56 INFO mapreduce.ImportJobBase: Beginning query import.
25/07/03 11:43:56 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
25/07/03 11:43:56 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
25/07/03 11:43:57 INFO manager.SqlManager: Executing SQL statement: \n             select user_id,\n                    convert(user_name using utf8mb4) user_name,\n                    user_name user_name_binary,\n                    convert(user_real_name using utf8mb4) user_real_name,\n                    convert(user_email using utf8mb4) user_email,\n                    convert(user_touched using utf8mb4) user_touched,\n                    convert(user_registration using utf8mb4) user_registration,\n                    user_editcount,\n                    convert(user_password_expires using utf8mb4) user_password_expires,\n                    user_is_temp\n\n               from user\n              where  (1 = 0) \n        
25/07/03 11:43:57 ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '\n             select user_id,\n                    convert(user_name using u...' at line 1
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '\n             select user_id,\n                    convert(user_name using u...' at line 1
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at com.mysql.jdbc.Util.handleNewInstance(Util.java:403)
	at com.mysql.jdbc.Util.getInstance(Util.java:386)
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:944)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3933)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3869)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2524)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2675)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2468)
	at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1915)
	at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2023)
	at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:758)
	at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:767)
	at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
	at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
	at org.apache.sqoop.manager.SqlManager.getColumnTypesForQuery(SqlManager.java:234)
	at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:304)
	at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
	at org.apache.sqoop.orm.AvroSchemaGenerator.generate(AvroSchemaGenerator.java:61)
	at org.apache.sqoop.mapreduce.DataDrivenImportJob.generateAvroSchema(DataDrivenImportJob.java:133)
	at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:90)
	at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
	at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:729)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:499)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
25/07/03 11:43:57 INFO manager.SqlManager: Executing SQL statement: \n             select user_id,\n                    convert(user_name using utf8mb4) user_name,\n                    user_name user_name_binary,\n                    convert(user_real_name using utf8mb4) user_real_name,\n                    convert(user_email using utf8mb4) user_email,\n                    convert(user_touched using utf8mb4) user_touched,\n                    convert(user_registration using utf8mb4) user_registration,\n                    user_editcount,\n                    convert(user_password_expires using utf8mb4) user_password_expires,\n                    user_is_temp\n\n               from user\n              where  (1 = 0) \n        
25/07/03 11:43:57 ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '\n             select user_id,\n                    convert(user_name using u...' at line 1
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '\n             select user_id,\n                    convert(user_name using u...' at line 1
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at com.mysql.jdbc.Util.handleNewInstance(Util.java:403)
	at com.mysql.jdbc.Util.getInstance(Util.java:386)
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:944)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3933)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3869)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2524)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2675)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2468)
	at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1915)
	at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2023)
	at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:758)
	at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:767)
	at org.apache.sqoop.manager.SqlManager.getColumnNamesForRawQuery(SqlManager.java:132)
	at org.apache.sqoop.manager.SqlManager.getColumnNamesForQuery(SqlManager.java:123)
	at org.apache.sqoop.orm.ClassWriter.getColumnNames(ClassWriter.java:1807)
	at org.apache.sqoop.orm.AvroSchemaGenerator.generate(AvroSchemaGenerator.java:62)
	at org.apache.sqoop.mapreduce.DataDrivenImportJob.generateAvroSchema(DataDrivenImportJob.java:133)
	at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:90)
	at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
	at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:729)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:499)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
25/07/03 11:43:57 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
	at org.apache.sqoop.orm.AvroSchemaGenerator.generate(AvroSchemaGenerator.java:65)
	at org.apache.sqoop.mapreduce.DataDrivenImportJob.generateAvroSchema(DataDrivenImportJob.java:133)
	at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:90)
	at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
	at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:729)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:499)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
analytics@an-launcher1002:/home/btullis$

The key error seems to be:

com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '\n             select user_id,\n                    convert(user_name using u...' at line 1

...but I don't know whether the problem is my manipulation of the command line, or whether this is the real reason why sqoop is failing on this table for these two databases.
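One likely culprit is the manual reconstruction itself: the traceback prints the argument list with repr(), so the real newlines inside the --query string are displayed as the two-character sequence \n. Pasted into bash single quotes, those reach MySQL as literal backslash-n characters, which matches the syntax error above ("near '\n select user_id..."). A safer way to rebuild the command is to let Python quote the actual list, e.g. by pasting it into a python3 shell and using shlex.join (a sketch with an abbreviated argv, not the full command):

```python
import shlex

# argv as Python sees it: '\n' here is a real newline, not two characters.
argv = [
    'sqoop', 'import',
    '--query', '\n  select user_id\n  from user\n  where $CONDITIONS\n',
    '--num-mappers', '16',
]

# shlex.join() quotes each argument so the shell passes it through
# verbatim, embedded newlines included.
print(shlex.join(argv))
```

The printed string can then be pasted into a shell and will hand sqoop the same query the refinery wrapper does.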

I looked into this with @JAllemandou and @mforns today, but we were unable to get to the bottom of it.
Investigation will have to continue tomorrow.

In the meantime, I'm going to start running the three remaining sqoop processes, starting with the following.

analytics@an-launcher1002:/home/btullis$ /usr/local/bin/refinery-sqoop-mediawiki-production-history

This is in a screen session on an-launcher1002.

Tailing /var/log/refinery/sqoop-mediawiki-production.log to check on progress.

btullis@an-launcher1002:~$ tail -f /var/log/refinery/sqoop-mediawiki-production.log
2025-07-03T17:09:47 INFO   STARTING: etwiki.comment (try 1)
2025-07-03T17:09:52 INFO   FINISHED: etwiki.comment (try 1)
2025-07-03T17:09:52 INFO   FINISHED: etwiki.actor (try 1)
2025-07-03T17:09:52 INFO   ORM jar generated at /tmp/sqoop-jars/2025-07-03T17:09:47/mediawiki-tables-sqoop-orm.jar
2025-07-03T17:09:53 INFO   STARTING: enwiki.actor (try 1)
2025-07-03T17:09:53 INFO   Deleting temporary target directory /wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawikiprivate/tables/actor/snapshot202506/wikidbenwiki if it exists
2025-07-03T17:11:06 INFO   Moving sqooped folder from /wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawikiprivate/tables/actor/snapshot202506/wikidbenwiki to /wmf/data/raw/mediawiki_private/tables/actor/snapshot=2025-06/wiki_db=enwiki
2025-07-03T17:11:12 INFO   FINISHED: enwiki.actor (try 1)
2025-07-03T17:11:12 INFO   STARTING: enwiki.comment (try 1)
2025-07-03T17:11:12 INFO   Deleting temporary target directory /wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawikiprivate/tables/comment/snapshot202506/wikidbenwiki if it exists

That finished successfully. Moving on to /usr/local/bin/refinery-sqoop-mediawiki-not-history

It looks like we got another error from etwiki.ipblocks during this run.

2025-07-04T08:26:27 ERROR  ERROR: etwiki.ipblocks (try 1)
Traceback (most recent call last):
  File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki
    check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL)
  File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sqoop', 'codegen', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-etwiki.ipblocks'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3313/etwiki_p?characterEncoding=UTF-8', '--query', '\n             select ipb_id,\n                    convert(ipb_address using utf8mb4) ipb_address,\n                    ipb_user,\n                    null ipb_by,\n                    null ipb_by_text,\n                    null ipb_reason,\n                    convert(ipb_timestamp using utf8mb4) ipb_timestamp,\n                    ipb_auto,\n                    ipb_anon_only,\n                    ipb_create_account,\n                    ipb_enable_autoblock,\n                    convert(ipb_expiry using utf8mb4) ipb_expiry,\n                    convert(ipb_range_start using utf8mb4) ipb_range_start,\n                    convert(ipb_range_end using utf8mb4) ipb_range_end,\n                    ipb_deleted,\n                    ipb_block_email,\n                    ipb_allow_usertalk,\n                    ipb_parent_block_id,\n                    ipb_by_actor,\n                    ipb_reason_id\n\n               from ipblocks\n              where $CONDITIONS\n                \n         and 1=0', '--class-name', 'ipblocks', '--outdir', '/tmp/sqoop-jars/2025-07-04T08:26:22', '--bindir', '/tmp/sqoop-jars/2025-07-04T08:26:22', '--map-column-java', '"ipb_allow_usertalk=Boolean,ipb_anon_only=Boolean,ipb_auto=Boolean,ipb_block_email=Boolean,ipb_by=Long,ipb_by_actor=Long,ipb_by_text=String,ipb_create_account=Boolean,ipb_deleted=Boolean,ipb_enable_autoblock=Boolean,ipb_reason=String,ipb_reason_id=Long"']' returned non-zero exit status 1.
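For context on why these tracebacks are so terse: the wrapper in refinery/sqoop.py invokes sqoop via subprocess.check_call with stdout and stderr redirected to DEVNULL, so the only detail that survives is the non-zero exit status; the underlying sqoop error only becomes visible when the command is re-run by hand. A minimal illustration (not the refinery code itself):

```python
# check_call raises CalledProcessError when the child exits non-zero;
# with stdout/stderr sent to DEVNULL, the child's output is discarded.
import subprocess
from subprocess import DEVNULL

try:
    subprocess.check_call(["false"], stdout=DEVNULL, stderr=DEVNULL)
except subprocess.CalledProcessError as e:
    print(e.returncode)  # 1 -- the only detail that survives
```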

We have identified a link between this chain of errors and the work on T390767: Remove the compatibility layer of block schema in wikireplicas.
Currently working on a fix with @JAllemandou.

Change #1166346 had a related patch set uploaded (by Joal; author: Joal):

[operations/puppet@production] Fix user and user_old views for WMCS

https://gerrit.wikimedia.org/r/1166346

Change #1166347 had a related patch set uploaded (by Joal; author: Joal):

[operations/puppet@production] Update analytics sqoop script tables

https://gerrit.wikimedia.org/r/1166347

Change #1166347 merged by Btullis:

[operations/puppet@production] Update analytics sqoop script tables

https://gerrit.wikimedia.org/r/1166347

Change #1166346 merged by Btullis:

[operations/puppet@production] Fix user and user_old views for WMCS

https://gerrit.wikimedia.org/r/1166346

Change #1166354 had a related patch set uploaded (by Joal; author: Joal):

[analytics/refinery@master] Move sqooped ipblocks table to private

https://gerrit.wikimedia.org/r/1166354

Change #1166354 merged by Joal:

[analytics/refinery@master] Move sqooped ipblocks table to private

https://gerrit.wikimedia.org/r/1166354

Making progress now. @JAllemandou identified the issue with the user table and fixed it with: https://gerrit.wikimedia.org/r/1166346

The ipblocks table has also been removed from the wikireplicas, so we now sqoop it from the private replicas instead.

In order to complete the sqoop run, I need to use my custom script to finish the commonswiki.user and testwiki.user tables.

btullis@an-launcher1002:~$ sudo -u analytics kerberos-run-command analytics ./refinery-sqoop-mediawiki-history 
btullis@an-launcher1002:~$ echo $?
0

Here are the latest entries from the log.

2025-07-04T09:36:14 INFO   Moving sqooped folder from /wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506rerun/wikidbtestwiki to /wmf/data/raw/mediawiki/tables/user/snapshot=2025-06-rerun/wiki_db=testwiki
2025-07-04T09:36:14 DEBUG  Running: hdfs dfs -ls -d /wmf/data/raw/mediawiki/tables/user/snapshot=2025-06-rerun
2025-07-04T09:36:16 DEBUG  Running: hdfs dfs -mkdir -p /wmf/data/raw/mediawiki/tables/user/snapshot=2025-06-rerun
2025-07-04T09:36:18 DEBUG  Running: hdfs dfs -mv /wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506rerun/wikidbtestwiki /wmf/data/raw/mediawiki/tables/user/snapshot=2025-06-rerun/wiki_db=testwiki
2025-07-04T09:36:21 INFO   FINISHED: testwiki.user (try 1)
2025-07-04T09:36:52 INFO   Moving sqooped folder from /wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506rerun/wikidbcommonswiki to /wmf/data/raw/mediawiki/tables/user/snapshot=2025-06-rerun/wiki_db=commonswiki
2025-07-04T09:36:52 DEBUG  Running: hdfs dfs -ls -d /wmf/data/raw/mediawiki/tables/user/snapshot=2025-06-rerun
2025-07-04T09:36:54 DEBUG  Running: hdfs dfs -mv /wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/user/snapshot202506rerun/wikidbcommonswiki /wmf/data/raw/mediawiki/tables/user/snapshot=2025-06-rerun/wiki_db=commonswiki
2025-07-04T09:36:56 INFO   FINISHED: commonswiki.user (try 1)
2025-07-04T09:36:58 INFO   Wrote Success file /wmf/data/raw/mediawiki/tables/user/snapshot=2025-06-rerun/_SUCCESS

Moving the data files into place.

analytics@an-launcher1002:/home/btullis$ hdfs dfs -mv /wmf/data/raw/mediawiki/tables/user/snapshot=2025-06-rerun/wiki_db=commonswiki /wmf/data/raw/mediawiki/tables/user/snapshot=2025-06/
analytics@an-launcher1002:/home/btullis$ hdfs dfs -mv /wmf/data/raw/mediawiki/tables/user/snapshot=2025-06-rerun/wiki_db=testwiki /wmf/data/raw/mediawiki/tables/user/snapshot=2025-06/

Creating the _SUCCESS file.

analytics@an-launcher1002:/home/btullis$ hdfs dfs -mv /wmf/data/raw/mediawiki/tables/user/snapshot=2025-06-rerun/_SUCCESS /wmf/data/raw/mediawiki/tables/user/snapshot=2025-06/

Cleaning up the temporary directory.

analytics@an-launcher1002:/home/btullis$ hdfs dfs -rmdir /wmf/data/raw/mediawiki/tables/user/snapshot=2025-06-rerun/

Now continuing to run this in a screen session on an-launcher1002.

analytics@an-launcher1002:/home/btullis$ /usr/local/bin/refinery-sqoop-mediawiki-not-history

This is still running, but I received one more error. It occurred only on try 1, so it may well have completed on a subsequent try.

analytics@an-launcher1002:/home/btullis$ /usr/local/bin/refinery-sqoop-mediawiki-not-history
2025-07-05T04:15:56 ERROR  ERROR: frwiki.categorylinks (try 1)
Traceback (most recent call last):
  File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki
    check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL)
  File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sqoop', 'import', '-D', "mapred.job.name='sqoop-mediawiki-monthly-2025-06-frwiki.categorylinks'", '-D', 'mapreduce.job.queuename=production', '--username', 's53272', '--password-file', '/user/analytics/mysql-analytics-labsdb-client-pw.txt', '--connect', 'jdbc:mysql://an-redacteddb1001.eqiad.wmnet:3316/frwiki_p?characterEncoding=UTF-8', '--query', '\n             select cl_from,\n                    convert(cl_to using utf8mb4) cl_to,\n                    convert(cl_sortkey using utf8mb4) cl_sortkey,\n                    convert(cl_sortkey_prefix using utf8mb4) cl_sortkey_prefix,\n                    convert(cl_timestamp using utf8mb4) cl_timestamp,\n                    convert(cl_collation using utf8mb4) cl_collation,\n                    convert(cl_type using utf8mb4) cl_type\n\n               from categorylinks\n              where $CONDITIONS\n                \n        ', '--target-dir', '/wmf/tmp/analytics/sqoop-mw/wmf/data/raw/mediawiki/tables/categorylinks/snapshot202506/wikidbfrwiki', '--num-mappers', '32', '--as-avrodatafile', '--boundary-query', '\n            SELECT MIN(cl_from),\n                   MAX(cl_from)\n              FROM categorylinks\n             WHERE TRUE\n                 \n        ', '--split-by', 'cl_from', '--class-name', 'categorylinks', '--jar-file', '/tmp/sqoop-jars/2025-07-04T10:01:15/mediawiki-tables-sqoop-orm.jar', '--map-column-java', '"cl_from=Long,cl_to=String,cl_sortkey=String,cl_sortkey_prefix=String,cl_timestamp=String,cl_collation=String,cl_type=String"']' returned non-zero exit status 1.
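One quick way to confirm whether the table completed on a later try is to scan the run's log for a matching FINISHED entry. A hypothetical sketch (the log lines below are illustrative, not taken from the real log):

```python
# Scan log lines for a FINISHED entry for the table that errored on try 1.
import re

# Simulated excerpt standing in for the run's log file.
log_lines = [
    "2025-07-05T04:15:56 ERROR  ERROR: frwiki.categorylinks (try 1)",
    "2025-07-05T04:31:40 INFO   FINISHED: frwiki.categorylinks (try 2)",
]
finished = [l for l in log_lines if re.search(r"FINISHED: frwiki\.categorylinks", l)]
print(bool(finished))  # True if a later try completed
```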

Continuing with: /usr/local/bin/refinery-sqoop-mediawiki-production-not-history

Oh, that quit with another error about ipblocks.

analytics@an-launcher1002:/home/btullis$ /usr/local/bin/refinery-sqoop-mediawiki-production-not-history
2025-07-06T09:00:41 ERROR  ERROR: etwiki.ipblocks (try 1)
Traceback (most recent call last):
  File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 175, in sqoop_wiki
    check_call(sqoop_arguments, stdout=DEVNULL, stderr=DEVNULL)
  File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sqoop', 'codegen', '-D', "mapred.job.name='sqoop-mediawiki-monthly-production-not-history-2025-06-etwiki.ipblocks'", '-D', 'mapreduce.job.queuename=production', '--username', 'research', '--password-file', '/user/analytics/mysql-analytics-research-client-pw.txt', '--connect', 'jdbc:mysql://dbstore1007.eqiad.wmnet:3313/etwiki?characterEncoding=UTF-8', '--query', '\n             select ipb_id,\n                    convert(ipb_address using utf8mb4) ipb_address,\n                    ipb_user,\n                    null ipb_by,\n                    null ipb_by_text,\n                    null ipb_reason,\n                    convert(ipb_timestamp using utf8mb4) ipb_timestamp,\n                    ipb_auto,\n                    ipb_anon_only,\n                    ipb_create_account,\n                    ipb_enable_autoblock,\n                    convert(ipb_expiry using utf8mb4) ipb_expiry,\n                    convert(ipb_range_start using utf8mb4) ipb_range_start,\n                    convert(ipb_range_end using utf8mb4) ipb_range_end,\n                    ipb_deleted,\n                    ipb_block_email,\n                    ipb_allow_usertalk,\n                    ipb_parent_block_id,\n                    ipb_by_actor,\n                    ipb_reason_id\n\n               from ipblocks\n              where $CONDITIONS\n                \n         and 1=0', '--class-name', 'ipblocks', '--outdir', '/tmp/sqoop-jars/2025-07-06T09:00:37', '--bindir', '/tmp/sqoop-jars/2025-07-06T09:00:37', '--map-column-java', '"ipb_allow_usertalk=Boolean,ipb_anon_only=Boolean,ipb_auto=Boolean,ipb_block_email=Boolean,ipb_by=Long,ipb_by_actor=Long,ipb_by_text=String,ipb_create_account=Boolean,ipb_deleted=Boolean,ipb_enable_autoblock=Boolean,ipb_reason=String,ipb_reason_id=Long"']' returned non-zero exit status 1.
2025-07-06T09:00:43 ERROR  ERROR generating ORM jar for ipblocks

I will look into this tomorrow.

I manually executed the command and the error is: Table 'etwiki.ipblocks' doesn't exist

analytics@an-launcher1002:/home/btullis$ 'sqoop' 'codegen' '-D' "mapred.job.name='sqoop-mediawiki-monthly-production-not-history-2025-06-etwiki.ipblocks'" '-D' 'mapreduce.job.queuename=production' '--username' 'research' '--password-file' '/user/analytics/mysql-analytics-research-client-pw.txt' '--connect' 'jdbc:mysql://dbstore1007.eqiad.wmnet:3313/etwiki?characterEncoding=UTF-8' '--query' 'select ipb_id, convert(ipb_address using utf8mb4) ipb_address, ipb_user, null ipb_by, null ipb_by_text, null ipb_reason, convert(ipb_timestamp using utf8mb4) ipb_timestamp, ipb_auto, ipb_anon_only, ipb_create_account, ipb_enable_autoblock, convert(ipb_expiry using utf8mb4) ipb_expiry, convert(ipb_range_start using utf8mb4) ipb_range_start, convert(ipb_range_end using utf8mb4) ipb_range_end, ipb_deleted, ipb_block_email, ipb_allow_usertalk, ipb_parent_block_id, ipb_by_actor, ipb_reason_id from ipblocks where $CONDITIONS and 1=0' '--class-name' 'ipblocks' '--outdir' '/tmp/sqoop-jars/2025-07-06T09:00:37' '--bindir' '/tmp/sqoop-jars/2025-07-06T09:00:37' '--map-column-java' '"ipb_allow_usertalk=Boolean,ipb_anon_only=Boolean,ipb_auto=Boolean,ipb_block_email=Boolean,ipb_by=Long,ipb_by_actor=Long,ipb_by_text=String,ipb_create_account=Boolean,ipb_deleted=Boolean,ipb_enable_autoblock=Boolean,ipb_reason=String,ipb_reason_id=Long"'
Warning: /usr/lib/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/lib/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/log4j-slf4j-impl-2.17.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
25/07/07 10:45:58 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
25/07/07 10:46:00 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
25/07/07 10:46:00 INFO tool.CodeGenTool: Beginning code generation
25/07/07 10:46:00 INFO manager.SqlManager: Executing SQL statement: select ipb_id, convert(ipb_address using utf8mb4) ipb_address, ipb_user, null ipb_by, null ipb_by_text, null ipb_reason, convert(ipb_timestamp using utf8mb4) ipb_timestamp, ipb_auto, ipb_anon_only, ipb_create_account, ipb_enable_autoblock, convert(ipb_expiry using utf8mb4) ipb_expiry, convert(ipb_range_start using utf8mb4) ipb_range_start, convert(ipb_range_end using utf8mb4) ipb_range_end, ipb_deleted, ipb_block_email, ipb_allow_usertalk, ipb_parent_block_id, ipb_by_actor, ipb_reason_id from ipblocks where  (1 = 0)  and 1=0
25/07/07 10:46:00 ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'etwiki.ipblocks' doesn't exist
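Worth noting: the `where (1 = 0) and 1=0` clause (sqoop substitutes `(1 = 0)` for `$CONDITIONS` during codegen) is only there to make the query return zero rows while still exposing the column metadata. The statement still fails outright when the table itself is gone, which is exactly what we see above. A small sketch of this behaviour, using SQLite in place of MySQL:

```python
# Schema-only probes return zero rows against an existing table, but still
# raise if the table does not exist at all.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipblocks_restrictions (ir_ipb_id INTEGER)")

# Probe against an existing table: succeeds, zero rows.
rows = conn.execute(
    "SELECT ir_ipb_id FROM ipblocks_restrictions WHERE (1 = 0) AND 1=0"
).fetchall()
print(rows)  # []

# Probe against the dropped table: raises before any row filtering applies.
try:
    conn.execute("SELECT ipb_id FROM ipblocks WHERE (1 = 0) AND 1=0")
except sqlite3.OperationalError as e:
    print(e)  # no such table: ipblocks
```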

We recently merged this patch, which moves the sqooping of the ipblocks table from the cloud replicas to the private replicas:
Update analytics sqoop script tables (https://gerrit.wikimedia.org/r/1166347)

...but it would seem that the ipblocks table simply doesn't exist any more.

btullis@stat1008:~$ analytics-mysql etwiki

mysql:research@dbstore1007.eqiad.wmnet [etwiki]> show tables like '%ipblock%';
+------------------------------+
| Tables_in_etwiki (%ipblock%) |
+------------------------------+
| ipblocks_restrictions        |
+------------------------------+
1 row in set (0.001 sec)

It looks like the ipblocks table was completely dropped in T367632: Drop ipblocks in production.

Change #1166796 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Stop sqooping the ipblocks table, since it no longer exists.

https://gerrit.wikimedia.org/r/1166796

Change #1166796 merged by Btullis:

[operations/puppet@production] Stop sqooping the ipblocks table, since it no longer exists.

https://gerrit.wikimedia.org/r/1166796

Now executing /usr/local/bin/refinery-sqoop-mediawiki-production-not-history again on an-launcher1002.

This has now finished, so I think that means the whole sqoop process has finished successfully.

analytics@an-launcher1002:/home/btullis$ /usr/local/bin/refinery-sqoop-mediawiki-production-not-history
analytics@an-launcher1002:/home/btullis$ echo $?
0

Resetting the failed systemd service.

btullis@an-launcher1002:~$ systemctl --failed
  UNIT                                   LOAD   ACTIVE SUB    DESCRIPTION
● refinery-sqoop-whole-mediawiki.service loaded failed failed Schedules sqoop to import whole MediaWiki databases into Hadoop monthly.

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
1 loaded units listed.
btullis@an-launcher1002:~$ sudo systemctl reset-failed refinery-sqoop-whole-mediawiki.service
btullis@an-launcher1002:~$ systemctl --failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
0 loaded units listed.
btullis@an-launcher1002:~$