
The service unit analytics-dumps-fetch-pageview_complete_dumps.service is in failed status on host clouddumps1002.
Closed, Duplicate, Public

Description

root@clouddumps1002:/usr# cat lib/systemd/system/analytics-dumps-fetch-pageview_complete_dumps.service
[Unit]
Description=Copy pageview_complete_dumps files from Hadoop HDFS.

[Service]
User=dumpsgen
SyslogIdentifier=kerberos-run-command
ExecStart=/usr/local/bin/systemd-timer-mail-wrapper -T data-engineering-alerts@lists.wikimedia.org --only-on-error /usr/local/bin/kerberos-run-command dumpsgen /usr/local/bin/rsync-analytics-pageview_complete_dumps

Event Timeline

root@clouddumps1002:/usr# systemctl status analytics-dumps-fetch-pageview_complete_dumps.service
● analytics-dumps-fetch-pageview_complete_dumps.service - Copy pageview_complete_dumps files from Hadoop HDFS.
     Loaded: loaded (/lib/systemd/system/analytics-dumps-fetch-pageview_complete_dumps.service; static)
     Active: failed (Result: exit-code) since Thu 2022-11-03 05:00:40 UTC; 8h ago
TriggeredBy: ● analytics-dumps-fetch-pageview_complete_dumps.timer
    Process: 296219 ExecStart=/usr/local/bin/systemd-timer-mail-wrapper -T data-engineering-alerts@lists.wikimedia.org --only-on-error /usr/local/bin/kerberos-run-command dumpsgen>
   Main PID: 296219 (code=exited, status=1/FAILURE)
        CPU: 57.019s

Nov 03 05:00:40 clouddumps1002 kerberos-run-command[296219]:         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
Nov 03 05:00:40 clouddumps1002 kerberos-run-command[296219]:         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
Nov 03 05:00:40 clouddumps1002 kerberos-run-command[296219]:         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
Nov 03 05:00:40 clouddumps1002 kerberos-run-command[296219]:         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
Nov 03 05:00:40 clouddumps1002 kerberos-run-command[296219]:         at com.sun.proxy.$Proxy11.getListing(Unknown Source)
Nov 03 05:00:40 clouddumps1002 kerberos-run-command[296219]:         at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1661)
Nov 03 05:00:40 clouddumps1002 kerberos-run-command[296219]:         ... 41 more
Nov 03 05:00:40 clouddumps1002 systemd[1]: analytics-dumps-fetch-pageview_complete_dumps.service: Main process exited, code=exited, status=1/FAILURE
Nov 03 05:00:40 clouddumps1002 systemd[1]: analytics-dumps-fetch-pageview_complete_dumps.service: Failed with result 'exit-code'.
Nov 03 05:00:40 clouddumps1002 systemd[1]: analytics-dumps-fetch-pageview_complete_dumps.service: Consumed 57.019s CPU time.
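`systemctl status` only prints the last few journal lines. That is why the Java stack trace above starts partway through and ends at `... 41 more`, and why the line naming the actual exception is missing. The sketch below shows one way to pull the relevant lines from the journal. It has some assumptions:

* The helper names `exception_headers` and `unit_exceptions` are hypothetical.
* `-n 500` is an arbitrary window.
* The filter keeps lines that mention `Exception`, `Error` or `Caused by:` and drops `at ...` frames. It is a heuristic for typical Java traces, not a general log parser.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for digging the exception out of a unit's journal.

# Read log lines on stdin; keep exception/"Caused by:" lines and drop the
# "at some.Class.method(...)" stack frames (heuristic, not a Java parser).
exception_headers() {
  grep -E 'Exception|Error|Caused by:' | grep -vE '[[:space:]]at [[:alnum:]_$.]' || true
}

# Dump the last 500 journal lines for UNIT without a pager and filter them.
# Reading another unit's journal normally requires root.
unit_exceptions() {
  local unit="$1"
  journalctl -u "$unit" --no-pager -n 500 | exception_headers
}
```

To use it, run `unit_exceptions analytics-dumps-fetch-pageview_complete_dumps.service` as root. Plain `journalctl -u <unit> --no-pager` without the filter shows the complete trace when the heuristic drops something useful.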

Andrew asked whether I knew anyone familiar with this. I've never worked with the Kerberos or HDFS side of things. However, @elukey worked on the Kerberos setup in the modules/dumps/manifests/web/fetches/stats.pp manifest at one point, and @BTullis built a number of related packages in T310643, so they may also know something.

I will certainly take a look at this.

I believe this has now been fixed. @Antoine_Quhen also raised a ticket about the same problem. In T322394: Update ownership of manually generated files on clouddumps1002, I fixed the file permissions and then restarted the service.
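The fix was to correct the ownership of manually generated files. One way to spot such files is to list anything under the output directory that is not owned by the service user, which is `dumpsgen` according to the unit's `User=` line. This is a minimal sketch with some assumptions:

* The files live under a local directory that you pass in; the real path is not given here.
* `dumpsgen:dumpsgen` is the intended owner and group.
* The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the directory and owner are supplied by the caller.

# Print every path under DIR (including DIR itself) not owned by USER.
misowned_files() {
  local dir="$1" user="$2"
  find "$dir" ! -user "$user" -print
}

# Give every mis-owned path under DIR to OWNER ("user" or "user:group").
# Must run as root. -h changes symlinks themselves, matching find's default
# of not following them.
fix_ownership() {
  local dir="$1" owner="$2"
  find "$dir" ! -user "${owner%%:*}" -exec chown -h "$owner" {} +
}
```

Review first with `misowned_files /path/to/output dumpsgen`, then, as root, run `fix_ownership /path/to/output dumpsgen:dumpsgen`. Here `/path/to/output` is a placeholder for the real dumps directory.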
