Airflow DAG (hdfs_usage_weekly) failed with no details in the application log
Closed, Resolved, Public

Description

During this Ops week we received the following alert:

Airflow alert: <TaskInstance: hdfs_usage_weekly.aggregate_and_extract_fsimage_data_to_parquet scheduled__2024-04-29T00:00:00+00:00 [failed]

Try 6 out of 6
Exception:
application_1713453355160_190101 is not running. Application state: FAILED
Log: Link
Host: an-launcher1002.eqiad.wmnet
Mark success: Link

I took a look at the application log and I only found the following:

24/05/06 06:38:07 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
24/05/06 06:38:07 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
Container: container_e121_1713453355160_192254_01_000001 on an-worker1112.eqiad.wmnet_8041_1714977369600
LogAggregationType: AGGREGATED
========================================================================================================
LogType:container-localizer-syslog
LogLastModifiedTime:Mon May 06 06:36:09 +0000 2024
LogLength:184
LogContents:
2024-05-06 06:36:07,617 INFO [main] org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer: Disk Validator: yarn.nodemanager.disk-validator is loaded.
End of LogType:container-localizer-syslog
*******************************************************************************************

There are no specific details about the error, even though the task used all 6 of its 6 tries. I retried it once more, and it failed again with the same output.

Event Timeline

The issue was the path to the log4j.properties file configured in the Airflow UI: hdfs:///user/aqu/aqu-log4j.properties was not accessible to the Airflow user, analytics.
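To illustrate why a file under another user's home directory can be unreadable, here is a minimal Python sketch of the HDFS permission check (owner class first, then group, then other), applied to lines in the format printed by `hdfs dfs -ls -d`. The listing below is made up for illustration: the modes, owners and groups are assumptions, not values taken from the cluster. The sketch also ignores ACLs and superuser rights.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class HdfsEntry:
    mode: str   # e.g. "-rw-r-----" or "drwxr-x---"
    owner: str
    group: str
    path: str


def parse_ls_line(line: str) -> HdfsEntry:
    # Columns: mode, replication, owner, group, size, date, time, path
    fields = line.split(None, 7)
    return HdfsEntry(mode=fields[0], owner=fields[2], group=fields[3], path=fields[7])


def has_bit(entry: HdfsEntry, user: str, groups: Set[str], bit: str) -> bool:
    """Check one permission bit ('r', 'w' or 'x') for the given user."""
    offset = {"r": 0, "w": 1, "x": 2}[bit]
    if user == entry.owner:
        triad = entry.mode[1:4]      # owner permissions
    elif entry.group in groups:
        triad = entry.mode[4:7]      # group permissions
    else:
        triad = entry.mode[7:10]     # everyone else
    return triad[offset] == bit


def can_read_file(listing: Iterable[str], user: str, groups: Set[str]) -> bool:
    """listing: one line per parent directory, root-most first, then the file."""
    *dirs, target = [parse_ls_line(line) for line in listing]
    # Reading a file needs 'x' on every parent directory and 'r' on the file.
    return all(has_bit(d, user, groups, "x") for d in dirs) and has_bit(
        target, user, groups, "r"
    )


# Hypothetical listing: permissions and ownership are illustrative assumptions.
SAMPLE = [
    "drwxr-xr-x   - hdfs hadoop 0 2024-01-01 00:00 /user",
    "drwxr-x---   - aqu aqu 0 2024-05-01 00:00 /user/aqu",
    "-rw-r-----   3 aqu aqu 1234 2024-05-01 00:00 /user/aqu/aqu-log4j.properties",
]

print(can_read_file(SAMPLE, "analytics", {"analytics"}))
print(can_read_file(SAMPLE, "aqu", {"aqu"}))
```

With modes like these, the analytics user would be stopped at the /user/aqu directory before the file's own permissions are even considered.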

As a quick fix, I copied the file to a new path and changed the path in the Airflow UI to hdfs:///user/analytics/aqu-log4j.properties, so the task is running now. I'm also preparing a quick MR that updates the DAG script to use the new DagProperties object and makes configuration a bit easier.

What's the longer-term location for the log4j properties file? Presumably we don't want to keep the file name aqu-log4j.properties, whichever folder it ends up in?

Longer-term we will remove the log4j properties file reference, since it's supposed to be used for debugging purposes only. The MR I referenced in this ticket achieves this: it sets the properties file to None and only uses the override from the Airflow UI, if any.
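The "default to None, override only from the UI" behaviour can be sketched in plain Python as follows. This is not the actual DagProperties implementation: the property name `spark_log4j_file` and both helper functions are hypothetical. The Spark arguments show one common way to pass a custom log4j 1.x file to a job on YARN, and the real DAG may wire this differently.

```python
from typing import Dict, List, Optional

# Hypothetical default: no custom log4j file unless someone overrides it.
DEFAULTS: Dict[str, Optional[str]] = {"spark_log4j_file": None}


def resolve_properties(
    defaults: Dict[str, Optional[str]],
    overrides: Optional[Dict[str, Optional[str]]],
) -> Dict[str, Optional[str]]:
    """Merge UI overrides on top of the DAG defaults, rejecting unknown keys."""
    merged = dict(defaults)
    for key, value in (overrides or {}).items():
        if key not in defaults:
            raise KeyError(f"unknown property: {key}")
        merged[key] = value
    return merged


def log4j_spark_args(log4j_file: Optional[str]) -> List[str]:
    """Return extra spark-submit arguments, or nothing when no file is set."""
    if log4j_file is None:
        return []
    # --files ships the file to the containers, where it is visible
    # under its base name in the working directory.
    basename = log4j_file.rsplit("/", 1)[-1]
    return [
        "--files",
        log4j_file,
        "--conf",
        f"spark.driver.extraJavaOptions=-Dlog4j.configuration=file:{basename}",
    ]


# Normal runs: no override, so no log4j arguments are added.
print(log4j_spark_args(resolve_properties(DEFAULTS, None)["spark_log4j_file"]))

# Debugging run: an override supplied from the UI.
debug = resolve_properties(
    DEFAULTS, {"spark_log4j_file": "hdfs:///user/analytics/aqu-log4j.properties"}
)
print(log4j_spark_args(debug["spark_log4j_file"]))
```

Keeping the default at None means a missing or unreadable debug file cannot break a scheduled run, which is the failure mode this ticket hit.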

This task can be closed as the issue has been fixed and changes to the DAG have been merged.