During the development of the new Refine DAG, Skein log collection was enabled to speed up development by making meaningful information easily accessible through the Airflow UI. However, it had to be disabled because of disk constraints on an-launcher1002.
Currently, when Skein log collection is enabled, it retrieves only the logs of the Skein application, which consist primarily of Spark Driver logs. These logs are often cluttered with verbose Spark-internal lines, which makes them difficult to navigate and consumes disk space unnecessarily.
Proposed Solution:
• Introduce a parameter to the SparkSubmitOperator that reduces verbosity in the Spark Driver logs.
• Configure this parameter to suppress internal Spark log lines while retaining meaningful information for debugging and monitoring.
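One possible shape for this, as a rough sketch only: the parameter could translate into a log4j configuration that raises the level of Spark-internal loggers while leaving the job's own loggers untouched. The helper names, the `quiet_driver_logs` flag and the `quiet-log4j.properties` file name below are hypothetical and not part of the existing SparkSubmitOperator. The sketch also assumes a Spark version that still uses log4j 1.x (Spark 3.3 and later use log4j2, which needs `-Dlog4j2.configurationFile` and a different file format), and that the properties file is readable from the driver's working directory.

```python
from typing import Dict

# Hypothetical file name; where the file really lives depends on how it is
# shipped to the machine or container running the Spark driver.
QUIET_LOG4J_FILE = "quiet-log4j.properties"


def quiet_log4j_properties(spark_level: str = "WARN", root_level: str = "INFO") -> str:
    """Render a log4j 1.x properties file that quiets Spark-internal loggers.

    Application loggers inherit root_level, so job-specific messages are kept.
    """
    lines = [
        f"log4j.rootCategory={root_level}, console",
        "log4j.appender.console=org.apache.log4j.ConsoleAppender",
        "log4j.appender.console.target=System.err",
        "log4j.appender.console.layout=org.apache.log4j.PatternLayout",
        "log4j.appender.console.layout.ConversionPattern="
        "%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n",
        # Loggers assumed to produce most of the internal noise.
        f"log4j.logger.org.apache.spark={spark_level}",
        f"log4j.logger.org.apache.hadoop={spark_level}",
        f"log4j.logger.org.sparkproject.jetty={spark_level}",
    ]
    return "\n".join(lines) + "\n"


def driver_log_conf(quiet_driver_logs: bool,
                    log4j_file: str = QUIET_LOG4J_FILE) -> Dict[str, str]:
    """Return extra spark-submit --conf entries for the hypothetical flag.

    Note: a real implementation would need to merge this with any
    spark.driver.extraJavaOptions already set on the job instead of
    replacing it.
    """
    if not quiet_driver_logs:
        return {}
    return {
        "spark.driver.extraJavaOptions":
            f"-Dlog4j.configuration=file:{log4j_file}",
    }


if __name__ == "__main__":
    print(quiet_log4j_properties())
    print(driver_log_conf(quiet_driver_logs=True))
```

With the flag off, no configuration is added and the driver keeps its current verbosity, so existing DAGs would be unaffected until they opt in.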
Benefits:
• Reduced size of collected logs, mitigating disk usage issues.
• Enhanced clarity in log content, making it easier to debug and monitor the Refine DAG.
• Re-enablement of Skein log collection without compromising system resources.