On T342911, we were trying to debug a long running (3+ days) Yarn job submitted via Airflow that had run on 2023-06-24, so about a month old. Unfortunately, all the Yarn logs were purged:
xcollazo@an-launcher1002:~$ date Thu 27 Jul 2023 08:20:46 PM UTC xcollazo@an-launcher1002:~$ sudo -u analytics yarn logs -appOwner analytics -applicationId application_1686833367123_9750 Unable to get ApplicationState. Attempting to fetch logs directly from the filesystem. File /var/log/hadoop-yarn/apps/analytics/logs/application_1686833367123_9750 does not exist. Can not find any log file matching the pattern: [ALL] for the application: application_1686833367123_9750 Can not find the logs for the application: application_1686833367123_9750 with the appOwner: analytics
This makes debugging of older jobs impossible. See T342911 to see why we were interested in such an old job.
In this task we should consider bumping the retention period. I suggest 3 months.