The commonswiki_file.py and cassandra.py Spark tasks fail in Airflow, although they run fine on the Analytics clients: YARN kills their containers for exceeding physical memory limits.
commonswiki_file.py
Execution log excerpt:
[2022-04-29 15:27:04,818] {subprocess.py:78} INFO - diagnostics: Application application_1650893206204_20962 failed 6 times due to AM Container for appattempt_1650893206204_20962_000006 exited with exitCode: -104
[2022-04-29 15:27:04,819] {subprocess.py:78} INFO - Failing this attempt.Diagnostics: [2022-04-29 15:27:04.067]Container [pid=8628,containerID=container_e36_1650893206204_20962_06_000001] is running beyond physical memory limits. Current usage: 2.4 GB of 2.4 GB physical memory used; 23.0 GB of 5.0 GB virtual memory used. Killing container.
More details at an-airflow1003.eqiad.wmnet:/home/mfossati/commonswiki_file_failure.log.
This results in several broken delta Parquet outputs (i.e., directories containing only a _temporary file), while lead_image_data_latest and wikidata_data_latest look fine (the _SUCCESS marker and snappy-compressed Parquet files are there; quickly sanity-checked the Spark DataFrames with count() and show()).
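The completeness check above can be automated instead of eyeballing each directory. A minimal stdlib-only sketch, assuming the usual Spark output convention that a completed write leaves a _SUCCESS marker in the output directory (directory names here are illustrative):

```python
from pathlib import Path


def check_spark_outputs(base_dir):
    """Split Spark output directories under base_dir into (ok, broken).

    A directory counts as complete when it contains a _SUCCESS marker;
    directories left with only a _temporary subdirectory are broken,
    partially-written outputs.
    """
    ok, broken = [], []
    for d in sorted(Path(base_dir).iterdir()):
        if not d.is_dir():
            continue
        if (d / "_SUCCESS").exists():
            ok.append(d.name)
        else:
            broken.append(d.name)
    return ok, broken
```

For example, `check_spark_outputs("/path/to/outputs")` returns the names of directories that finished writing and those that did not, so the broken delta outputs can be listed in one pass.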
cassandra.py
Execution log excerpt:
diagnostics: Application application_1650893206204_21375 failed 6 times due to AM Container for appattempt_1650893206204_21375_000006 exited with exitCode: -104 Failing this attempt.Diagnostics: [2022-04-29 17:55:34.282]Container [pid=39484,containerID=container_e36_1650893206204_21375_06_000002] is running beyond physical memory limits. Current usage: 2.4 GB of 2.4 GB physical memory used; 6.1 GB of 5.0 GB virtual memory used. Killing container.
More details at an-airflow1003.eqiad.wmnet:/home/mfossati/cassandra_failure.log.
Only the analytics_platform_eng.suggestions Hive table appears to have been written.
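Both failures are exit code -104, i.e. the AM container was killed for exceeding its physical memory limit (2.4 GB used of 2.4 GB allowed), so one plausible fix is raising the driver/AM memory for these jobs. A sketch using standard spark-submit options, assuming YARN cluster mode (where the driver runs inside the AM container); the exact values and the way Airflow passes these settings through are assumptions, not the confirmed fix:

```shell
# The 2.4 GB container limit covers driver heap + memory overhead,
# so raise both for the failing task (values are illustrative).
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.driver.memory=4g \
  --conf spark.driver.memoryOverhead=1g \
  commonswiki_file.py
```

If the jobs are launched through Airflow's SparkSubmitOperator, the same `spark.driver.memory` / `spark.driver.memoryOverhead` settings would go into the operator's conf dictionary rather than on the command line.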