[opsweek] Airflow DAGs with Spark jobs should always include Spark tuning variables
Open, Needs Triage, Public

Description

During opsweek, I twice ran into a Spark job that was OOMing while the controlling Airflow job had no varprops (or DagProperties) that would let me tune it easily. (See here and here.)

In this task we should:

  • Identify which DAGs are missing the following Spark tunings:
      • driver_memory
      • driver_cores
      • executor_memory
      • executor_cores
  • Modify the DAGs so that these tunings are available.
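
To make "available" concrete, here is a minimal sketch of what exposing the four tunings could look like. It does not use the repository's varprops or DagProperties helpers. SparkTuning, with_overrides and to_spark_conf are hypothetical names, and the default values are placeholders rather than recommendations. Only the spark.* keys are real Spark configuration properties.

```python
import json
from dataclasses import asdict, dataclass, replace

# Real Spark configuration keys for the four tunings named in this task.
SPARK_CONF_KEYS = {
    "driver_memory": "spark.driver.memory",
    "driver_cores": "spark.driver.cores",
    "executor_memory": "spark.executor.memory",
    "executor_cores": "spark.executor.cores",
}


@dataclass(frozen=True)
class SparkTuning:
    """Hypothetical container for per-DAG Spark tunings (placeholder defaults)."""

    driver_memory: str = "2G"
    driver_cores: int = 1
    executor_memory: str = "4G"
    executor_cores: int = 2

    def with_overrides(self, overrides_json):
        """Return a copy with values replaced from a JSON object string.

        The JSON string stands in for whatever override mechanism the DAG
        uses (an assumption); an empty string or None means "no overrides".
        """
        overrides = json.loads(overrides_json) if overrides_json else {}
        unknown = set(overrides) - set(SPARK_CONF_KEYS)
        if unknown:
            raise ValueError(f"Unknown Spark tunings: {sorted(unknown)}")
        return replace(self, **overrides)

    def to_spark_conf(self):
        """Map the tunings to Spark configuration keys, with string values."""
        return {SPARK_CONF_KEYS[name]: str(value) for name, value in asdict(self).items()}
```

With something along these lines in every DAG, raising executor memory for an OOMing job during opsweek would amount to supplying an override such as {"executor_memory": "8G"} instead of editing and redeploying the DAG code.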

Side note: the best time to do this is probably as part of T336738, but I am leaving this as a separate ticket because the issue affects opsweek sanity.

Event Timeline

xcollazo updated the task description.

This is the list of DAGs that seem to be missing executor_memory (which I believe to be the most important Spark tuning config):

√ dags % pwd
/Users/xcollazo/wmf/gitlab/airflow-dags/analytics/dags
√ dags % grep -riL "executor_memory" . | grep -v pyc | grep -v DS_Store
./mediacounts/mediacounts_archive_daily_dag.py
./mediacounts/mediacounts_load_hourly_dag.py
./anomaly_detection/anomaly_detection_mobile_os_distribution_daily_dag.py
./anomaly_detection/anomaly_detection_traffic_distribution_daily_dag.py
./anomaly_detection/anomaly_detection_useragent_distribution_daily_dag.py
./hdfs_usage/hdfs_usage_weekly_dag.py
./virtualpageview/virtualpageview_hourly_dag.py
./apis/apis_metrics_to_graphite_hourly_dag.py
./browser_general/browser_general_daily_dag.py
./pageview/pageview_allowlist_check_dag.py
./interlanguage/interlanguage_daily_dag.py
./aqs/aqs_hourly_dag.py
./projectview/projectview_hourly_dag.py
./projectview/projectview_geo_dag.py
./geoeditors/unique_editors_by_country_monthly_dag.py
./geoeditors/editors_daily_monthly_dag.py
./geoeditors/geoeditors_public_monthly_dag.py
./geoeditors/geoeditors_edits_monthly_dag.py
./geoeditors/geoeditors_yearly_dag.py
./geoeditors/geoeditors_monthly_dag.py
./mediarequest/mediarequest_hourly_dag.py
./datahub/ingestion/configs/hive_event.yaml
./datahub/ingestion/configs/druid_public.yaml
./datahub/ingestion/configs/hive_wmf_traffic.yaml
./datahub/ingestion/configs/kafka_jumbo.yaml
./datahub/ingestion/configs/hive_differential_privacy.yaml
./datahub/ingestion/configs/hive_wmf.yaml
./datahub/ingestion/configs/hive_wmf_product.yaml
./datahub/ingestion/configs/druid_internal.yaml
./datahub/ingestion/configs/hive_gdi.yaml
./datahub/ingestion/configs/hive_knowledge_gaps.yaml
./datahub/ingestion/configs/hive_event_sanitized.yaml
./datahub/ingestion/configs/hive_canonical_data.yaml
./datahub/ingestion/configs/hive_wmf_raw.yaml
./datahub/ingestion/ingest_daily_dag.py
./druid_load/druid_load_unique_devices_per_domain_monthly_dag.py
./druid_load/druid_load_unique_devices_per_domain_daily_dag.py
./druid_load/druid_load_editattemptstep_dag.py
./druid_load/druid_load_navigationtiming_dag.py
./druid_load/druid_load_unique_devices_per_project_family_daily_dag.py
./druid_load/druid_load_prefupdate_dag.py
./druid_load/druid_load_geoeditors_monthly_dag.py
./druid_load/druid_load_unique_devices_per_project_family_monthly_dag.py
./druid_load/druid_load_virtualpageview_dag.py
./gdi/equity_landscape/equity_landscape_app_dag.py
./gdi/equity_landscape/equity_landscape_hql_dag.py
./gdi/equity_landscape/equity_landscape_api_dag.py
./gdi/equity_landscape/equity_landscape_csv_dag.py
./session_length/session_length_daily_dag.py
./wikidata/wikidata_metrics_to_graphite_daily_dag.py
./wikidata/wikidata_coeditors_metrics_to_graphite_monthly_dag.py
./webrequest/refine_webrequest_hourly_dag.py
./mediawiki/mediawiki_history_load_dag.py
./cassandra_load/cassandra_load_unique_devices_dag.py
./cassandra_load/cassandra_load_pageview_per_project_dag.py
./cassandra_load/cassandra_load_editors_by_country_dag.py
./referrer/referrer_daily_dag.py
√ dags %
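
The grep above checks only executor_memory and also lists the datahub YAML configs, which are not DAG files. A rough sketch of a script covering all four tunings and only files ending in _dag.py might look like the following. The function name missing_tunings is hypothetical, and the case-insensitive substring match is the same heuristic as grep -i, so a mention inside a comment still counts as present.

```python
from pathlib import Path

# The four tunings listed in the task description.
TUNINGS = ("driver_memory", "driver_cores", "executor_memory", "executor_cores")


def missing_tunings(dags_root):
    """Return {relative DAG path: [tunings not mentioned in that file]}.

    Only files matching *_dag.py are scanned. DAG files that mention all
    four tunings are left out of the result.
    """
    root = Path(dags_root)
    report = {}
    for path in sorted(root.rglob("*_dag.py")):
        # Lower-case the text so the match is case-insensitive, like grep -i.
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        missing = [name for name in TUNINGS if name not in text]
        if missing:
            report[str(path.relative_to(root))] = missing
    return report
```

Calling missing_tunings(".") from the analytics/dags directory would return one entry per DAG file that lacks at least one of the tunings, which could serve as the checklist for the first bullet of this task.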