Currently some jobs are too limited by spark.dynamicAllocation.maxExecutors=16:
- the Spark task in the aqs hourly job takes ~1.5 min, whereas it took ~30 s in Hive
- the mediarequest hourly job takes 3.5 min
- app_session_metrics takes an hour, and this may be why the skein log collector is crashing. The current fix is here: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/commit/9fa5d7e003c86785ba5149642eaec9a0d5bee596
The maxExecutors configuration is defined here:
https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/wmf_airflow_common/config/dag_default_args.py#L100
It is correctly propagated to Skein, but we should adapt the value to each DAG instead of using a single global cap.
For the three jobs above, we could set the value to 64. Let's review the other jobs as well.
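A per-DAG override could be sketched like this. This is illustrative only: the `default_args` shape and the helper name `with_spark_conf_overrides` are assumptions, not the actual structure of `dag_default_args.py`.

```python
# Illustrative sketch only: the structure of default_args and the helper
# name are assumptions, not the real wmf_airflow_common API.

default_args = {
    "conf": {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.maxExecutors": "16",  # current global cap
    },
}

def with_spark_conf_overrides(args: dict, overrides: dict) -> dict:
    """Return a copy of args with Spark conf entries overridden per DAG,
    leaving the shared defaults untouched."""
    return {**args, "conf": {**args.get("conf", {}), **overrides}}

# For the three jobs above, raise the executor cap to 64.
aqs_hourly_args = with_spark_conf_overrides(
    default_args,
    {"spark.dynamicAllocation.maxExecutors": "64"},
)
```

Keeping the override as a small merge over the shared defaults means each DAG states only what it changes, and the global defaults remain the single source of truth for everything else.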