airflow instances should use specific artifact cache directories
Closed, Resolved, Public

Description

In airflow-dags/wmf_airflow_common/config/artifact_config.yaml, we declare a single HDFS-based artifact cache and set it as the default.

This means that each airflow-dags/<instance>/config/artifact.config file will use the same cache directory. If multiple instances declare the same artifact, they will clobber each other's artifact cache and scap deployments will fail.
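To illustrate the collision, here is a minimal sketch. It is not the wmf_airflow_common implementation: the cache_path helper is hypothetical, and the base directory is an assumption inferred from the HDFS listing further down in this task.

```python
from posixpath import join
from typing import Optional

# Assumed base directory, inferred from the HDFS listing in this task.
CACHE_BASE = "/wmf/cache/artifacts/airflow"


def cache_path(artifact_file: str, instance: Optional[str] = None) -> str:
    """Hypothetical helper: return the HDFS cache path for an artifact file.

    With instance=None every instance shares one directory (the problem
    described above); otherwise each instance gets its own subdirectory.
    """
    if instance is None:
        return join(CACHE_BASE, artifact_file)
    return join(CACHE_BASE, instance, artifact_file)


artifact = "knowledge-gaps-0.1.3.conda.tgz"

# Shared cache: any two instances declaring this artifact resolve to the
# same path, so one deployment can overwrite the other's cached file.
shared_path = cache_path(artifact)

# Per-instance cache: the paths differ, so deployments do not interfere.
research_path = cache_path(artifact, "research")
platform_eng_path = cache_path(artifact, "platform_eng")
```

With a per-instance directory, two instances can declare the same artifact without touching each other's cached copy.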

This MR sets up an instance-specific cache directory for the new platform_eng instance, but we needed to add some extra hacks to keep the globally declared cache from being used.

Event Timeline

xcollazo changed the task status from Open to In Progress. Aug 22 2022, 6:28 PM

Working on this one as part of T315633.

scap deployed successfully to the platform_eng, analytics_test and analytics instances.

The research instance is tracking a different branch than main, so I didn't want to mess with that. The owner can deploy later. CC @bmansurov

@xcollazo thanks for the ping. If you mean the research instance on deploy1002, then I've pulled your changes, rebased on main and deployed. If you mean something else, please let me know. Thanks!

@bmansurov: right, you'd use the deploy server (deploy1002) to deploy into https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/Airflow#research (an-airflow1002.eqiad.wmnet).

I can see that the changes are now live since the research instance now has its own cache folder:

xcollazo@stat1007:~$ hdfs dfs -ls /wmf/cache/artifacts/airflow/research
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Found 2 items
-rw-r-----   3 analytics-research analytics-privatedata-users  386934141 2022-09-08 23:56 /wmf/cache/artifacts/airflow/research/article-quality-0.0.2.conda.tgz
-rw-r-----   3 analytics-research analytics-privatedata-users  425884197 2022-09-08 23:55 /wmf/cache/artifacts/airflow/research/knowledge-gaps-0.1.3.conda.tgz

We're good.