
Crash of artifact-cache in scap deploy context
Closed, Resolved · Public


I wanted to deploy some new code for airflow-dags/analytics. The config/artifacts.yaml declared one more artifact than before, and the deployment crashed with an AttributeError.

aqu@deploy1002:/srv/deployment/airflow-dags/analytics$ scap deploy "T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics@$(git rev-parse --short HEAD)]"
12:47:49 Started deploy [airflow-dags/analytics@cae0024]
12:47:49 Deploying Rev: HEAD = cae0024bdf0f517c0c2e4384705a76ccfc787293
12:47:49 Started deploy [airflow-dags/analytics@cae0024]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics@cae0024]
:* an-launcher1002.eqiad.wmnet
airflow-dags/analytics: fetch stage(s): 100% (in-flight: 0; ok: 1; fail: 0; left: 0)
airflow-dags/analytics: config_deploy stage(s): 100% (in-flight: 0; ok: 1; fail: 0; left: 0)
12:47:57 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'airflow-dags/analytics', '-g', 'default', 'promote', '--refresh-config'] (ran as analytics@an-launcher1002.eqiad.wmnet
) returned [1]: Could not chdir to home directory /nonexistent: No such file or directory
Executing check 'artifacts_sync'
Check 'artifacts_sync' failed: Traceback (most recent call last):
  File "/usr/lib/airflow/bin/artifact-cache", line 8, in <module>
  File "/usr/lib/airflow/lib/python3.7/site-packages/workflow_utils/artifact/", line 30, in main
  File "/usr/lib/airflow/lib/python3.7/site-packages/workflow_utils/artifact/", line 65, in cache_put
    cache.put(,, force=force)
  File "/usr/lib/airflow/lib/python3.7/site-packages/workflow_utils/artifact/", line 113, in put
    with as output:
  File "/usr/lib/airflow/lib/python3.7/site-packages/workflow_utils/artifact/", line 108, in open
    return, mode='wb').open()
  File "/usr/lib/airflow/lib/python3.7/site-packages/fsspec/", line 150, in open
    out.close = close
AttributeError: can't set attribute
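The "can't set attribute" at the bottom of the traceback is what Python raises when code assigns to a read-only attribute. In fsspec's generic `open()` wrapper, the library replaces the `close` attribute of the file object returned by the filesystem; if that object exposes `close` as a read-only property (as streams from the newer pyarrow filesystem API can), the assignment fails. A minimal, self-contained sketch of this failure mode, using a hypothetical stand-in class rather than a real pyarrow stream:

```python
class NewApiStream:
    """Hypothetical stand-in for a file-like object whose close()
    is exposed as a read-only property (no setter defined)."""

    @property
    def close(self):
        return lambda: None


stream = NewApiStream()
caught = None
try:
    # fsspec's wrapper does roughly this on the object it gets back
    # from the filesystem; on a read-only attribute it raises.
    stream.close = lambda: None
except AttributeError as err:
    caught = err

print(caught)
```

This reproduces the class of error seen above, not the exact code path inside fsspec.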

... then the deployment rolled back.

Later, running the code directly, it worked:

aqu@an-launcher1002:/srv/deployment/airflow-dags/analytics$ sudo -u analytics         /usr/local/bin/kerberos-run-command analytics         /usr/lib/airflow/bin/artifact-cache
      warm         /srv/deployment/airflow-dags/analytics/wmf_airflow_common/config/artifact_config.yaml         /srv/deployment/airflow-dags/analytics/analytics/config/artifacts.yaml
        hdfs:///wmf/cache/artifacts/airflow/ (exists=True)       (exists=True)
        hdfs:///wmf/cache/artifacts/airflow/ (exists=True)       (exists=True)
        hdfs:///wmf/cache/artifacts/airflow/       (exists=True)    (exists=True)

The next scap deploy worked.

Event Timeline

Same error today, but I may have found a pattern:

  1. scap deploy some code with a newly declared artifact
  2. crash with the same error as described on 4/11
  3. scap rollback
  4. scap deploy again
  5. now it works

Btw, I noticed that workflow_utils was not up to date on an-launcher1002.eqiad.wmnet in /usr/lib/airflow/lib/python3.7/site-packages/workflow_utils.

How to reproduce manually and currently, on an-launcher1002:

hdfs dfs -rm /wmf/cache/artifacts/airflow/

# 2 times:
/usr/lib/airflow/bin/artifact-cache warm \
  /srv/deployment/airflow-dags/analytics/wmf_airflow_common/config/artifact_config.yaml \

This seems to be a bug in fsspec combined with the new pyarrow API. I think we have to go back to not using the new pyarrow API for now: we can simply avoid calling fsspec_use_new_pyarrow_api in the artifact-cache script. Will make a patch.
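The shape of that workaround can be sketched as follows. This is purely illustrative: only the name fsspec_use_new_pyarrow_api comes from workflow_utils; the `warm_cache` function and its flag are hypothetical stand-ins for the real script's entry point.

```python
def warm_cache(enable_new_pyarrow_api=False):
    """Hypothetical entry point mirroring the described fix: the
    opt-in call to fsspec_use_new_pyarrow_api() is simply skipped
    when warming the artifact cache."""
    if enable_new_pyarrow_api:
        # fsspec_use_new_pyarrow_api()  # would hit the crash above;
        # simulate the observed failure for illustration:
        raise AttributeError("can't set attribute")
    return "warmed with legacy pyarrow filesystem"


print(warm_cache())
```

The point is only that the fix is an opt-out at the caller, not a change inside fsspec or pyarrow.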

Okay, fixed and deployed. All artifacts should be synced now.

The fixes and improvements are in this MR: