We've been provisioning new bullseye wdqs hosts and ran into some issues with scap, namely "incomplete" deploys that leave the host in a non-functional state.
Scap works as we expect when we run with the --force flag:
scap deploy --force -l 'wdqs2016.codfw.wmnet' ${VERSION}
However, if we run it without the --force flag and the directory /srv/deployment/wdqs/wdqs-cache/revs/$CURRENT_DEPLOY_COMMIT_HASH/.git/config-files/etc/query_service is not present, scap silently "fails" the deploy without raising an error.
Here's a log from wdqs2016 where its .git/config-files/etc/query_service directory has been manually deleted by the operator:
https://phabricator.wikimedia.org/P49572
And here's a log from the same host, same directory missing, but with the --force flag:
https://phabricator.wikimedia.org/P49573
Judging by this log message:
{"name": "target.wdqs2016.codfw.wmnet.deploy-local", "msg": "Revision directory already exists (use --force to override)", "args": [], "levelno": 20, "filename": "deploy.py", "exc_text": null, "lineno": 336, "funcName": "fetch", "created": 1689706482.8972232, "msecs": 897.2232341766357, "relativeCreated": 743.9661026000977, "host": "wdqs2016.codfw.wmnet"}
The issue is possibly that scap is satisfied as long as /srv/deployment/wdqs/wdqs-cache/revs/$CURRENT_DEPLOY_COMMIT_HASH/ exists, but doesn't check that all relevant subdirectories are present and have the expected contents.
AC
- If necessary subdirectories of the main git directory are missing and/or have different contents than expected, scap raises an error when run without the --force flag
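
To make the AC concrete, here is a minimal Python sketch of the kind of check being asked for. It is not scap's actual code: `IncompleteRevisionError`, `missing_subdirs`, and `check_revision` are hypothetical names, and the list of required subdirectories is an assumption the caller would have to supply. It only covers the "not present" half of the AC; comparing contents would need extra logic.

```python
from pathlib import Path


class IncompleteRevisionError(RuntimeError):
    """Hypothetical error type for this sketch; not part of scap."""


def missing_subdirs(rev_dir, required):
    """Return the required relative paths that are not directories under rev_dir."""
    root = Path(rev_dir)
    return [rel for rel in required if not (root / rel).is_dir()]


def check_revision(rev_dir, required, force=False):
    """Decide whether an existing revision directory can be reused.

    Returns False when a fresh fetch is needed (directory absent, or
    force requested) and True when the directory exists and is complete.
    Raises IncompleteRevisionError when the directory exists but required
    subdirectories are missing and force is not set.
    """
    root = Path(rev_dir)
    if force or not root.is_dir():
        return False
    missing = missing_subdirs(root, required)
    if missing:
        raise IncompleteRevisionError(
            f"{root} exists but is missing: {', '.join(missing)}"
        )
    return True
```

With a check like this, the case described above (revision directory present, `.git/config-files/etc/query_service` deleted, no --force) would end in an explicit error rather than a deploy that is quietly skipped.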