
castor does not restore caches?
Closed, Resolved (Public)

Assigned To
Authored By
hashar
Wed, Jun 26, 3:34 PM

Description

While checking T362425 I found the composer installation of dev dependencies downloads artifacts:

11:31:10   - Downloading squizlabs/php_codesniffer (3.8.1)
11:31:10   - Downloading dealerdirect/phpcodesniffer-composer-installer (v1.0.0)
11:31:10   - Downloading composer/pcre (3.1.4)
11:31:10   - Downloading psr/cache (1.0.1)
11:31:10   - Downloading doctrine/deprecations (1.1.3)
11:31:10   - Downloading doctrine/event-manager (1.2.0)

As if there was no cache at all... Fun times.

Repro:

$ ssh integration-agent-docker-1042
$ mkdir cache
$ sudo docker run --rm -it -e CASTOR_HOST=integration-castor05.integration.eqiad1.wikimedia.cloud -e JOB_NAME=mediawiki-quibble-vendor-mysql-php74 -e ZUUL_PROJECT=mediawiki/core -e ZUUL_BRANCH=master -e CASTOR_NAMESPACE="mediawiki-core/master/mediawiki-quibble-vendor-mysql-php74" -v cache:/cache docker-registry.wikimedia.org/releng/castor:0.3.0 load
Defined: CASTOR_NAMESPACE="mediawiki-core/master/mediawiki-quibble-vendor-mysql-php74"
Syncing...
rsync: [Receiver] mkdir "/nonexistent" failed: Permission denied (13)
rsync error: error in file IO (code 11) at main.c(787) [Receiver=3.2.3]

Done

Event Timeline

hashar triaged this task as Unbreak Now! priority. Wed, Jun 26, 3:37 PM

The rsync fails:

sudo docker run --rm -it -e CASTOR_HOST=integration-castor05.integration.eqiad1.wikimedia.cloud -e JOB_NAME=mediawiki-quibble-vendor-mysql-php74 -e ZUUL_PROJECT=mediawiki/core -e ZUUL_BRANCH=master -e CASTOR_NAMESPACE="mediawiki-core/master/mediawiki-quibble-vendor-mysql-php74" -v cache:/cache docker-registry.wikimedia.org/releng/castor:0.3.0 load
Defined: CASTOR_NAMESPACE="mediawiki-core/master/mediawiki-quibble-vendor-mysql-php74"
Syncing...
rsync: [Receiver] mkdir "/nonexistent" failed: Permission denied (13)
rsync error: error in file IO (code 11) at main.c(787) [Receiver=3.2.3]

Done

The failure is on the receiver side: /nonexistent is the HOME of the user in the container, and that directory does not exist. The root cause is that castor-load-sync.bash decides whether it is running under Docker by checking whether the job name contains docker:

[[ $JOB_NAME == *'docker'* ]] && is_docker=1 || is_docker=''

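Depending on the result, it rsyncs either to /cache (in the container) or to $HOME, which was the destination used with Nodepool (which we phased out back in 2018?). The job name here does not contain docker, so the script picks $HOME, i.e. /nonexistent.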
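To make the misfire concrete, here is a minimal sketch of that destination choice applied to the job name from the repro. The restore_dest helper is hypothetical and not part of castor-load-sync.bash; it uses a POSIX case pattern that should behave like the [[ $JOB_NAME == *'docker'* ]] test quoted above, and the second job name is made up for contrast.

```shell
#!/bin/sh
# Hypothetical helper (not from castor-load-sync.bash): pick the rsync
# destination the way the old job-name check did.
#   $1 = job name, $2 = HOME of the user running the script
restore_dest() {
    case "$1" in
        *docker*) echo "/cache" ;;   # job name contains "docker"
        *)        echo "$2" ;;       # legacy Nodepool path: restore to $HOME
    esac
}

# Job name from the repro: no "docker" in it, so HOME is chosen.
restore_dest "mediawiki-quibble-vendor-mysql-php74" "/nonexistent"
# Made-up job name containing "docker", for comparison.
restore_dest "mediawiki-quibble-docker" "/nonexistent"
```

The first call prints /nonexistent and the second prints /cache, which matches the mkdir failure seen in the rsync output.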

Change #1049976 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] dockerfiles: castor: always restore to /cache

https://gerrit.wikimedia.org/r/1049976

Change #1049981 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: switch jobs to releng/castor:0.4.0 image

https://gerrit.wikimedia.org/r/1049981

Change #1049976 merged by jenkins-bot:

[integration/config@master] dockerfiles: castor: always restore to /cache

https://gerrit.wikimedia.org/r/1049976

Change #1049981 merged by jenkins-bot:

[integration/config@master] jjb: switch jobs to releng/castor:0.4.0 image

https://gerrit.wikimedia.org/r/1049981

Mentioned in SAL (#wikimedia-releng) [2024-06-26T16:03:19Z] <hashar> Updating all jobs to switch to releng/castor:0.4.0 and fix cache restoration # T368550

hashar updated the task description.

I picked a build of https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php74-noselenium/ on the assumption that the cache had been saved by then, and sure enough:

16:16:16 + exec docker run --volume /srv/jenkins/workspace/quibble-vendor-mysql-php74-noselenium/cache:/cache --security-opt seccomp=unconfined --init --rm --label jenkins.job=quibble-vendor-mysql-php74-noselenium --label jenkins.build=22689 --env-file /dev/fd/63 docker-registry.wikimedia.org/releng/castor:0.4.0 load
16:16:16 ++ /usr/bin/env
16:16:16 ++ egrep -v '^(HOME|SHELL|PATH|LOGNAME|MAIL)='
16:16:16 Defined: CASTOR_NAMESPACE="castor-mw-ext-and-skins/master/quibble-vendor-mysql-php74-noselenium"
16:16:16 Syncing...
16:16:17 rsync: [generator] failed to set times on "/cache/.": Operation not permitted (1)
16:16:17 rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1819) [generator=3.2.3]
16:16:17 
16:16:17 Done

And composer now reports extracting archives rather than downloading them:

16:17:43 [12.7MiB/0.54s]   - Installing squizlabs/php_codesniffer (3.8.1): Extracting archive
16:17:43 [12.9MiB/0.73s]   - Installing dealerdirect/phpcodesniffer-composer-installer (v1.0.0): Extracting archive
16:17:43 [13.1MiB/0.76s]   - Installing composer/pcre (3.1.4): Extracting archive
16:17:43 [13.1MiB/0.76s]   - Installing symfony/polyfill-php80 (v1.30.0): Extracting archive
16:17:43 [13.1MiB/0.76s]   - Installing phpcsstandards/phpcsutils (1.0.9): Extracting archive
16:17:43 [13.2MiB/0.77s]   - Installing phpcsstandards/phpcsextra (1.1.2): Extracting archive
16:17:43 [13.2MiB/0.77s]   - Installing symfony/polyfill-mbstring (v1.30.0): Extracting archive

So that is fixed!
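For checking other builds the same way, a rough sketch could count the composer "Downloading" lines in a saved console log. This assumes, as in the description, that a cold cache shows up as "- Downloading ..." lines; the count_downloads helper and the console.log file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count composer "- Downloading" lines read from stdin.
# grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
count_downloads() {
    grep -c -- '- Downloading ' || true
}

# Usage, assuming the Jenkins console output was saved to console.log:
#   count_downloads < console.log
```

A count of 0 suggests the packages came from the restored cache. A non-zero count is not necessarily a regression, since packages added after the cache was last saved would still be downloaded.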

Mentioned in SAL (#wikimedia-releng) [2024-06-26T16:54:56Z] <hashar> integration: fixed castor cache restoration which was broken since mid-May # T368550