
Castor rsync causes: rsync: failed to set times on "/cache/.": Operation not permitted (1)
Open, Low, Public

Description

When Castor restores the cache from the central cache on the Jenkins slave, rsync reports a permission error when setting times on "/cache/.". It is harmless since we ignore all errors, but it is a bit distracting:

Syncing...
rsync: failed to set times on "/cache/.": Operation not permitted (1)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1668) [generator=3.1.2] 

Done

The reason is that the cache directory is created by the jenkins user:

mkdir -m 2777 -p cache

But the Castor rsync is done in a container with the nobody user:

docker run -v "$(pwd)/cache:/cache"  docker-registry.wikimedia.org/releng/castor:0.1.3 load

Hence the directories we create (cache, log, src) should be created by the nobody user.
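
The ownership mismatch described above can be checked directly. Setting explicit timestamps on a directory generally requires owning it (or being privileged), and write permission alone, such as mode 2777, is not enough. The following is a minimal sketch, not part of Castor: the helper names `dir_owner_uid` and `owned_by_current_user` are hypothetical and only illustrate the check.

```shell
# Hypothetical helper (not part of Castor): print the numeric UID that owns a directory.
dir_owner_uid() {
    # -d lists the directory itself, -n prints numeric IDs; field 3 is the owner UID.
    ls -ldn "$1" | awk '{print $3}'
}

# Hypothetical helper: succeed only when the current user owns the directory.
# An unprivileged rsync needs this in order to set the directory's modification time.
owned_by_current_user() {
    [ "$(dir_owner_uid "$1")" = "$(id -u)" ]
}
```

For example, running `owned_by_current_user /cache || echo "owner is UID $(dir_owner_uid /cache), running as UID $(id -u)"` inside the container should print the mismatch if the directory was created by a different user on the host.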

Event Timeline

I had a similar issue when simplifying the way we publish documentation (e9de35e88f7a01e6abb72bde53a5c9d316838276). The issue is that rsync --archive attempts to preserve times and permissions, which is why we get the warning. From the fix I did for doc:

# We do not use --archive, which preserves dates, times and permissions.
# The base directory (such as ./cover) might have been populated by
# puppet and thus owned by a different user than rsyncd.
rsync --verbose --recursive --relative "${WMF_CI_PUB_DEST}" rsync://doc1001.eqiad.wmnet/doc/

Or in short, we need to drop --archive from the castor load command.

From another comment I made:

The cache is populated by rsync, which runs inside a container as the 'nobody' user. However, the /cache/ directory is owned by root IIRC, and thus rsync cannot update the timestamps on the directory. Hence:

rsync: failed to set times on "/cache/.": Operation not permitted (1)

That is a non-fatal error and can be ignored. It is due to rsync using --archive, which implies --times. That option preserves modification times, which lets rsync optimize later transfers by comparing timestamps and skipping files that have not changed. However, the cache is initially empty, so we could probably drop the --times option. There is also --omit-dir-times.
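
As an illustration of the --omit-dir-times option, here is a minimal sketch. It assumes the load step boils down to a single rsync invocation, and the actual command inside the castor image may well differ. The function name `sync_cache` and its arguments are hypothetical.

```shell
# Hypothetical helper (not the actual Castor load command).
# Copies the contents of SRC into DEST the way --archive would, but leaves
# directory modification times alone, so a DEST root owned by another user
# should no longer trigger "failed to set times".
sync_cache() {
    src=$1
    dest=$2
    # --archive is shorthand for -rlptgoD; --omit-dir-times exempts
    # directories from the time preservation that --times requests.
    rsync --archive --omit-dir-times "${src}/" "${dest}/"
}
```

Called as `sync_cache /path/to/central-cache /cache`, this still preserves file timestamps, so later runs can skip unchanged files. Dropping time preservation entirely would instead mean passing `--no-times` after `--archive`.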

Hello, this issue seems to be blocking Wikibase CI now: https://integration.wikimedia.org/ci/job/mwgate-node18-docker/3936/console

This task (T188488) is about rsync emitting an error message that is otherwise ignored. The npm cache corruption is T295351, and I will clean it up :)