operations-puppet Docker container takes a while to build
Closed, Resolved · Public

Description

On contint1001, when building the operations-puppet Docker image, it takes a while on:

Step 12/16 : RUN chown -R nobody /srv/workspace/.cache && mkdir -p /tmp/cache && mv /srv/workspace/.cache/puppet /tmp/cache/puppet && mkdir -p /tmp/cache/puppet/.bundle && mv /srv/workspace/.cache/bundle-config /tmp/cache/puppet/.bundle/config && mv /srv/workspace/.cache/tox /tmp/cache/puppet/.tox

The reason seems to be chown -R nobody /srv/workspace/.cache. Via strace:

newfstatat(4</srv/workspace/.cache/puppet/modules/role/templates>, "haproxy", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(4</srv/workspace/.cache/puppet/modules/role/templates>, "haproxy", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 3</srv/workspace/.cache/puppet/modules/role/templates/haproxy>
fcntl(3</srv/workspace/.cache/puppet/modules/role/templates/haproxy>, F_GETFD) = 0
fcntl(3</srv/workspace/.cache/puppet/modules/role/templates/haproxy>, F_SETFD, FD_CLOEXEC) = 0
fstat(3</srv/workspace/.cache/puppet/modules/role/templates/haproxy>, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
fcntl(3</srv/workspace/.cache/puppet/modules/role/templates/haproxy>, F_GETFL) = 0x38800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW)
fcntl(3</srv/workspace/.cache/puppet/modules/role/templates/haproxy>, F_SETFD, FD_CLOEXEC) = 0
fcntl(3</srv/workspace/.cache/puppet/modules/role/templates/haproxy>, F_DUPFD, 3) = 6
fcntl(6</srv/workspace/.cache/puppet/modules/role/templates/haproxy>, F_GETFD) = 0
fcntl(6</srv/workspace/.cache/puppet/modules/role/templates/haproxy>, F_SETFD, FD_CLOEXEC) = 0
getdents(3</srv/workspace/.cache/puppet/modules/role/templates/haproxy>, /* 5 entries */, 32768) = 160
getdents(3</srv/workspace/.cache/puppet/modules/role/templates/haproxy>, /* 0 entries */, 32768) = 0
close(3</srv/workspace/.cache/puppet/modules/role/templates/haproxy>) = 0
fchownat(6</srv/workspace/.cache/puppet/modules/role/templates/haproxy>, "db-slaves.cfg.erb", 65534, 4294967295, AT_SYMLINK_NOFOLLOW) = 0
fchownat(6</srv/workspace/.cache/puppet/modules/role/templates/haproxy>, "db.cfg.erb", 65534, 4294967295, AT_SYMLINK_NOFOLLOW) = 0
fchownat(6</srv/workspace/.cache/puppet/modules/role/templates/haproxy>, "db-master.cfg.erb", 65534, 4294967295, AT_SYMLINK_NOFOLLOW) = 0
close(6</srv/workspace/.cache/puppet/modules/role/templates/haproxy>) = 0
fchownat(4</srv/workspace/.cache/puppet/modules/role/templates>, "haproxy", 65534, 4294967295, AT_SYMLINK_NOFOLLOW) = 0

I suspect that modifying the files (here, changing their owner) causes them to be copied into the new image layer.

There is also a noticeable delay at an earlier step:

Step 10/16 : COPY --from=builder /tmp/cache /srv/workspace/.cache

Step 12 probably refers to the same files, but changing their owner would trigger a copy of each file.

From docker info:

Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true

On my local machine the chown is fast. I have an SSD, but the storage driver there is devicemapper.
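
One possible way to avoid the second pass over the files, sketched here as an assumption and not as what was done for this image: Docker 17.09 and later accept a --chown flag on COPY, so ownership is set while the files are written into the layer instead of in a later RUN chown -R. The base image and the placeholder file below are hypothetical; the stage name and paths follow Step 10. Whether the Docker version on contint1001 supports the flag is not stated in this task.

```dockerfile
# Hypothetical builder stage standing in for the real one.
FROM debian:stretch AS builder
RUN mkdir -p /tmp/cache && touch /tmp/cache/example

# Hypothetical final stage.
FROM debian:stretch
# Ownership is applied during the copy, so no separate "chown -R" layer is needed.
# Note: this also sets the group, whereas "chown -R nobody" left the group unchanged.
COPY --from=builder --chown=nobody:nogroup /tmp/cache /srv/workspace/.cache
```

With this form, the files are written once, already owned by nobody, so there is no later layer that has to rewrite them.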

Event Timeline

hashar created this task. Oct 19 2017, 8:41 PM
Restricted Application added a subscriber: Aklapper. Oct 19 2017, 8:41 PM
hashar updated the task description. Oct 19 2017, 8:43 PM
hashar updated the task description. Oct 19 2017, 8:55 PM
hashar triaged this task as Low priority. Dec 7 2017, 9:51 PM

That image is now based on docker-pkg. Maybe that is faster now.

We should be able to make this much faster by avoiding the chown. Is the ops-puppet image using the nobody user yet?

It seems that, as root, the build:

  • creates /srv/workspace
  • clones puppet.git to /srv/workspace/puppet
  • installs the tox and bundle dependencies, which end up in the puppet dir
  • runs chown -R nobody on the puppet dir

So I guess as root we should create /srv/workspace owned by nobody:nogroup, then switch to nobody and run all the other commands.
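
A minimal sketch of that ordering, under stated assumptions: the base image, the git package install, and the PUPPET_REPO_URL build argument are hypothetical, and the tox and bundle commands are left out because they are not given in this task. It is not the content of the merged patch.

```dockerfile
# Hypothetical base image; the one actually used is not named in this task.
FROM debian:stretch

# Hypothetical build argument; pass the real repository URL with --build-arg.
ARG PUPPET_REPO_URL

# As root: install git, then create the workspace and hand it to nobody:nogroup.
# The chown here touches a single empty directory, so it is cheap.
RUN apt-get update && apt-get install -y git \
    && mkdir -p /srv/workspace \
    && chown nobody:nogroup /srv/workspace

# Everything below runs as nobody, so files are created with the right owner
# from the start and no recursive chown is needed afterwards.
USER nobody
WORKDIR /srv/workspace

RUN git clone "$PUPPET_REPO_URL" puppet

# The tox and bundle dependency installation would follow here, still as nobody.
```

Because the files are owned by nobody when they are first written, no later layer has to rewrite them just to change ownership.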

Change 397577 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] docker: provision operations-puppet as 'nobody'

https://gerrit.wikimedia.org/r/397577

Change 397577 merged by jenkins-bot:
[integration/config@master] docker: provision operations-puppet as 'nobody'

https://gerrit.wikimedia.org/r/397577

Mentioned in SAL (#wikimedia-operations) [2017-12-12T14:39:36Z] <hashar> Rebuild operations-puppet-tests-docker image based on c76d8920901fd0be0f9ced3bc900cb72f2d1d4a2 | T178620 and /cache being owned by root

hashar closed this task as Resolved. Dec 12 2017, 2:40 PM

Tested on https://gerrit.wikimedia.org/r/#/c/397817/

That also fixes /cache/ being owned by root:root. Whenever tox.ini was touched, pip was unable to write to the cache!