
Investigate using a cache store/restore system for package managers
Closed, ResolvedPublic

Description

From a discussion with OpenStack people: they have some jobs that download Linux distributions and are looking for a cache/mirroring solution. Their RFC (== spec) is at https://review.openstack.org/#/c/194477/ .

FWICT, that's a proposal around caching Nodepool images, not packages or other dependencies that various jobs require. In my mind, they are separate problems with only marginal overlap: CI base images will be almost completely homogeneous in our case, while dependent system/gem/pip/composer/npm packages vary widely from job to job.

Travis implements a user-/job-specific system that restores and caches specific directories before and after each job executes, storing the data in S3. We could implement something similar but it would require a reliable central store, and the whole setup seems a little 'brute force' to me.

Another possibility that @hashar and I discussed was to provide separate read-only caches for the specific packaging systems—read-only to protect against the corruption that might occur during concurrent updates. Each cache would augment the package manager's read-write destination within the workspace and be periodically updated to include new packages. The update process could be scheduled or triggered at the end of each job as long as we can reliably audit which packages were installed locally during execution.
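For illustration only (not how this ended up being implemented), here is a rough sketch of how a shared read-only cache could sit alongside a writable destination in the workspace for two of the package managers mentioned; the /srv/cache/... paths are hypothetical:

```
# RubyGems: search the shared read-only cache in addition to the workspace,
# but install any new gems into the workspace only
export GEM_PATH=/srv/cache/gems:"$WORKSPACE/gems"
export GEM_HOME="$WORKSPACE/gems"

# pip: prefer wheels from the shared read-only directory, fall back to PyPI,
# and keep pip's own download cache inside the workspace
pip install --find-links=/srv/cache/wheels \
    --cache-dir="$WORKSPACE/pip-cache" -r requirements.txt
```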

This was discussed in https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-10-06-13.59.html; see point 5.

With tar and s3cmd, that Travis-style store/restore would probably be a shell one-liner. If we can't get a Swift or Ceph object store for labs from ops, we could use rsync to an integration instance.
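A minimal sketch of that one-liner (the bucket name and cache path are made up):

```
# Store the workspace cache after a job run
tar -czf cache.tar.gz -C "$WORKSPACE" cache \
  && s3cmd put cache.tar.gz "s3://ci-job-caches/${JOB_NAME}.tar.gz"

# Restore it before the next run; tolerate a missing cache on the first run
s3cmd get --force "s3://ci-job-caches/${JOB_NAME}.tar.gz" cache.tar.gz \
  && tar -xzf cache.tar.gz -C "$WORKSPACE" || true
```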

If we go this way, to maintain isolation we need to make sure that only Nodepool instances running gate-and-submit or post-merge jobs have permission to update the caches, not instances running test/check jobs.
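If the cache ends up on an rsync server, one hypothetical way to enforce that split is two rsyncd modules over the same directory: a read-only one for everyone, and a writable one whose hosts allow list only contains the gate-and-submit/post-merge instances (module names and addresses below are invented):

```
# /etc/rsyncd.conf — sketch only
[jobcache]
    path = /srv/jobcache
    read only = yes

[jobcache-upload]
    path = /srv/jobcache
    read only = no
    # restrict writes to the gate-and-submit / post-merge Nodepool instances
    hosts allow = 10.68.16.0/24
```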

Event Timeline

hashar raised the priority of this task from to Medium.
hashar updated the task description.

Namespacing the caches per repo/branch would cause us to have a lot of different caches, which would consume a good chunk of disk space on the central repository. Not sure how much of a problem that would be.

As long as we trust our +2ers, using one cache per type (i.e. one for composer, a different one for gem) should be fine. Another solution is to use content addressing to deduplicate.
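As a rough sketch of the content-addressing idea (all paths and the *.gem pattern are hypothetical), packages would be stored once under their checksum and exposed by name through links:

```
# Deduplicate cached packages by content hash
CAS=/srv/jobcache/cas
mkdir -p "$CAS" /srv/jobcache/gems
for pkg in "$WORKSPACE"/cache/*.gem; do
    sum=$(sha256sum "$pkg" | cut -d' ' -f1)
    # keep a single copy of the blob, keyed by its hash
    [ -e "$CAS/$sum" ] || cp "$pkg" "$CAS/$sum"
    # expose it under its human-readable name
    ln -sf "$CAS/$sum" "/srv/jobcache/gems/$(basename "$pkg")"
done
```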

Change 253322 had a related patch set uploaded (by Hashar):
contint: rsync server to hold jobs caches

https://gerrit.wikimedia.org/r/253322

See the proof of concept https://gerrit.wikimedia.org/r/264327 based on rsync and a central rsync cache.
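The proof of concept is the authoritative version; roughly, an rsync-based store/restore has this shape (the server name and module below are placeholders, and $JOB_NAME is assumed to come from Jenkins):

```
# Restore the per-job cache before the build; tolerate a missing cache
rsync -a "rsync://cache.integration.example/jobcache/${JOB_NAME}/" \
    "$WORKSPACE/cache/" || true

# Push the cache back after a successful gate-and-submit / post-merge run
rsync -a --delete "$WORKSPACE/cache/" \
    "rsync://cache.integration.example/jobcache/${JOB_NAME}/"
```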

Let's follow up on the parent task T112560; not much is left to investigate here.

hashar claimed this task.

This has been implemented and tracked in the parent task.

Did a first pass using a cache store/restore system based on rsync. Investigated as part of T116017.

Change 253322 merged by Filippo Giunchedi:
contint: rsync server to hold jobs caches

https://gerrit.wikimedia.org/r/253322

Had the left-over Puppet patch merged via Puppet SWAT.