In T112560#1643322, @dduvall wrote: In T112560#1643228, @hashar wrote: Discussing with OpenStack people, they have some jobs downloading Linux distributions and are looking for a cache/mirroring solution. Their RFC (== spec) is at https://review.openstack.org/#/c/194477/ .
FWICT, that's a proposal around caching nodepool images, not packages or other dependencies that various jobs require. In my mind, they are separate problems with only marginal overlap: CI base images will be almost completely homogeneous in our case, while dependent system/gem/pip/composer/npm packages vary widely from job to job.
Travis implements a user-/job-specific system that restores and caches specific directories before and after each job executes, storing the data in S3. We could implement something similar but it would require a reliable central store, and the whole setup seems a little 'brute force' to me.
Another possibility that @hashar and I discussed was to provide separate read-only caches for the specific packaging systems—read-only to protect against the corruption that might occur during concurrent updates. Each cache would augment the package manager's read-write destination within the workspace and be periodically updated to include new packages. The update process could be scheduled or triggered at the end of each job as long as we can reliably audit which packages were installed locally during execution.
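A rough sketch of how the read-only cache idea could behave, assuming each cache is a plain directory of package files. The names `resolve_package`, `newly_installed`, `workspace_cache`, and `shared_cache` are hypothetical and do not come from any package manager; real tools (pip, npm, composer, and so on) each have their own cache layout, so this only illustrates the lookup order and the end-of-job audit.

```python
from pathlib import Path
from typing import List, Optional


def resolve_package(name: str, workspace_cache: Path, shared_cache: Path) -> Optional[Path]:
    """Find a cached package file, checking the job's read-write cache first.

    The shared cache is only ever read here, never written to.
    """
    for root in (workspace_cache, shared_cache):
        candidate = root / name
        if candidate.is_file():
            return candidate
    return None


def newly_installed(workspace_cache: Path, shared_cache: Path) -> List[str]:
    """Audit step: list files the job fetched that the shared cache lacks.

    A scheduled or end-of-job updater could copy exactly these files
    into the shared cache.
    """
    found = []
    for path in workspace_cache.rglob("*"):
        if path.is_file():
            rel = path.relative_to(workspace_cache)
            if not (shared_cache / rel).exists():
                found.append(rel.as_posix())
    return sorted(found)
```

Because jobs never write to the shared directory, concurrent jobs cannot corrupt it; only the separate update process needs write access.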
In T112560#1705835, @JanZerebecki wrote: This was discussed in https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-10-06-13.59.html (see point 5).
In T112560#1643322, @dduvall wrote: Travis implements a user-/job-specific system that restores and caches specific directories before and after each job executes, storing the data in S3. We could implement something similar but it would require a reliable central store, and the whole setup seems a little 'brute force' to me.
With tar and s3cmd, this would probably be a shell one-liner. If we can't get a Swift or Ceph object store for labs from ops, we could use rsync to an integration instance.
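To make the restore-then-save cycle concrete, here is a minimal sketch using Python's standard `tarfile` module instead of the tar and s3cmd commands. The function names and the `archive` path are hypothetical; the archive location stands in for whatever store ends up being used (an object store or an rsync target), and the upload/download step is deliberately left out.

```python
import tarfile
from pathlib import Path
from typing import List


def save_cache(workspace: Path, directories: List[str], archive: Path) -> None:
    """Pack the listed workspace-relative directories into one gzip tarball.

    Directories that do not exist in the workspace are skipped.
    """
    archive.parent.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "w:gz") as tar:
        for rel in directories:
            source = workspace / rel
            if source.is_dir():
                # arcname keeps paths in the tarball relative to the workspace
                tar.add(source, arcname=rel)


def restore_cache(workspace: Path, archive: Path) -> bool:
    """Unpack a previously saved tarball into the workspace.

    Returns False when no archive exists yet (for example on a first run).
    Only archives from a trusted store should be unpacked this way.
    """
    if not archive.is_file():
        return False
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(workspace)
    return True
```

A job would call `restore_cache` before installing dependencies and `save_cache` with a list such as `["node_modules", "vendor"]` afterwards.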
If we go this way, then to keep isolation intact we need to make sure that the only nodepool instances with permission to update caches are those that ran gate-and-submit or post-merge jobs, not those that ran test/check jobs.
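The rule could be expressed as a simple allowlist. This is only a sketch: the pipeline names are taken from the comment above, `may_update_cache` is a hypothetical helper, and how a job learns its pipeline name is an assumption. A check inside the job is also not enforcement by itself; the store would still have to refuse writes from untrusted instances.

```python
# Pipelines whose jobs run already-reviewed code (names as used in this thread).
TRUSTED_PIPELINES = frozenset({"gate-and-submit", "post-merge"})


def may_update_cache(pipeline: str) -> bool:
    """Return True only for pipelines allowed to write to the shared cache.

    Jobs from test/check run unreviewed code, so they may read the cache
    but must never update it.
    """
    return pipeline in TRUSTED_PIPELINES
```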