User Details
- User Since
- Oct 7 2014, 4:24 PM (450 w, 5 d)
- Availability
- Available
- IRC Nick
- marxarelli
- LDAP User
- Dduvall
- MediaWiki User
- DDuvall (WMF)
Apr 12 2023
Apr 7 2023
@dcausse thanks for filing this.
Confirmed that we can now push blubber's multi platform image to our registry. See https://gitlab.wikimedia.org/repos/releng/blubber/-/jobs/90258
Mar 29 2023
I was able to reproduce the problem locally. It has to do with how nginx does (or rather, doesn't) apply the set $auth_request_path directive to the nested location block that matches exact manifest/blob digests. From the commit message of the associated patch:
Mar 28 2023
Thank you, @JMeybohm. That's very helpful.
Mar 27 2023
Mar 22 2023
Friendly ping. :)
I'm finally circling back to this, and I ran another test yesterday. The additional logging in jwt-authorizer confirms that this is an auth failure due to the token scope not matching the request URL. However, the reason for this mismatch is strange. (See below.)
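For context on the kind of check that is failing, here is a minimal sketch of matching a registry request against a token's scopes. The sketch assumes the "repository:<name>:<actions>" scope format from the Docker Registry token authentication spec. It is not jwt-authorizer's actual code. The helper names and the method-to-action mapping are hypothetical.

```python
import re
from typing import List, Optional

# Registry v2 API paths look like /v2/<name>/manifests/<reference>,
# /v2/<name>/blobs/<digest>, /v2/<name>/blobs/uploads/..., or /v2/<name>/tags/list.
# <name> may itself contain slashes (e.g. repos/releng/blubber).
_V2_PATH = re.compile(r"^/v2/(?P<name>.+?)/(?:manifests|blobs|tags)/")

# Assumed mapping from HTTP method to the token action the request needs.
_ACTION_FOR_METHOD = {
    "GET": "pull",
    "HEAD": "pull",
    "PUT": "push",
    "POST": "push",
    "PATCH": "push",
    "DELETE": "delete",
}


def repository_for(path: str) -> Optional[str]:
    """Return the repository name addressed by a /v2/ request path, if any."""
    match = _V2_PATH.match(path)
    return match.group("name") if match else None


def token_allows(scopes: List[str], method: str, path: str) -> bool:
    """Check whether any 'repository:<name>:<actions>' scope covers this request."""
    name = repository_for(path)
    action = _ACTION_FOR_METHOD.get(method.upper())
    if name is None or action is None:
        return False
    for scope in scopes:
        resource_type, _, rest = scope.partition(":")
        scope_name, _, actions = rest.rpartition(":")
        if resource_type == "repository" and scope_name == name and action in actions.split(","):
            return True
    return False
```

With this check, a mismatch between the repository named in the token and the repository in the request URL makes the request fail. That happens even when the token's actions would otherwise be sufficient.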
Mar 20 2023
The last few rounds of load testing, whereby 100 concurrent image-build jobs are triggered at once, have all been successful in that:
Mar 9 2023
Just noting that the specific CPS mismatch here is the use of a CPS-transformed method in a constructor. Constructors are never CPS-transformed.
Mar 1 2023
Deployed.
I did some more investigation to better understand how exactly Istio is interfering with traffic from the network namespaces on the buildkit0 bridge interface. In doing so, I discovered the traffic.sidecar.istio.io/kubevirtInterfaces annotation, which solves this problem in a less hacky way (or at least in an officially sanctioned Istio hacky way, since the resulting iptables PREROUTING chain looks similar to what we added).
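A minimal sketch of how that annotation could be attached to the buildkitd pod template, written as a Python dict that would be serialized into the Deployment manifest. Istio documents the annotation's value as a comma-separated list of virtual interfaces whose inbound traffic is treated as outbound. The surrounding pod template structure and the helper name are assumptions for illustration.

```python
import json
from typing import Dict, List

# Istio sidecar annotation: a comma-separated list of virtual interfaces
# whose inbound traffic the sidecar treats as outbound.
KUBEVIRT_INTERFACES = "traffic.sidecar.istio.io/kubevirtInterfaces"


def sidecar_annotations(bridge_interfaces: List[str]) -> Dict[str, str]:
    """Build pod annotations so traffic from these bridges is treated as outbound."""
    return {KUBEVIRT_INTERFACES: ",".join(bridge_interfaces)}


# Hypothetical pod template metadata for the buildkitd Deployment.
pod_template_metadata = {"annotations": sidecar_annotations(["buildkit0"])}

if __name__ == "__main__":
    print(json.dumps(pod_template_metadata, indent=2))
```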
Feb 28 2023
Feb 24 2023
Feb 23 2023
Awesome!
Feb 21 2023
BuildKit has been redeployed. It now runs in privileged mode and has CNI configured for network isolation of build containers.
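A rough sketch of the worker configuration this implies. It renders a [worker.oci] section of buildkitd.toml that switches the OCI worker to CNI networking. The option names (networkMode, cniConfigPath, cniBinaryPath) reflect our understanding of BuildKit's config format and should be checked against the BuildKit docs for the deployed version. The paths are placeholder defaults, not the deployed values. Privileged mode is set on the container (for example through the Kubernetes securityContext), not in this file.

```python
def render_buildkitd_toml(
    cni_config_path: str = "/etc/buildkit/cni.json",
    cni_binary_path: str = "/opt/cni/bin",
) -> str:
    """Render a [worker.oci] section that runs build containers in CNI-managed network namespaces.

    Option names are assumed from BuildKit's buildkitd.toml format; paths are placeholders.
    """
    lines = [
        "[worker.oci]",
        "  enabled = true",
        '  networkMode = "cni"',
        f'  cniConfigPath = "{cni_config_path}"',
        f'  cniBinaryPath = "{cni_binary_path}"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_buildkitd_toml(), end="")
```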
Feb 17 2023
For starters, perhaps we can create an account on docker.io for Release Engineering and generate a public puller token to use on WMCS runners. According to the GitLab CI runner docs, the docker executor should respect the ~/.docker/config.json for the runner's user. In this scenario, the creds would not be leaked to jobs.
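A minimal sketch of the credentials file this would involve. The "auth" value is the base64 encoding of "username:token", keyed by the Docker Hub address that docker login writes. The account name, token, and helper names are hypothetical. Restricting the file to the runner's user keeps it out of reach of job containers, assuming the runner does not mount its home directory into them.

```python
import base64
import json
from pathlib import Path

# Registry key that `docker login` uses for Docker Hub.
DOCKER_HUB = "https://index.docker.io/v1/"


def docker_config(username: str, token: str, registry: str = DOCKER_HUB) -> dict:
    """Build a config.json body whose 'auth' field is base64("username:token")."""
    auth = base64.b64encode(f"{username}:{token}".encode()).decode()
    return {"auths": {registry: {"auth": auth}}}


def write_runner_config(config: dict, home: Path) -> Path:
    """Write ~/.docker/config.json for the runner's user, readable only by that user."""
    path = home / ".docker" / "config.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))
    path.chmod(0o600)
    return path
```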
Feb 16 2023
Feb 15 2023
Thanks! I will merge and deploy the backport. I might wait until tomorrow to re-roll train to group1, depending on my own time constraints.
@cscott I'm seeing a large spike in errors from Parsoid today. See https://logstash.wikimedia.org/goto/a7eb8ccc8cf7a123b9577e32e98f1b1c
Summary of our current solution:
Feb 14 2023
I should clarify. I did not refactor the security patch implementation, only the context so it would apply cleanly.
I've refactored the patch following application failure during scap stage-train.
Feb 10 2023
Patch file for integration/config ^. I assume it should be fine to submit this for review since I've already updated the job in Jenkins, but I thought I'd check with you first, @sbassett.
I've updated the job from changes I made locally and it seems to work as expected.
Given this job runs inside a container as user nobody (the cause of the error, since the repo is mounted with different ownership), I think it's probably safe to add git config --global --add safe.directory to the script. However, the checks (really the entire script) seem quite difficult to reason about, and the exit status of the git command should be handled separately from those of sed/grep.
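A minimal sketch of separating those concerns, written in Python rather than the job's shell. Git runs as its own step, so a git failure is reported directly instead of being masked by the exit status of a downstream grep or sed in a pipeline. The actual git invocation in the job isn't shown here, so git diff --name-only, the base revision, and the helper names are hypothetical.

```python
import re
import subprocess
from typing import List


def trust_repo(repo: str) -> None:
    # The fix proposed above: the mounted repo is owned by a different user
    # than the container user (nobody), so git must be told to trust it.
    subprocess.run(["git", "config", "--global", "--add", "safe.directory", repo], check=True)


def changed_files(repo: str, base: str = "HEAD~1") -> List[str]:
    # Run git on its own so its exit status is checked here, not hidden
    # behind the status of a later grep/sed in a pipeline.
    result = subprocess.run(
        ["git", "-C", repo, "diff", "--name-only", base],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"git diff failed ({result.returncode}): {result.stderr.strip()}")
    return result.stdout.splitlines()


def matching(files: List[str], pattern: str) -> List[str]:
    # The grep/sed-style filtering, now a separate step with its own result.
    regex = re.compile(pattern)
    return [f for f in files if regex.search(f)]
```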
Feb 8 2023
Jan 19 2023
This is done. Evaluation has long been done and we have a working image build system that can also publish (and soon will perform test deployments).
Jan 18 2023
Jan 6 2023
(Orthogonal but worth a discussion in our next meeting.)
Dec 16 2022
Dec 15 2022
I'm thinking the best course for us at this point might be to rely on binaries verified by checksum for now and not the upstream deb package.
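A minimal sketch of verifying a downloaded binary against a pinned SHA-256 checksum before installing it into the image. Where the expected checksum comes from (for example a checksums file published upstream) is left open, and the function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_binary(path: Path, expected_sha256: str) -> None:
    """Raise if the downloaded binary does not match the pinned checksum."""
    actual = sha256_of(path)
    if actual != expected_sha256.strip().lower():
        raise ValueError(f"checksum mismatch for {path}: expected {expected_sha256}, got {actual}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        binary = Path(tmp) / "tool"
        binary.write_bytes(b"example binary contents")
        verify_binary(binary, hashlib.sha256(b"example binary contents").hexdigest())
        print("checksum ok")
```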
We're now seeing errors during our image build. It seems the component and package may no longer be in our repo.
One more thought: Since the DigitalOcean runners will be instance-wide starting in the new year and will be configured to run untagged jobs, I'm not sure their tags will matter as much for the most general use cases.
Dec 8 2022
Which way are we going with this? FWIW I think fronting with nginx would allow offloading of both auth (via jwt-authorizer) and GET/HEAD request caching. Honestly, if we could get the existing nginx ingress working for internal requests, it could handle all of that.
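The caching half of this rests on content addressing. Registry GET/HEAD requests that reference a manifest or blob by sha256 digest always return the same content, so a front end can cache them. Tag references such as latest can move, so they should not be cached. A minimal sketch of that rule, with a hypothetical function name. Cached responses would still need to pass the jwt-authorizer check before being served.

```python
import re

# Content-addressed references: manifests or blobs requested by sha256 digest.
_DIGEST_PATH = re.compile(r"^/v2/.+/(?:manifests|blobs)/sha256:[0-9a-f]{64}$")


def is_cacheable(method: str, path: str) -> bool:
    """GET/HEAD by digest is immutable, so a front-end cache can safely keep it.

    Tag references (e.g. /manifests/latest) can be repointed, so they are excluded.
    """
    return method.upper() in ("GET", "HEAD") and bool(_DIGEST_PATH.match(path))
```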
It's possible for a blubber.yaml to be in a subdirectory of .pipeline as well. Could you run the command again with a less restrictive pattern, maybe just grep -iq blubber.yaml?
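For comparison, here is the suggested grep -iq blubber.yaml check expressed in Python over a list of paths. It is case-insensitive and unanchored, so it also matches nested paths under .pipeline. As in grep, the "." is a regex wildcard. The function name is hypothetical.

```python
import re
from typing import Iterable

# Same pattern as `grep -i blubber.yaml`: case-insensitive, unanchored,
# and (as in grep) the "." matches any single character.
_BLUBBER = re.compile(r"blubber.yaml", re.IGNORECASE)


def has_blubber_config(paths: Iterable[str]) -> bool:
    """True if any path contains blubber.yaml, including nested ones like .pipeline/<subdir>/blubber.yaml."""
    return any(_BLUBBER.search(p) for p in paths)
```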
Dec 2 2022
Dec 1 2022
Done. See T322691#8433642
Nov 30 2022
Nov 29 2022
Nov 17 2022
The buildkitd deployment running in cloud-runner now has access to the DO Spaces credentials. However, there's a pretty large snafu here that I ran into while testing with a blubber MR.
Nov 15 2022
Nov 10 2022
\o/ thank you!!!
Would it be ok to add GetInRelease: no to the reprepro updates file? According to the manpage:
Nov 9 2022
Nov 8 2022
I enabled debug logging for buildkitd on the gitlab-runner hosts and re-ran the failed job. It appears that an auth failure may be the root cause: the client is definitely misbehaving, but there should not be an auth failure here. It's unclear why.
@JMeybohm can you provide the nginx access log entries from that time period as well? I'm trying to rule out auth failure as a factor and docker-registry log entries do not include the subrequests between nginx and jwt-authorizer.
Thanks for debugging this further, @JMeybohm and @hashar. In your recent re-run of the job, it seems the manifest list is accepted by the registry and it's the subsequent manifest push that fails. So you're right: this doesn't seem to be related to a lack of manifest list support, but perhaps to some errant behavior on buildkit's side that manifests under this multi-platform push condition. I'll re-word and triage accordingly.
Nov 7 2022
Thanks for the review/merge, @MoritzMuehlenhoff and @Dzahn! I don't see the packages yet but I'm assuming the actual import is a manual step?
Thanks for filing this!
Nov 5 2022
Nov 4 2022
This may be blocked by T322453: Buildkit erroring with "cannot reuse body, request must be retried" upon multi-platform push which is preventing the publishing of multi-platform images. The dockerfile-copy image needs to be multi-platform to achieve parity with the upstream version.
Nov 3 2022
Nov 1 2022
See https://gitlab.wikimedia.org/repos/releng/blubber/-/merge_requests/15 for multi-platform support in Blubber, currently under review.