Rust image build on toolforge fails
Closed, Resolved | Public | BUG REPORT

Description

Tool: listeria

Command: toolforge build start https://github.com/magnusmanske/listeria_rs/

Output (the first line shows that compilation succeeded):

[step-build] 2024-02-27T07:52:14.414157326Z     Finished release [optimized] target(s) in 1m 34s
[step-build] 2024-02-27T07:52:14.635795764Z + profile_dir=/layers/emk_rust/profile
[step-build] 2024-02-27T07:52:14.636035096Z + mkdir -p /layers/emk_rust/profile/profile.d
[step-build] 2024-02-27T07:52:14.638733486Z + cat
[step-build] 2024-02-27T07:52:14.641127896Z + [[ -d .profile.d ]]
[step-build] 2024-02-27T07:52:14.641229099Z + [[ -f /tmp/tmp.1uQhk80UAY/target/export ]]
[step-build] 2024-02-27T07:52:14.641311482Z + echo 'build = true'
[step-build] 2024-02-27T07:52:14.641404234Z + mkdir -p /layers/emk_rust/profile/env.build/
[step-build] 2024-02-27T07:52:14.644790017Z + /tmp_builder/buildpacks/emk_rust/0.1/bin/exports /tmp/tmp.1uQhk80UAY/target/export /platform /layers/emk_rust/profile/env.build/
[step-fix-nested-procfile-launcher] 2024-02-27T07:52:15.031040239Z Skipping, procfile buildpack not detected...
[step-fix-permissions] 2024-02-27T07:52:15.292999113Z > Setting permissions on '/layers'...
[step-fix-permissions] 2024-02-27T07:52:22.695861328Z > Setting permissions on '/workspace'...
[step-export] 2024-02-27T07:50:11.490129515Z 2024/02/27 07:50:11 warning: unsuccessful cred copy: ".docker" from "/tekton/creds" to "/tekton/home": unable to open destination: open /tekton/home/.docker/config.json: permission denied
[step-export] 2024-02-27T07:52:23.706635209Z Reusing layers from image 'tools-harbor.wmcloud.org/tool-listeria/tool-listeria@sha256:d701b197434a3789237bb17c39cf02eb7271d3d9905d9e41e99625446a3fcd6d'
[step-export] 2024-02-27T07:52:24.380438746Z Reusing layer 'emk/rust:profile'
[step-export] 2024-02-27T07:52:24.384598633Z Reusing layer 'buildpacksio/lifecycle:launch.sbom'
[step-export] 2024-02-27T07:52:25.477004512Z Adding 1/1 app layer(s)
[step-export] 2024-02-27T07:52:25.497679133Z Reusing layer 'buildpacksio/lifecycle:launcher'
[step-export] 2024-02-27T07:52:25.500394181Z Reusing layer 'buildpacksio/lifecycle:config'
[step-export] 2024-02-27T07:52:25.502831308Z Adding label 'io.buildpacks.lifecycle.metadata'
[step-export] 2024-02-27T07:52:25.505190745Z Adding label 'io.buildpacks.build.metadata'
[step-export] 2024-02-27T07:52:25.505201807Z Adding label 'io.buildpacks.project.metadata'
[step-export] 2024-02-27T07:52:25.506593859Z no default process type
[step-export] 2024-02-27T07:52:25.506802804Z Saving tools-harbor.wmcloud.org/tool-listeria/tool-listeria:latest...
[step-export] 2024-02-27T07:52:30.169995555Z *** Images (sha256:6f9976c94782b3c22e036e9da7c231e8481567af26ad0de119592342767cf785):
[step-export] 2024-02-27T07:52:30.170146956Z       tools-harbor.wmcloud.org/tool-listeria/tool-listeria:latest - PATCH https://tools-harbor.wmcloud.org/v2/tool-listeria/tool-listeria/blobs/uploads/b2a3b745-0384-4ae8-a013-bd33ccb796b9?_state=REDACTED: unexpected status code 500 Internal Server Error: <html>
[step-export] 2024-02-27T07:52:30.170213054Z <head><title>500 Internal Server Error</title></head>
[step-export] 2024-02-27T07:52:30.170226780Z <body>
[step-export] 2024-02-27T07:52:30.170238107Z <center><h1>500 Internal Server Error</h1></center>
[step-export] 2024-02-27T07:52:30.170249473Z <hr><center>nginx/1.18.0</center>
[step-export] 2024-02-27T07:52:30.170258708Z </body>
[step-export] 2024-02-27T07:52:30.170268083Z </html>
[step-export] 2024-02-27T07:52:30.170277038Z
[step-export] 2024-02-27T07:52:30.190690947Z ERROR: failed to export: failed to write image to the following tags: [tools-harbor.wmcloud.org/tool-listeria/tool-listeria:latest: PATCH https://tools-harbor.wmcloud.org/v2/tool-listeria/tool-listeria/blobs/uploads/b2a3b745-0384-4ae8-a013-bd33ccb796b9?_state=REDACTED: unexpected status code 500 Internal Server Error: <html>
[step-export] 2024-02-27T07:52:30.190707625Z <head><title>500 Internal Server Error</title></head>
[step-export] 2024-02-27T07:52:30.190711440Z <body>
[step-export] 2024-02-27T07:52:30.190714435Z <center><h1>500 Internal Server Error</h1></center>
[step-export] 2024-02-27T07:52:30.190717779Z <hr><center>nginx/1.18.0</center>
[step-export] 2024-02-27T07:52:30.190720689Z </body>
[step-export] 2024-02-27T07:52:30.190723625Z </html>
[step-export] 2024-02-27T07:52:30.190726482Z ]
[step-results] 2024-02-27T07:52:30.890009446Z 2024/02/27 07:52:30 Skipping step because a previous step failed

So now I am not even sure whether the "latest" image is the one I just built or the previous one?

Event Timeline

The compilation belongs to the build step (it compiled the Rust code). The failure happened later, during the export step, so the built image was never sent to the registry.
The current 'latest' is the image from the previous successful build.
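To answer "which image is latest?" from the build log alone, a minimal Python sketch could scan the output. It assumes the log format shown in this task: a successful run ends with a "[step-results] ... Built image <ref>@sha256:<digest>" line, while a failed export prints "ERROR: failed to export" and the results step is skipped. The names export_status and ExportResult are hypothetical and not part of the toolforge CLI.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed log format: "[step-results] <ts> Built image <ref>@sha256:<64 hex chars>"
BUILT_RE = re.compile(r"Built image (\S+)@(sha256:[0-9a-f]{64})")


@dataclass
class ExportResult:
    pushed: bool            # True only if the image reached the registry
    image: Optional[str]    # image reference reported by the results step
    digest: Optional[str]   # digest of the pushed image


def export_status(log_lines: Iterable[str]) -> ExportResult:
    """Decide from build-log lines whether the export (push) succeeded.

    Hypothetical helper: relies only on the log lines quoted in this task.
    """
    failed = False
    built = None
    for line in log_lines:
        # A failed export means nothing new was written to the registry.
        if "[step-export]" in line and "ERROR: failed to export" in line:
            failed = True
        match = BUILT_RE.search(line)
        if match:
            built = match
    if failed or built is None:
        return ExportResult(False, None, None)
    return ExportResult(True, built.group(1), built.group(2))
```

If pushed is False, the "latest" tag still points at whatever the previous successful build pushed.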

I agree that it's not easy to link an image to a build; feel free to open a task specifically for that.

On the failure side: is this something you can reproduce, or did it happen just once? It also seems unrelated to Rust, since the export step is independent of what is in the image; it just does the equivalent of 'docker push'.
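For context on the failing PATCH request: a registry push uploads each blob through the OCI distribution API. In the chunked flow, the client opens an upload with POST /v2/<name>/blobs/uploads/, sends the data with one or more PATCH requests to the upload URL the registry returns, and closes it with a PUT carrying the blob digest. The minimal Python sketch below only builds that request sequence (it sends nothing); the upload location and chunk size are hypothetical placeholders, and whether the export step chunks its uploads this way is an assumption beyond the single PATCH visible in the log.

```python
import hashlib
from typing import Dict, List, Tuple

Request = Tuple[str, str, Dict[str, str]]  # (method, path, headers)


def chunked_upload_plan(repo: str, blob: bytes, location: str,
                        chunk_size: int) -> List[Request]:
    """List the requests for an OCI chunked blob upload (sketch only).

    `location` stands in for the URL the registry returns in the Location
    header of the initial POST; here it is a hypothetical placeholder.
    """
    plan: List[Request] = [("POST", f"/v2/{repo}/blobs/uploads/", {})]
    for start in range(0, len(blob), chunk_size):
        chunk = blob[start:start + chunk_size]
        end = start + len(chunk) - 1  # Content-Range end offset is inclusive
        plan.append(("PATCH", location, {
            "Content-Type": "application/octet-stream",
            "Content-Range": f"{start}-{end}",
            "Content-Length": str(len(chunk)),
        }))
    digest = "sha256:" + hashlib.sha256(blob).hexdigest()
    # The upload URL may already carry a query string (e.g. "?_state=...").
    sep = "&" if "?" in location else "?"
    plan.append(("PUT", f"{location}{sep}digest={digest}",
                 {"Content-Length": "0"}))
    return plan
```

Each PATCH body passes through the proxy in front of the registry, which is where a request-buffering problem would surface as a 500.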

dcaro triaged this task as High priority. Feb 27 2024, 9:28 AM
dcaro edited projects, added Toolforge; removed Toolforge Build Service.
dcaro moved this task from Backlog to Ready to be worked on on the Toolforge board.
dcaro edited projects, added Toolforge (Toolforge iteration 06); removed Toolforge.
dcaro moved this task from Next Up to In Progress on the Toolforge (Toolforge iteration 06) board.

This is most likely T354116: Harbor uploads sometimes fail due to tmpfs space on project-proxy.

taavi@proxy-03:~$ sudo grep tools-harbor /var/log/nginx/error.log
2024/02/27 07:52:25 [crit] 2458144#2458144: *44076682 pwrite() "/var/lib/nginx/body/0000327926" failed (28: No space left on device), client: 172.16.2.89, server: , request: "PATCH /v2/tool-listeria/tool-listeria/blobs/uploads/985f8be3-608a-4a06-a4b5-7cb7ae4fc416?_state=REDACTED HTTP/2.0", host: "tools-harbor.wmcloud.org"

I think I need help reproducing this. So far, multiple runs on tools and toolsbeta have failed to trigger this error. Could this be a combination of more than one factor, perhaps multiple users trying to upload at the same time?

@Magnus, can you consistently reproduce this issue? If so, what are the steps?

Magnus claimed this task.

Ran it again now and it works fine. Closing this issue; will reopen if it keeps happening.

Now happening repeatedly for the mix-n-match tool:

toolforge build start https://github.com/magnusmanske/mixnmatch_rs/

Just checked: the last build passed:

[step-export] 2024-03-08T15:32:52.375240484Z *** Images (sha256:2068dde1e3e15eef37d5ac0d4b68b68cd6cc42a3a17d80e6c2bfaa60fc380302):
[step-export] 2024-03-08T15:32:52.375290287Z       tools-harbor.wmcloud.org/tool-mix-n-match/tool-mix-n-match:latest
[step-export] 2024-03-08T15:33:17.899685188Z Adding cache layer 'emk/rust:shim'
[step-results] 2024-03-08T15:33:30.549466231Z Built image tools-harbor.wmcloud.org/tool-mix-n-match/tool-mix-n-match:latest@sha256:2068dde1e3e15eef37d5ac0d4b68b68cd6cc42a3a17d80e6c2bfaa60fc380302

I do see many logs on nginx about no space left:

root@proxy-03:~# grep mix-n-match /var/log/nginx/error.log | grep 'No space left' | wc
     13     312    7363
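In the output above, the first number printed by wc (13) is the line count, i.e. the number of matching error entries; the other two are word and byte counts. To see which tools are affected rather than counting one tool at a time, a minimal Python sketch could group the same "No space left on device" entries by Harbor project. It assumes the nginx error-line format quoted earlier in this task (request: "PATCH /v2/tool-<name>/..."); the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed format: ... request: "PATCH /v2/tool-<name>/<repo>/blobs/uploads/..."
TOOL_RE = re.compile(r'request: "\S+ /v2/tool-([^/]+)/')


def enospc_by_tool(lines: Iterable[str]) -> Counter:
    """Count ENOSPC ("No space left on device") errors per tool name."""
    counts: Counter = Counter()
    for line in lines:
        if "No space left on device" not in line:
            continue
        match = TOOL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

On the proxy host it could be run as: with open("/var/log/nginx/error.log") as f: print(enospc_by_tool(f)).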

Yes, the third attempt passed. However, 500 errors should be cause for concern, even if it eventually runs through.

Question: Does "no space left" refer to the mix-n-match tool disk space, or the build environment disk space?

Neither :). It's on the proxy in front of the image repository (temporary space the proxy uses to handle requests).
So for some reason the proxy is trying to store more data in its temp directory than the directory can hold, but it's not storing the whole request either (we can upload >1G at a time). We need to figure out how to avoid that and/or increase the size of the temp directory; the problem with increasing it is that it's tmpfs, so it all lives in RAM.
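The "multiple uploads at the same time" hypothesis can be illustrated with a toy model: if each in-flight upload keeps some bytes in the shared temp directory from start to finish, the directory overflows when the overlapping uploads together exceed its size, even though each upload alone fits. The Python sketch below computes that peak with a sweep over start/end events. All sizes, times, and the tmpfs capacity are hypothetical inputs, not measurements from proxy-03, and it does not model nginx's actual buffering behavior.

```python
from typing import List, Tuple

Upload = Tuple[float, float, int]  # (start_s, end_s, bytes kept in temp dir)


def peak_buffered_bytes(uploads: List[Upload]) -> int:
    """Peak total bytes held in the temp directory at any instant,
    assuming each upload's bytes stay there from its start to its end."""
    events = []
    for start, end, size in uploads:
        events.append((start, size))   # allocation when the upload starts
        events.append((end, -size))    # release when the upload finishes
    # At equal timestamps, releases (negative deltas) sort before allocations.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def would_overflow(uploads: List[Upload], tmpfs_bytes: int) -> bool:
    """True if the overlapping uploads exceed the (RAM-backed) tmpfs size."""
    return peak_buffered_bytes(uploads) > tmpfs_bytes
```

This is also why the failure is hard to reproduce in isolation: a single test push only exercises the solo case.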

Bumping project tag so the task shows up on an active workboard.

@Magnus, we have made a few changes to the proxy config to alleviate this issue. Are you still seeing the errors?

This appears to be resolved, and has been replaced by the "too many open files" bug (another Phabricator ticket is open for that).