
[builds-api] Improve error message when logs time out
Open, Low, Public

Description

Sometimes builds-api gives up on fetching the logs because it times out while waiting for the pipelinerun to start:

{"time":"2024-01-10T12:35:59.919354032Z","id":"","remote_ip":"127.0.0.1","host":"127.0.0.1:8000","method":"GET","uri":"/v1/build/tf-test-buildpacks-pipelinerun-7nvk5/logs?follow=True","user_agent":"toolforge_builds_cli toolforge_weld/1.4.0 python-requests/2.28.1","status":500,"error":"","latency":60390174565,"latency_human":"1m0.390174565s","bytes_in":0,"bytes_out":122}
{"level":"error","msg":"Error getting the logs for tf-test-buildpacks-pipelinerun-7nvk5: timed out waiting for pipelinerun to start","time":"2024-01-10T12:35:59Z"}
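The server-side message could be more actionable if it said how long builds-api waited and pointed the user at the build status. Below is a minimal Python sketch of that idea. `LogsTimeoutError`, `wait_for_start` and the `has_started` callback are hypothetical names, and the callback stands in for the real pipelinerun status query. This is not how builds-api is actually implemented.

```python
import time
from typing import Callable


class LogsTimeoutError(Exception):
    """Raised when a build has not started before the log-wait deadline (hypothetical)."""

    def __init__(self, build_id: str, waited_seconds: float):
        self.build_id = build_id
        self.waited_seconds = waited_seconds
        super().__init__(
            f"Timed out after {waited_seconds:.0f}s waiting for build {build_id} "
            "to start, so no logs are available yet. The build may still run; "
            f"check its status with 'toolforge build show {build_id}'."
        )


def wait_for_start(
    build_id: str,
    has_started: Callable[[str], bool],  # hypothetical status check
    timeout: float = 600.0,  # 10 minutes, matching the patches in T354189
    poll_interval: float = 2.0,
) -> None:
    """Poll until the build has started, or raise LogsTimeoutError at the deadline."""
    deadline = time.monotonic() + timeout
    while not has_started(build_id):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise LogsTimeoutError(build_id, timeout)
        # Never sleep past the deadline.
        time.sleep(min(poll_interval, remaining))
```

The exception carries the build ID, so the API layer can return the message as-is instead of a generic "Error getting the logs".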

On the user side, it looks like this:

toolsbeta.tf-test@lima-bookworm:~$ toolforge build start https://gitlab.wikimedia.org/toolforge-repos/wm-lol
BuildClientError: <html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx/1.21.0</center>
</body>
</html>

Please report this issue to the Toolforge admins if it persists: https://w.wiki/6Zuu
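On the client side, the CLI could avoid dumping nginx's HTML error page on the user. Below is a minimal Python sketch, assuming the client has the HTTP status code and the raw response body at hand. `format_build_error` is a hypothetical helper, not part of the real toolforge_weld or builds CLI API.

```python
GATEWAY_TIMEOUT = 504


def format_build_error(status_code: int, body: str) -> str:
    """Turn a raw error response into a readable CLI message (hypothetical helper)."""
    if status_code == GATEWAY_TIMEOUT:
        # The gateway gave up before builds-api answered; the build itself
        # may be fine (as in this task, where it later finished with "ok").
        return (
            "The build service timed out while waiting for the build logs. "
            "The build may still be running or may already have finished; "
            "check it with 'toolforge build show'."
        )
    if body.lstrip().lower().startswith("<html"):
        # Some other gateway error page: don't show raw HTML to the user.
        return f"The build service returned an unexpected HTTP {status_code} error."
    return body.strip() or f"The build service returned HTTP {status_code}."
```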

The pipeline itself succeeds. Note that it only started at 12:37:50, almost two minutes after builds-api gave up at 12:35:59:

toolsbeta.tf-test@lima-bookworm:~$ toolforge build show
Build ID: tf-test-buildpacks-pipelinerun-7nvk5
Start Time: 2024-01-10T12:37:50Z
End Time: 2024-01-10T12:41:25Z
Status: ok
Message: Tasks Completed: 1 (Failed: 0, Cancelled 0), Skipped: 0
Parameters:
    Source URL: https://gitlab.wikimedia.org/toolforge-repos/wm-lol
    Ref: N/A
    Envvars: N/A
Destination Image: 172.19.0.1/tool-tf-test/tool-tf-test:latest
toolsbeta.tf-test@lima-bookworm:~$ toolforge build show tf-test-buildpacks-pipelinerun-7nvk5
Build ID: tf-test-buildpacks-pipelinerun-7nvk5
Start Time: 2024-01-10T12:37:50Z
End Time: 2024-01-10T12:41:25Z
Status: ok
Message: Tasks Completed: 1 (Failed: 0, Cancelled 0), Skipped: 0
Parameters:
    Source URL: https://gitlab.wikimedia.org/toolforge-repos/wm-lol
    Ref: N/A
    Envvars: N/A
Destination Image: 172.19.0.1/tool-tf-test/tool-tf-test:latest

Event Timeline

@Slst2020 we have patches in https://phabricator.wikimedia.org/T354189 that increase the timeout to 10 minutes (it was 1 minute before). This doesn't directly solve the problem described in this task, but it makes it much less likely to occur in the first place.

From @taavi: maybe the best solution here is to patch the API gateway to return a nicely formatted JSON error, so the client can parse it and show a nicer message.
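If the gateway returned a JSON error body (for example via nginx's `error_page` directive pointing at a static JSON response), the client could parse it instead of showing HTML. Below is a minimal Python sketch, assuming a hypothetical `{"error": "..."}` body shape. The actual format would still need to be agreed on, and `extract_error_message` is an illustrative name only.

```python
import json


def extract_error_message(body: str, fallback: str) -> str:
    """Return the "error" field of a JSON error body, or `fallback`.

    The {"error": "..."} shape is an assumption for illustration.
    """
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON, e.g. nginx's default HTML error page.
        return fallback
    if isinstance(payload, dict) and isinstance(payload.get("error"), str):
        return payload["error"]
    return fallback
```

Keeping a fallback means the client still behaves sensibly if an old gateway, or an unrelated proxy, returns HTML.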

dcaro triaged this task as Low priority. Mar 5 2024, 1:12 PM
dcaro edited projects, added Toolforge; removed Toolforge Build Service.
dcaro moved this task from Backlog to "Ready to be worked on" on the Toolforge board.