
[builds-builder] golang based images get infinite nested loops for procfile entries
Open, Stalled, Medium | Public | BUG REPORT

Description

Seen while working on T363028: Replace custom deployment with build service and job service via https://gitlab.wikimedia.org/toolforge-repos/bridgebot

Procfile
bot: bridgebot
test-bot: bridgebot -conf etc/testing.toml
$ test-bot --help
bridgebot -conf etc/testing.toml: line 1: bridgebot -conf etc/testing.toml: No such file or directory
$ bridgebot --help
ERROR: failed to launch: direct exec: argument list too long
Procfile
bot: /layers/heroku_go/go_target/bin/bridgebot -conf /app/etc/bridgebot.toml
testbot: /layers/heroku_go/go_target/bin/bridgebot -conf /app/etc/testing.toml
$ testbot --help
/layers/heroku_go/go_target/bin/bridgebot -conf /app/etc/testing.toml: line 1: /layers/heroku_go/go_target/bin/bridgebot -conf /app/etc/testing.toml: No such file or directory
$ time bridgebot --help
ERROR: failed to launch: direct exec: argument list too long

real    0m16.379s
user    0m6.879s
sys     0m9.129s

Calling the golang-built binary by its full path, without the assistance of the Procfile or the launcher, works:

$ webservice buildservice shell --mount none -m 2G -c 1
$ /layers/heroku_go/go_target/bin/bridgebot -conf /app/etc/testing.toml
[0000]  INFO router:       (/layers/heroku_go/go_deps/cache/gitlab.wikimedia.org/toolforge-repos/bridgebot-matterbridge@v0.0.0-20240424042617-38c64944bf1d/gateway/router.go:66: github.com/42wim/matterbridge/gateway.(*Router).Start) Parsing gateway testing-irc-telegram
[0000]  INFO router:       (/layers/heroku_go/go_deps/cache/gitlab.wikimedia.org/toolforge-repos/bridgebot-matterbridge@v0.0.0-20240424042617-38c64944bf1d/gateway/router.go:75: github.com/42wim/matterbridge/gateway.(*Router).Start) Starting bridge: irc.testing
...

Details

Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
fix_nested_procfile: when appending use the right separator | repos/cloud/toolforge/builds-builder!46 | dcaro | fix_path_append | main
builds-builder: bump to 0.0.104-20240506093336-61a74e27 | repos/cloud/toolforge/toolforge-deploy!277 | ghost | bump_builds-builder | main
procfile: avoid recursive calls for golang bulidpack | repos/cloud/toolforge/builds-builder!45 | dcaro | fix_golang | main
Fix issues found during live testing of initial implementation | toolforge-repos/bridgebot!1 | bd808 | work/bd808/T363028 | main

Event Timeline

We are a few versions behind on https://github.com/heroku/buildpacks-go, but I don't see anything in the commits or CHANGELOG that looks directly Procfile related. The latest tagged release also may not be compatible with pack (https://github.com/heroku/buildpacks-go/commit/111bb19806bb838c457ef1778a30487dc50f1cb0).

Adding tiny shell wrappers for the Procfile to call seems to work around the issue.
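
A minimal sketch of what such a wrapper could look like. The script name and location (bin/test-bot.sh) and the WRAPPER_DIR, BRIDGEBOT_BIN and BRIDGEBOT_CONF variables are hypothetical and only here for illustration; they are not taken from the bridgebot repository. The binary and config paths are the ones quoted in the description.

```shell
#!/bin/sh
# Sketch: generate a tiny wrapper that a Procfile entry can call without arguments.
# WRAPPER_DIR, BRIDGEBOT_BIN and BRIDGEBOT_CONF are hypothetical names.
WRAPPER_DIR="${WRAPPER_DIR:-./bin}"
mkdir -p "$WRAPPER_DIR"

cat > "$WRAPPER_DIR/test-bot.sh" <<'EOF'
#!/bin/sh
# Call the Go binary by its full path so nothing depends on $PATH lookup,
# and forward any extra arguments (for example --help) unchanged.
exec "${BRIDGEBOT_BIN:-/layers/heroku_go/go_target/bin/bridgebot}" \
    -conf "${BRIDGEBOT_CONF:-/app/etc/testing.toml}" "$@"
EOF
chmod +x "$WRAPPER_DIR/test-bot.sh"
```

The Procfile entry would then name only the wrapper, with no arguments of its own, so the argument handling stays inside the script.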

Procfile entries don't accept any new arguments if they already have some. This is an upstream issue in the procfile buildpack (T356016: [builds-builder,jobs-api,upstream] Calling nontrivial Procfile commands with arguments results in confusing error (“no such file or directory”)).
There's a note and a workaround in https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service#Procfile (feel free to reword if you find it not clear there).
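
The error text in the description looks consistent with the whole Procfile command string being executed as a single word. This is an assumption about the mechanism, not a confirmed diagnosis of the buildpack code. A small sketch of that effect in plain shell, using "ls -d /" as a harmless stand-in command:

```shell
#!/bin/sh
# Sketch only: the same command string executed as ONE word versus re-parsed
# into a command plus arguments. "ls -d /" is just a harmless stand-in.
cmd='ls -d /'

# Quoted expansion: the shell searches for an executable literally named "ls -d /".
err_single=$("$cmd" 2>&1)
status_single=$?

# Re-parsed by a shell: "ls" is the command, "-d" and "/" are its arguments.
out_split=$(sh -c "$cmd")
status_split=$?

echo "single word: exit $status_single ($err_single)"
echo "re-parsed:   exit $status_split ($out_split)"
```

The exact wording of the first error depends on the shell, but the exit status is 127 (command not found) in both bash and dash.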

For the bridgebot entry itself, the issue is that the golang buildpack already generates a bridgebot process entry on the fly:

dcaro@urcuchillay$ podman run --entrypoint bash --rm -ti tools-harbor.wmcloud.org/tool-bridgebot/tool-bridgebot:latest

heroku@9bda9cb34302:/workspace$ launcher bash

heroku@9bda9cb34302:/workspace$ which bridgebot
/cnb/process/bridgebot

heroku@9bda9cb34302:/workspace$ cat /layers/config/metadata.toml                    
...
[[processes]]
  type = "bridgebot"
  command = "bridgebot"
  args = []
  direct = true
  buildpack-id = "heroku/go"
...

That is what confuses it: we add /cnb/process to $PATH so that Procfile entries can be called by name, so the bare `bridgebot` command resolves to /cnb/process/bridgebot and ends up calling itself.

It should probably use the full path instead of just the process name (https://github.com/heroku/buildpacks-go/blob/main/buildpacks/go/src/proc.rs#L41). I might send a patch or open a bug, though probably not today (/me on sick leave).
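
A rough simulation of the loop, assuming a launcher entry that runs the bare command name through $PATH. The temporary directories stand in for /cnb/process and the Go target directory, and DEMO_DEPTH is a hypothetical guard added only so the demo terminates; none of this is the real lifecycle or buildpack code.

```shell
#!/bin/sh
# Sketch: simulate a per-process launcher entry that re-invokes its own name via $PATH.
# Directory names and the DEMO_DEPTH guard are hypothetical stand-ins.
demo=$(mktemp -d)
mkdir -p "$demo/process" "$demo/target"

# Stand-in for the real Go binary.
cat > "$demo/target/bridgebot" <<'EOF'
#!/bin/sh
echo "real binary reached"
EOF

# Stand-in for /cnb/process/bridgebot: runs the bare command name "bridgebot".
cat > "$demo/process/bridgebot" <<'EOF'
#!/bin/sh
DEMO_DEPTH=$(( ${DEMO_DEPTH:-0} + 1 ))
export DEMO_DEPTH
if [ "$DEMO_DEPTH" -gt 5 ]; then
    echo "gave up after $DEMO_DEPTH nested calls"
    exit 1
fi
exec bridgebot "$@"
EOF
chmod +x "$demo/target/bridgebot" "$demo/process/bridgebot"

# Process directory first: the bare name resolves back to the launcher entry.
looping=$( PATH="$demo/process:$demo/target:$PATH"; export PATH; bridgebot )

# Binary directory first: the bare name resolves to the real binary.
fixed=$( PATH="$demo/target:$demo/process:$PATH"; export PATH; "$demo/process/bridgebot" )

echo "process dir first: $looping"
echo "binary dir first:  $fixed"
rm -rf "$demo"
```

In the real image there is no depth guard, so the nesting only stops when the exec fails, which would match the "argument list too long" error after several seconds.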

This should fix it on our side until we get the upstream side done:
https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/45

It changes the behavior a bit for golang buildservice images though: any system binary will now shadow a Procfile entry with the same name, whereas usually it is the other way around.
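
To illustrate the ordering, here is a sketch of appending a directory to a PATH-style string. path_append is a hypothetical helper written for this comment, not the builds-builder implementation; it only shows why a directory added at the end loses to everything before it, and why the separator handling matters.

```shell
#!/bin/sh
# Hypothetical helper: append a directory to a PATH-style string.
# $1: existing PATH value, $2: directory to append.
path_append() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;      # already present, keep as is
        "::")     printf '%s\n' "$2" ;;      # empty PATH, no separator needed
        *)        printf '%s\n' "$1:$2" ;;   # normal case, join with ":"
    esac
}

# Directories are searched left to right, so an appended /cnb/process is only
# consulted when no earlier directory provides the command.
new_path=$(path_append "/layers/heroku_go/go_target/bin:/usr/bin:/bin" "/cnb/process")
echo "$new_path"
```

Avoiding a leading or doubled ":" matters because an empty PATH element is treated as the current directory.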

dcaro renamed this task from Golang and Procfile buildpacks not working together as expected to [builds-builder] golang based images get infinite nested loops for procfile entries. May 6 2024, 8:37 AM
dcaro changed the task status from Open to In Progress.
dcaro triaged this task as Medium priority.
dcaro edited projects, added Toolforge (Toolforge iteration 09); removed Toolforge.
dcaro moved this task from Next Up to In Review on the Toolforge (Toolforge iteration 09) board.

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/277

builds-builder: bump to 0.0.104-20240506093336-61a74e27

@bd808 This should have been fixed, can you try now?

> @bd808 This should have been fixed, can you try now?

$ echo $PATH
/layers/heroku_go/go_target/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/cnb/process:/cnb/lifecycle
$ which bridgebot
/layers/heroku_go/go_target/bin/bridgebot
$ bridgebot --help
Usage of bridgebot:
  -conf string
        config file (default "etc/bridgebot.toml")
$ /cnb/process/bridgebot --help
Usage of bridgebot:
  -conf string
        config file (default "etc/bridgebot.toml")
$ /cnb/lifecycle/launcher bridgebot --help
Usage of bridgebot:
  -conf string
        config file (default "etc/bridgebot.toml")

Seems better, yes. Thanks for the quick hackfix.

Flagging as upstream to check in on it eventually

dcaro changed the task status from In Progress to Stalled. May 13 2024, 8:32 AM
dcaro claimed this task.