Create a kubernetes container with mono and dotnet
Closed, Resolved, Public

Description

My MilHist bots use C# via mono. Create a kubernetes container with mono and msbuild, preferably with the latest stable version of mono (6.12.0).

Event Timeline

When trying to run mono executable inside this image, I'm getting exception:

[ERROR] FATAL UNHANDLED EXCEPTION: System.AggregateException: One or more errors occurred. (The SSL connection could not be established, see inner exception.) ---> System.Net.Http.HttpRequestException: The SSL connection could not be established, see inner exception. ---> System.Security.Authentication.AuthenticationException: Authentication failed, see inner exception. ---> Mono.Btls.MonoBtlsException: Ssl error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED

at ./external/boringssl/ssl/handshake_client.c:1132

A quick search suggests that this might be about the OS/mono certificate store. Fixing either requires sudo, so I guess I cannot fix it on my side.
Obviously without the ability to establish https connections, I cannot use the mediawiki API, so none of my bots can run as of now.

No good. Does not have msbuild.

Having the same problem as Ghuron

10:37 14 October 2022 Error: TrustFailure (Authentication failed, see inner exception.)

Does the grid engine environment have msbuild? I'm not able to find it using which msbuild. I'm not seeing any evidence in Puppet that we install any mono packages other than mono-complete and mono-fastcgi-server or that either of those packages have historically included the msbuild tool.

It seems what is being asked for is a dotnet6 sdk, which as far as I can tell, isn't packaged in Debian. I do see it packaged in Fedora, and it seems like it has msbuild you are looking for? Have a look: https://packages.fedoraproject.org/pkgs/dotnet6.0/dotnet-sdk-6.0/fedora-rawhide.html#files.

If so, creating an image would be easiest by having a debian package to install. It seems Microsoft provides a debian repo with packages, but it's unclear what licensing they are under: https://packages.microsoft.com/debian/11/prod/.

I am curious what is different about this image versus the grid environment? The existing mono package on the grid is running tools successfully now yes? Do we know why an image with similar mono support doesn't work?

We use Rocky 9 here. All software here is patched and kept up to date due to the threat from foreign-government-sponsored hackers and ransomware attacks.

I strongly urge that the latest version of mono be installed. A package is available. You can find the instructions here:

https://www.mono-project.com/download/stable/#download-lin-debian

I have tested this install on Rocky 9.

We used to have msbuild on the grid, but lost it when we were forced onto a server with an older version of mono installed. Since then I haven't been able to rebuild my C# bots.

They do still run though, but I get occasional errors.

msbuild is available in the current version of mono.

Installing the latest stable version not only means that I can rebuild them on the k8s images, I can also build and test them here where I have a debugging environment and be certain that they will compile and run in the image.

I want to emphasize that HTTPS connections currently cannot be established, which makes tf-mono68 practically useless in the Toolforge environment. Steps to reproduce:

  1. Create a test.cs text file with the following content:
class Test {
  public static void Main(){
    new System.Net.WebClient().DownloadString("https://en.wikipedia.org/");
  }
}
  2. Execute mcs test.cs to compile it to test.exe
  3. Run toolforge-jobs run test --command "mono test.exe" --image tf-mono68 and check test.err

The documentation says: "The package ca-certificates-mono should be installed to get SSL certificates for HTTPS connections. Install this package if you run into trouble making HTTPS connections."

I tried adding this in a local build of the container and then using the information from T311466#8323940 to do a test. I am still seeing a "Mono.Btls.MonoBtlsException: Ssl error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED" fatal error. This tickled something in my brain though which led me to rediscover https://github.com/mono/mono/issues/21233 and specifically the workaround from https://github.com/mono/mono/issues/21233#issuecomment-932211479. When I apply that certificate exclusion manually in the container (which is running as root in my local tests) the mono test.exe command runs to completion.
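
For readers unfamiliar with that kind of workaround, here is a minimal sketch of what a certificate exclusion can look like on a Debian-based image. It assumes the Debian convention that a line in ca-certificates.conf prefixed with "!" is deselected, and that the entry is named mozilla/DST_Root_CA_X3.crt. The function name exclude_cert is hypothetical, and this is not necessarily the exact change that was tested here.

```shell
#!/bin/sh
# Sketch: deselect one CA entry in a ca-certificates.conf-style file.
# Assumption: a leading "!" marks a deselected certificate (Debian convention).
exclude_cert() {
    conf="$1"    # path to the config file, e.g. /etc/ca-certificates.conf
    cert="$2"    # entry to deselect, e.g. mozilla/DST_Root_CA_X3.crt
    tmp="$(mktemp)" || return 1
    # Prefix only the exactly matching line with "!"; copy all other lines as-is.
    awk -v c="$cert" '$0 == c { print "!" $0; next } { print }' "$conf" > "$tmp" \
        && cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

On a real image the edit would normally be followed by running update-ca-certificates as root so that the system store is regenerated; whether mono's own store also needs a resync is not covered by this sketch.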

T292289: Toolforge mono version on stretch grid doesn't trust latest LE certs

Change 844512 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[operations/docker-images/toollabs-images@master] mono68: Remove expired DST Root CA X3 cert

https://gerrit.wikimedia.org/r/844512

Well done.

I am still unsure as to how the k8s concept is supposed to work. Previously my workflow was:
(1) Check out the sources from git under my account
(2) Build the exes and dlls on my account using msbuild and the csproj project file
(3) Copy them to the milhistbot account
(4) Run them using cron

Currently step 2 does not work due to the loss of msbuild, so I cannot rebuild the project.

Now none of these steps are available with k8s.

I can try:
(1) Build with mono 6.12 using msbuild and the csproj project file on my Red Hat server
(2) Copy the exes and dlls to my account
(3) Copy them again to the milhistbot account
(4) Run them using toolforge-jobs on a schedule

This is less than ideal: it involves building with a newer version of mono than the one we are running, and on a different platform, which risks compatibility issues. It also involves copying binary files around with sftp. For some reason I have to become milhistbot before being able to use the container.

Although the idea of building things inside the execution container seems awkward to me, upgrading mono to the latest stable (6.12.0.182) sounds like the right thing to do.

Change 844512 merged by jenkins-bot:

[operations/docker-images/toollabs-images@master] mono68: Remove expired DST Root CA X3 cert

https://gerrit.wikimedia.org/r/844512

Mentioned in SAL (#wikimedia-cloud) [2022-10-20T10:54:34Z] <taavi> rebuild mono68-sssd image with the expired DST Root CA X3 removed T311466

Confirmed that code executed in tf-mono68 can access the MediaWiki API.
Will continue testing

Confirmed that my executables run on the docker container.

I did get one transient error:

SecureChannelFailure (Unable to read data from the transport connection: Connection reset by peer.)

Per Ghuron, an alternative workflow would be:
(1) Check out the sources from git under my account
(2) Copy them to the bot account
(3) Build them on the container with a rebuild toolforge-job
(4) Run them using toolforge-jobs on a schedule

This would require msbuild to be available, which in turn would require mono 6.12

Theoretically one can build a bot executable using the C# compiler (mcs in mono 6.08), but I believe we need the latest 6.12 for many other reasons.
I was having problems with 6.08 when connecting to sites that require TLS 1.3 (which is rare at the moment, but will be less so in the near future).

An alternative would be to install the latest version of dotnet

Best practice for containers is for each app to have its own. It is regarded as an error to use them like virtual machines. See https://cloud.google.com/architecture/best-practices-for-building-containers

@Hawkeye7 we are testing build service beta which uses buildpacks on Toolforge.
Do you think an upstream mono buildpack will work for you?

Take a look at our quickstart guide here.
We will be glad to receive feedback after you give it a try.

We are tracking this project on this board using the tag: "toolforge_build_service_beta_release"

Seeing as heroku doesn't seem to have a dotnet buildpack, I'm guessing it would be https://github.com/paketo-buildpacks/dotnet-core.

Following https://paketo.io/docs/howto/dotnet-core/, I tried with https://gitlab.wikimedia.org/toolforge-repos/milhistbot. After adding a property to specify target framework,

<PropertyGroup>
  <TargetFramework>net6.0</TargetFramework>
</PropertyGroup>

It seemed to start the process and build. I'm afraid my knowledge of how to build .net projects is limited, so I get an error running dotnet publish. (The repo has perl and several mono projects it seems?) Validation that this buildpack will or won't work would be useful. The buildpack supports .net 6 and .net 7, and utilizes msbuild.

@komla Sorry, I was away and did not see your comment until now. A buildpack will not work for me the way you are trying. I have multiple C# projects in the one git repository. It was never intended to be built in this manner. It will simply give you an error telling you that it does not know what project to build, and there is no workaround for this.

There is NO WAY that I can do what @nskaggs is suggesting! I could easily get it to work here, but I do not have access to get it to build and run on your server.

Dotnet is shipped with Rocky 9, but is not available on the grid either. I can build here but get errors like:
./Liftwing: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by ./Liftwing)
./Liftwing: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./Liftwing)
./Liftwing: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by ./Liftwing)
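
These messages mean that the binary was linked against newer glibc/libstdc++ symbol versions than the target system provides. As a rough illustration, the sketch below compares an available version against a required one. The helper name version_at_least is hypothetical, and it assumes a sort that supports -V (as GNU coreutils does).

```shell
#!/bin/sh
# version_at_least HAVE NEED: succeeds when version HAVE is >= version NEED.
# Assumption: sort supports -V (version sort), as in GNU coreutils.
version_at_least() {
    have="$1"
    need="$2"
    # After a version sort, NEED comes first only if it is <= HAVE.
    [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Example: GLIBC_2.34 is required by the binary above.
# The "2.28" here is purely illustrative, not a measured value.
if version_at_least "2.28" "2.34"; then
    echo "glibc is new enough"
else
    echo "glibc is too old for this binary"
fi
```

On a glibc-based system the available version is typically reported on the first line of ldd --version.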

This is very frustrating. My bots are unmaintainable because I have no build environment!

I can build a container here easily - I have a lot of experience with docker - but have no way to deploy it on your servers.

Using a dotnet buildpack is fairly simple. eg.:
$ pack build liftwing --buildpack paketo-buildpacks/dotnet-core --builder paketobuildpacks/builder:base --env BP_DOTNET_PROJECT_PATH=./Liftwing
$ docker run liftwing "Hanford Engineer Works" 2>/dev/null
Setting ASPNETCORE_URLS=http://0.0.0.0:8080
FA

However, I have no idea how this could be achieved with toolforge. Pack build relies on me being in the correct directory of the checkout. (In this case mono/Liftwing/)

I have multiple C# projects in the one git repository. It was never intended to be built in this manner.

These are reasonable statements of fact.

It will simply give you an error telling you that it does not know what project to build, and there is no workaround for this.

There is a workaround, and that is splitting your codebase into distinct gitlab repos which are designed to work in a buildpack system.

This reads to me like splitting your "mono/Liftwing/" directory into a dedicated git repo would make the Toolforge supported system work for this part of your project. You can use https://toolsadmin.wikimedia.org/tools/id/milhistbot/repos/create to create as many distinct git repos for your milhistbot tool as are needed.

My naive assumption that we have a mono pack in the stack has been corrected by @Slst2020. Sorry for the confusion.

As you say, the biggest obstacle is that there is no dotnet buildpack available.

I don't know if any work has been done on this. Or indeed whether you are proceeding with the buildpack approach.

Liftwing is just a test program I created to test out the use of Liftwing instead of Ores. All it does is predict the rating of the article. But it works the same way as all my other C# tools. There is a shared library that handles dealing with Mediawiki (Wikimedia.dll), and a main program that uses it to perform useful functions related to the administration of featured articles on the English language Wikipedia.

I could, as you suggest, place each one in its own git repository with its own copy of Wikimedia.dll. I would have to upload a copy into each repository that requires it. That would work, so long as it was compatible with your environment (preferably identical). Otherwise I will get errors like:
bin/Debug/net7.0/Liftwing: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_2.33' not found (required by bin/Debug/net7.0/Liftwing)

As you say, the biggest obstacle is that there is no dotnet buildpack available.

I don't know if any work has been done on this.

Not really, no. There is a community contributed https://elements.heroku.com/buildpacks/jincod/dotnetcore-buildpack that should work with the Heroku stack we are currently using, but nobody has actually tested that as far as I am aware. We currently do not have a mechanism for a tool to add custom buildpacks to the default stack or to replace the stack entirely, so testing and deployment would need to happen from the WMCS side of things before you could do much more.

Or indeed whether you are proceeding with the buildpack approach.

We discussed the option of making a new mono base container using Bookworm and the https://www.mono-project.com/download/stable/#download-lin-debian packages on IRC today. We are not sure that is a generally helpful approach in the longer term as we really would like most projects to be able to convert to custom images via buildpack sooner rather than later. We plan to have a larger discussion tomorrow in the WMCS team meeting. Hopefully we can report back on which options we would like to try next to unblock you soon after that meeting.

When I try to use the admin console (https://toolsadmin.wikimedia.org/tools/id/milhistbot/repos/create) to create a new repository it says:
No GitLab accounts found for tool maintainers.

This is a bootstrapping problem that I haven't quite fixed -- T323767: Add potential next step for Toolforge error "No GitLab accounts found for tool maintainers.". The workaround is for you to go to https://gitlab.wikimedia.org/ and use the "Sign in" button in the upper right corner to create your initial account. You will also need to visit GitLab (Account Approval) after your initial GitLab login to get your Developer account approved to actually do useful work in GitLab.

I have deployed both the dotnet buildpack and the support for build environment variables. So when you are able to get repositories up and running, feel free to test it and let me know if you find issues or if I can help with anything else :)

I don't quite understand this.

I know how to build locally:
pack build liftwing --buildpack paketo-buildpacks/dotnet-core --builder paketobuildpacks/builder:base --env BP_DOTNET_PROJECT_PATH=./Liftwing

But how will a toolforge build start know to use the dotnet buildpack?

The buildpack that @dcaro ended up adding is https://gitlab.wikimedia.org/repos/cloud/toolforge/dotnetcore-buildpack. The bin/detect script there is what gets run to decide if the buildpack should be applied. It currently looks for files named Startup.cs, Program.cs, or Program.fs in the tool's repository. If any of these is found then the bin/compile script will be run to add things to the container.
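
As a rough illustration of that detection step, the sketch below checks a checkout for the three marker files named above. It is not the actual bin/detect script: the function name detect_dotnet is hypothetical, and searching the whole tree (rather than only certain directories) is an assumption.

```shell
#!/bin/sh
# detect_dotnet DIR: exit 0 if DIR contains Startup.cs, Program.cs or Program.fs.
# Assumption: the whole tree is searched; the real script may be stricter.
detect_dotnet() {
    repo="$1"
    for marker in Startup.cs Program.cs Program.fs; do
        # find prints matching files; any output means the marker exists.
        if [ -n "$(find "$repo" -type f -name "$marker" -print | head -n 1)" ]; then
            return 0
        fi
    done
    return 1
}
```

Run against a repository checkout, a zero exit status would correspond to the buildpack "participating" in the build.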

@bd808 Thanks for that. I will ensure that there is a Program.cs element in the top level of the repository.

The documentation says to become milhistbot and run the build from there:

$ become mytool
$ toolforge build start https://gitlab.wikimedia.org/toolforge-repos/<your-repo>
$ toolforge build show # wait until build passed

In my case, the user will be tools.milhistbot.

So then we will get:

$ toolforge build start https://gitlab.wikimedia.org/toolforge-repos/milhistbot-liftwing.git

...

[step-copy-stack-toml] 2024-01-09T05:07:46.534342021Z 2024/01/09 05:07:46 warning: unsuccessful cred copy: ".docker" from "/tekton/creds" to "/tekton/home": unable to open destination: open /tekton/home/.docker/config.json: permission denied
[step-detect] 2024-01-09T05:07:58.338943268Z Warning: no analyzed metadata found at path '/layers/analyzed.toml'
[step-detect] 2024-01-09T05:07:58.472013751Z 1 of 2 buildpacks participating
[step-detect] 2024-01-09T05:07:58.472067587Z jincod/dotnetcore-buildpack 7.0.401
[step-analyze] 2024-01-09T05:07:47.566757674Z 2024/01/09 05:07:47 warning: unsuccessful cred copy: ".docker" from "/tekton/creds" to "/tekton/home": unable to open destination: open /tekton/home/.docker/config.json: permission denied
[step-analyze] 2024-01-09T05:08:00.081363522Z Image with name "tools-harbor.wmcloud.org/tool-milhistbot/tool-milhistbot:latest" not found
[step-restore] 2024-01-09T05:07:48.106234149Z 2024/01/09 05:07:48 warning: unsuccessful cred copy: ".docker" from "/tekton/creds" to "/tekton/home": unable to open destination: open /tekton/home/.docker/config.json: permission denied
[step-build] 2024-01-09T05:08:01.806268224Z > Installing dotnet
[step-build] 2024-01-09T05:08:01.806706176Z -----> Removing old cached .NET version
[step-build] 2024-01-09T05:08:01.820877591Z -----> Fetching .NET SDK
[step-build] 2024-01-09T05:08:40.535804953Z -----> Fetching .NET Runtime
[step-build] 2024-01-09T05:08:45.894805800Z -----> Export dotnet to Path
[step-build] 2024-01-09T05:08:46.043175424Z dirname: missing operand
[step-build] 2024-01-09T05:08:46.043262909Z Try 'dirname --help' for more information.
[step-build] 2024-01-09T05:08:46.057271485Z basename: missing operand
[step-build] 2024-01-09T05:08:46.057301154Z Try 'basename --help' for more information.
[step-build] 2024-01-09T05:08:46.057318630Z -----> Project File
[step-build] 2024-01-09T05:08:46.057326466Z >
[step-build] 2024-01-09T05:08:46.066083628Z ERROR: failed to build: exit status 1
[step-fix-permissions] 2024-01-09T05:08:46.129440157Z 2024/01/09 05:08:46 Skipping step because a previous step failed
[step-export] 2024-01-09T05:07:49.568421244Z 2024/01/09 05:07:49 warning: unsuccessful cred copy: ".docker" from "/tekton/creds" to "/tekton/home": unable to open destination: open /tekton/home/.docker/config.json: permission denied
[step-export] 2024-01-09T05:08:46.620142469Z 2024/01/09 05:08:46 Skipping step because a previous step failed
[step-results] 2024-01-09T05:08:47.040096938Z 2024/01/09 05:08:47 Skipping step because a previous step failed

I don't know what this stuff about tekton is about. Looks like a form of git authentication?

It's already failing when it starts to build using dotnet; at that point the project has already been cloned. I'm looking into the specific scripts of the upstream buildpack to see where it fails to get the directory (the failure is the dirname: missing operand there).

I think that the issue is here: https://gitlab.wikimedia.org/repos/cloud/toolforge/buildpacks/dotnetcore-buildpack/-/blob/move_to_cnb/bin/compile?ref_type=heads#L65

The issue is related to the way those directory paths are initialized. I'll send a patch to reuse the shim that Heroku uses instead of changing the code directly, and will keep you posted.
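
For context, "dirname: missing operand" is what dirname prints when it is called with no argument, which typically happens when an unquoted variable expands to nothing. A minimal sketch of a guard against that follows. The helper name project_dir and the example path are hypothetical, and this is not the buildpack's actual code.

```shell
#!/bin/sh
# project_dir FILE: print the directory containing a project file,
# or fail with a clear message when no project file was found.
project_dir() {
    project_file="$1"
    if [ -z "$project_file" ]; then
        echo "no project file found" >&2
        return 1
    fi
    # Quoting guarantees dirname always receives exactly one operand.
    dirname "$project_file"
}

project_dir "src/Liftwing/Liftwing.csproj"   # prints src/Liftwing
```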

dcaro changed the task status from Open to In Progress. Jan 9 2024, 1:06 PM
dcaro claimed this task.
dcaro moved this task from Next Up to In Review on the Toolforge (Toolforge iteration 02) board.

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/166

builds-builder: bump to 0.0.85-20240109170619-e4596900

@Hawkeye7 Just deployed a fix for that, can you try again?

Note that the compiled binaries are under heroku_output/<binaryname>, so for your procfile you will need to update your entries to web: heroku_output/Liftwing
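
A Procfile maps a process type to a command, one "type: command" entry per line. The sketch below shows how such an entry can be looked up. The helper name procfile_command is hypothetical, and this is only an illustration of the file format, not how the build service reads it.

```shell
#!/bin/sh
# procfile_command TYPE FILE: print the command declared for a process type.
procfile_command() {
    ptype="$1"
    file="$2"
    # Strip the "TYPE:" prefix and following whitespace; print only matching lines.
    sed -n "s/^${ptype}:[[:space:]]*//p" "$file" | head -n 1
}
```

With a Procfile containing the entry above, procfile_command web Procfile would print heroku_output/Liftwing.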

Making some progress.

Build succeeded.

tools.milhistbot@tools-sgebastion-10:~/bin$ toolforge build show
Build ID: milhistbot-buildpacks-pipelinerun-x5lv7
Start Time: 2024-01-10T00:55:08Z
End Time: 2024-01-10T00:56:55Z
Status: ok
Message: Tasks Completed: 1 (Failed: 0, Cancelled 0), Skipped: 0
Parameters:

Source URL: https://gitlab.wikimedia.org/toolforge-repos/milhistbot-liftwing.git
Ref: N/A
Envvars: N/A

Destination Image: tools-harbor.wmcloud.org/tool-milhistbot/tool-milhistbot:latest

$ toolforge jobs run --image tools-harbor.wmcloud.org/tool-milhistbot/tool-milhistbot:latest --command "web --help"
usage: toolforge jobs run [-h] --command COMMAND --image IMAGE [--no-filelog]

[-o FILELOG_STDOUT] [-e FILELOG_STDERR]
[--retry {0,1,2,3,4,5}] [--mem MEM] [--cpu CPU]
[--emails {none,all,onfinish,onfailure}]
[--mount {all,none}]
[--schedule SCHEDULE | --continuous | --wait [WAIT]]
name

toolforge jobs run: error: the following arguments are required: name

Hmm. That is not documented.

$ toolforge jobs run --image tools-harbor.wmcloud.org/tool-milhistbot/tool-milhistbot:latest --command "web --help" liftwing
ERROR: Error: No such image 'tools-harbor.wmcloud.org/tool-milhistbot/tool-milhistbot:latest'

Tried giving it a name:

$ toolforge build show
Build ID: milhistbot-buildpacks-pipelinerun-xcpft
Start Time: 2024-01-10T01:07:54Z
End Time: 2024-01-10T01:09:41Z
Status: ok
Message: Tasks Completed: 1 (Failed: 0, Cancelled 0), Skipped: 0
Parameters:

Source URL: https://gitlab.wikimedia.org/toolforge-repos/milhistbot-liftwing.git
Ref: N/A
Envvars: N/A

Destination Image: tools-harbor.wmcloud.org/tool-milhistbot/liftwing:latest

tools.milhistbot@tools-sgebastion-10:~/bin$ toolforge jobs run --image tools-harbor.wmcloud.org/tool-milhistbot/tool-milhistbot/liftwing:latest --command "web --help" liftwing
ERROR: Error: No such image 'tools-harbor.wmcloud.org/tool-milhistbot/tool-milhistbot/liftwing:latest

What is the image called?

From your toolforge build show output, "Destination Image: tools-harbor.wmcloud.org/tool-milhistbot/liftwing:latest".

That's what it says, but it is giving me an error message saying it does not exist:

$ toolforge jobs run --image tools-harbor.wmcloud.org/tool-milhistbot/liftwing:latest --command "web --help" liftwing
ERROR: Error: No such image 'tools-harbor.wmcloud.org/tool-milhistbot/liftwing:latest'

@JJMC89 That seems correct.
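
The job commands that are accepted later in this thread pass the image as tool-milhistbot/liftwing:latest, that is, the Destination Image without the registry host. The sketch below performs that transformation. That this short form is what toolforge jobs run --image expects is inferred from those commands rather than from documentation, and the function name strip_registry is hypothetical.

```shell
#!/bin/sh
# strip_registry IMAGE: drop the leading registry host from an image reference.
# Only meaningful when IMAGE actually starts with "host/".
strip_registry() {
    # ${1#*/} removes the shortest prefix ending in "/", i.e. the host part.
    printf '%s\n' "${1#*/}"
}

strip_registry "tools-harbor.wmcloud.org/tool-milhistbot/liftwing:latest"
# prints tool-milhistbot/liftwing:latest
```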

Now I have an error:

tools.milhistbot@tools-sgebastion-10:~$ toolforge jobs run --image tool-milhistbot/liftwing:latest --command "web 'Hanford Engineer Works'" liftwing
tools.milhistbot@tools-sgebastion-10:~$ toolforge jobs logs liftwing
ERROR: An internal error occured while executing this command.
Traceback (most recent call last):

File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 117, in _make_request
  response.raise_for_status()
File "/usr/lib/python3/dist-packages/requests/models.py", line 940, in raise_for_status
  raise HTTPError(http_error_msg, response=self)

requests.exceptions.HTTPError: 500 Server Error: INTERNAL SERVER ERROR for url: https://api.svc.tools.eqiad1.wikimedia.cloud:30003/jobs/api/v1/jobs/liftwing/logs?follow=false

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "/usr/lib/python3/dist-packages/tjf_cli/api.py", line 47, in handle_http_exception
  json = original.response.json()
File "/usr/lib/python3/dist-packages/requests/models.py", line 897, in json
  return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3/dist-packages/simplejson/__init__.py", line 518, in loads
  return _default_decoder.decode(s)
File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 370, in decode
  obj, end = self.raw_decode(s)
File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 400, in raw_decode
  return self.scan_once(s, idx=_w(s, idx).end())

simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 787, in main
  run_subcommand(args=args, api=api)
File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 722, in run_subcommand
  op_logs(api, args.name, args.follow, args.last)
File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 527, in op_logs
  params=params,
File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 167, in get_raw_lines
  **kwargs,
File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 130, in _make_request
  raise self.exception_handler(e) from e
File "/usr/lib/python3/dist-packages/tjf_cli/api.py", line 59, in handle_http_exception
  except requests.exceptions.InvalidJSONError:

AttributeError: module 'requests.exceptions' has no attribute 'InvalidJSONError'
ERROR: Please report this issue to the Toolforge admins: https://w.wiki/6Zuu

tools.milhistbot@tools-sgebastion-10:~$ cat milhistbot-liftwing/Procfile
web: heroku_output/Liftwing

tools.milhistbot@tools-sgebastion-10:~$ toolforge jobs run --image tool-milhistbot/liftwing:latest --mount=all --command "web 'Hanford Engineer Works'" liftwing --wait
ERROR: job 'liftwing' failed:
+-------------+-----------------------------------------------------------------------+
| Job name:   | liftwing                                                              |
+-------------+-----------------------------------------------------------------------+
| Command:    | web 'Hanford Engineer Works'                                          |
+-------------+-----------------------------------------------------------------------+
| Job type:   | normal                                                                |
+-------------+-----------------------------------------------------------------------+
| Image:      | tool-milhistbot/liftwing:latest                                       |
+-------------+-----------------------------------------------------------------------+
| File log:   | no                                                                    |
+-------------+-----------------------------------------------------------------------+
| Output log: |                                                                       |
+-------------+-----------------------------------------------------------------------+
| Error log:  |                                                                       |
+-------------+-----------------------------------------------------------------------+
| Emails:     | none                                                                  |
+-------------+-----------------------------------------------------------------------+
| Resources:  | default                                                               |
+-------------+-----------------------------------------------------------------------+
| Mounts:     | all                                                                   |
+-------------+-----------------------------------------------------------------------+
| Retry:      | no                                                                    |
+-------------+-----------------------------------------------------------------------+
| Status:     | Failed                                                                |
+-------------+-----------------------------------------------------------------------+
| Hints:      | Last run at 2024-01-10T03:02:37Z. Pod in 'Failed' phase. State        |
|             | 'terminated'. Reason 'ContainerCannotRun'. Started at                 |
|             | '2024-01-10T03:02:45Z'. Finished at '2024-01-10T03:02:45Z'. Exit code |
|             | '127'. Additional message:'failed to create shim task: OCI runtime    |
|             | create failed: runc create failed: unable to start container process: |
|             | exec: "web": executable file not found in $PATH: unknown'.            |
+-------------+-----------------------------------------------------------------------+

For some reason that image did not pick up the Procfile properly, it should have generated a /cnb/lifecycle/web binary that's not there:

dcaro@urcuchillay$ podman run -ti --entrypoint /cnb/lifecycle/launcher --rm tools-harbor.wmcloud.org/tool-milhistbot/liftwing:latest bash
...
heroku@dfd15b71d189:/workspace$ ls -la /cnb/platform
ls: cannot access '/cnb/platform': No such file or directory

Can you pass me the build logs?

As a workaround, you can try using the full path to the generated binary and skip the Procfile if you want, like --command "heroku_output/Liftwing 'Hanford Engineer Works'", though the program fails for me and it does not say much:

tools.jupytest@tools-sgebastion-10:~$ toolforge jobs run --image tool-milhistbot/liftwing:latest --mount=all --command "heroku_output/Liftwing" liftwing

tools.jupytest@tools-sgebastion-10:~$ toolforge jobs list
Job name:    Job type:    Status:
-----------  -----------  --------------
liftwing     normal       Running for 3s

tools.jupytest@tools-sgebastion-10:~$ toolforge jobs list
Job name:    Job type:    Status:
-----------  -----------  ---------
liftwing     normal       Failed

It fails for me and the binary does not really output much info (actually, none at all).

I can be sure it's running it because I can see some help when I run it without arguments (any argument, including -h, will just fail):

tools.jupytest@tools-sgebastion-10:~$ toolforge jobs run --image tool-milhistbot/liftwing:latest --mount=all --command "heroku_output/Liftwing" liftwing

tools.jupytest@tools-sgebastion-10:~$ toolforge jobs logs liftwing
2024-01-10T11:20:48+00:00 [liftwing-96mkr] usage: Liftwing <article>

Same locally btw., so it seems that the program is not being very verbose:

# shows help when not passing anything
dcaro@urcuchillay$ podman run -ti --rm tools-harbor.wmcloud.org/tool-milhistbot/liftwing:latest heroku_output/Liftwing
usage: Liftwing <article>

# When passing any argument
dcaro@urcuchillay$ podman run -ti --entrypoint /cnb/lifecycle/launcher --rm tools-harbor.wmcloud.org/tool-milhistbot/liftwing:latest heroku_output/Liftwing 'Hanford Engineer Works'
dcaro@urcuchillay$ echo $?
1

dcaro@urcuchillay$ podman run -ti --rm tools-harbor.wmcloud.org/tool-milhistbot/liftwing:latest heroku_output/Liftwing --help
dcaro@urcuchillay$ echo $?
1

dcaro@urcuchillay$ podman run -ti --rm tools-harbor.wmcloud.org/tool-milhistbot/liftwing:latest heroku_output/Liftwing -h
dcaro@urcuchillay$ echo $?
1

Note that there's an annoying bug that shows a full stack trace when trying to get the logs of the job before the job has run (that requests.exceptions.InvalidJSONError error), I'm opening a bug on that to fix it too.

I attach a copy of the build output.

I have no access to podman on your server, but I can run locally on my own.

ram900@ram900-test2:~/milhistbot-liftwing$ bin/Debug/net7.0/Liftwing
usage: Liftwing <article>
ram900@ram900-test2:~/milhistbot-liftwing$ bin/Debug/net7.0/Liftwing "Hanford Engineer Works"
FA

It needs to find its shared library (Wikimedia.dll) and config file (credx.xml). Both should be present in the build directory. But an error message will be put out if it cannot find them.

Could you try:

docker run -it --entrypoint /bin/bash liftwing

Then run it by hand:
Liftwing "Hanford Engineer Works"

And see what you get.

Aargh. Error on my part. I will check in a version that reports errors better.

For the Procfile, I think there's a bug and it's being ignored:

[step-inject-buildpacks] 2024-01-10T23:47:36.073122841Z [[order]]
[step-inject-buildpacks] 2024-01-10T23:47:36.073128604Z     [[order.group]]
[step-inject-buildpacks] 2024-01-10T23:47:36.073134282Z     id = "fagiani/apt"
[step-inject-buildpacks] 2024-01-10T23:47:36.073139971Z     version = "0.2.5"
[step-inject-buildpacks] 2024-01-10T23:47:36.073145803Z     optional = true
[step-inject-buildpacks] 2024-01-10T23:47:36.073151474Z     
[step-inject-buildpacks] 2024-01-10T23:47:36.073173778Z     [[order.group]]
[step-inject-buildpacks] 2024-01-10T23:47:36.073180342Z     id = "heroku/nodejs"
[step-inject-buildpacks] 2024-01-10T23:47:36.073190704Z     version = "2.6.2"
[step-inject-buildpacks] 2024-01-10T23:47:36.073197081Z     optional = true
[step-inject-buildpacks] 2024-01-10T23:47:36.073202758Z     
[step-inject-buildpacks] 2024-01-10T23:47:36.073208251Z     [[order.group]]
[step-inject-buildpacks] 2024-01-10T23:47:36.073213943Z     id = "jincod/dotnetcore-buildpack"
[step-inject-buildpacks] 2024-01-10T23:47:36.073219814Z     version = "7.0.401"
[step-inject-buildpacks] 2024-01-10T23:47:36.073225586Z     api = "0.10"
[step-inject-buildpacks] 2024-01-10T23:47:36.073231258Z     optional = false

The Procfile buildpack is not included in that order.group, probably because of an ordering issue when injecting the buildpacks. I will open a task to look into it; it should be easy to fix.

Aargh. Error on my part. I will check in a version that reports errors better.

I'm guessing you found the issue?

The library + config is there yes:

dcaro@urcuchillay$ podman run -ti --entrypoint /cnb/lifecycle/launcher --rm tools-harbor.wmcloud.org/tool-milhistbot/liftwing:latest bash
heroku@992aa69bbe46:/workspace$ ls -la
total 888
drwxrwxrwx. 1 heroku heroku    288 Jan  1  1980 .
dr-xr-xr-x. 1 root   root        6 Jan 11 09:48 ..
drwxrwxrwx. 1 heroku heroku    150 Jan  1  1980 .git
-rw-rw-rw-. 1 heroku heroku     10 Jan  1  1980 .gitignore
drwxrwxrwx. 1 heroku heroku     12 Jan  1  1980 .heroku
drwxrwxrwx. 1 heroku heroku     44 Jan  1  1980 .profile.d
-rw-rw-rw-. 1 heroku heroku    428 Jan  1  1980 Liftwing.csproj
-rw-rw-rw-. 1 heroku heroku 701992 Jan  1  1980 Newtonsoft.Json.dll
-rw-rw-rw-. 1 heroku heroku  22528 Jan  1  1980 Options.dll
-rw-rw-rw-. 1 heroku heroku     28 Jan  1  1980 Procfile
-rw-rw-rw-. 1 heroku heroku    664 Jan  1  1980 Program.cs
-rw-rw-rw-. 1 heroku heroku    245 Jan  1  1980 README.md
-rw-rw-rw-. 1 heroku heroku 152576 Jan  1  1980 Wikimedia.dll    <--- library
drwxrwxrwx. 1 heroku heroku     14 Jan  1  1980 bin
-rw-rw-rw-. 1 heroku heroku    588 Jan  1  1980 credx.xml        <--- config
drwxrwxrwx. 1 heroku heroku  10458 Jan  1  1980 heroku_output
drwxrwxrwx. 1 heroku heroku    276 Jan  1  1980 obj
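
The same check can be scripted instead of reading the listing by eye. A minimal sketch, assuming a POSIX shell inside the container; check_runtime_files is a hypothetical helper name, and the file names are the ones mentioned in this task:

```shell
# check_runtime_files: hypothetical helper, not shipped with the image.
# Usage: check_runtime_files <directory> <file>...
# Prints each missing file to stderr and exits non-zero if any is missing.
check_runtime_files() (
  dir=$1
  shift
  missing=0
  for f in "$@"; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  exit "$missing"
)
```

For this image that would be check_runtime_files /workspace Wikimedia.dll credx.xml, which prints nothing and returns 0 when both files are present.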

It seems to be working now:

tools.milhistbot@tools-sgebastion-10:~/bin$ runit 'Hanford Engineer Works'
tools.milhistbot@tools-sgebastion-10:~/bin$ toolforge jobs logs liftwing
2024-01-11T23:43:24+00:00 [liftwing-ct566] FA

I find it strange that you are looking for the Program.cs file rather than the *.csproj file, which would seem more logical.

\o/

I find it strange that you are looking for the Program.cs file rather than the *.csproj file, which would seem more logical.

What the buildpack does is find Program.cs (or Startup.cs / Program.fs), then look for a *.csproj (or *.fsproj) file in that directory and its parents. It's kind of a long one-liner, see https://github.com/jincod/dotnetcore-buildpack/blob/master/bin/compile#L49:

	PROJECT_FILE=$(x=$(dirname $(find ${BUILD_DIR} -maxdepth 5 -iname Startup.cs -o -iname Program.cs -o -iname Program.fs | head -1)); while [[ "$x" =~ $BUILD_DIR ]] ; do find "$x" -maxdepth 1 -name *.fsproj -o -name *.csproj; x=`dirname "$x"`; done)
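
Unrolled, the lookup reads roughly like the sketch below. This is my own rewrite for readability, not the buildpack's code: find_project_file is a hypothetical name, it expects an absolute build directory, it quotes the globs, and it uses a prefix match where the original uses a regex match against $BUILD_DIR.

```shell
# find_project_file: hypothetical, readable version of the one-liner above.
# Usage: find_project_file /absolute/path/to/build_dir
find_project_file() (
  build_dir=$1

  # Pick the first entry-point source file, at most 5 levels deep.
  entry=$(find "$build_dir" -maxdepth 5 \
    \( -iname Startup.cs -o -iname Program.cs -o -iname Program.fs \) | head -n 1)
  [ -n "$entry" ] || exit 1

  # Walk up from the entry point's directory, printing any project files,
  # until the directory is no longer under build_dir.
  dir=$(dirname "$entry")
  while :; do
    case $dir in
      "$build_dir"*) ;;
      *) break ;;
    esac
    find "$dir" -maxdepth 1 \( -name '*.fsproj' -o -name '*.csproj' \)
    parent=$(dirname "$dir")
    [ "$parent" != "$dir" ] || break   # stop at "/" or "."
    dir=$parent
  done
)
```

So a repository without a Program.cs, Startup.cs or Program.fs gives the lookup nothing to start from, even if a *.csproj file is sitting in the root.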

Oh. Thanks for that. I am not in the habit of calling the main program Program.cs (the default) because I usually build more than one in the same solution. Having a separate project in toolforge means this is not an issue.

I am going to port one of the production systems (Liftwing was just a test). My plan is to have it run in parallel with the old version for a while.

Thanks for all the work you have done. Much appreciated.

I'll close this then, but feel free to open another task with any issue you find 👍

dcaro moved this task from In Review to Done on the Toolforge (Toolforge iteration 03) board.