[tbs][cli] Error when there's no 'steps' in the status
Closed, Resolved | Public | 3 Estimated Story Points

Description

When running toolforge build-show, the status sometimes does not include the 'steps' key, and the CLI exits with a KeyError:

toolsbeta.test@toolsbeta-sgebastion-05:~$ toolforge build-show test-buildpacks-pipelinerun-7h7c7
Name: test-buildpacks-pipelinerun-7h7c7
Start time: 2022-09-27T08:09:22Z
End time: running
Status: ok(Running)
...
        Steps:
            Step: clone - unknown({'container': 'step-clone', 'imageID': 'docker-pullable://docker-registry.tools.wmflabs.org/toolforge-tektoncd-pipeline-cmd-git-init@sha256:bbb7ec459178f708644694417f9426e6c599b5aaab1b01e4c3863e8294de6758', 'name': 'clone', 'terminated': {'containerID': 'docker://1e41000998d695044380fa50748a8309b742380ddb6cc1350d7cb29bc5001c06', 'exitCode': 0, 'finishedAt': '2022-09-27T08:09:43Z', 'message': '[{"key":"commit","value":"a03aa93d58961f09e63e29c0c4105ad69945f8ed","resourceRef":{}},{"key":"url","value":"https://github.com/david-caro/wm-lol.git","resourceRef":{}}]', 'reason': 'Completed', 'startedAt': '2022-09-27T08:09:43Z'}})
            ...

toolsbeta.test@toolsbeta-sgebastion-05:~$ toolforge build-show test-buildpacks-pipelinerun-7h7c7
Traceback (most recent call last):
  File "/usr/bin/toolforge", line 8, in <module>
    sys.exit(main())
  File "/usr/lib/python3/dist-packages/toolforge_cli/cli.py", line 438, in main
    toolforge()
  File "/usr/lib/python3/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/toolforge_cli/cli.py", line 409, in build_show
    click.echo(_run_to_details_str(run=run, k8s_client=k8s_client))
  File "/usr/lib/python3/dist-packages/toolforge_cli/cli.py", line 167, in _run_to_details_str
    details_str += "\n".join("    " + line for line in _get_task_details_lines(run=run, k8s_client=k8s_client))
  File "/usr/lib/python3/dist-packages/toolforge_cli/cli.py", line 131, in _get_task_details_lines
    tasks_details_lines.extend("        " + line for line in _get_step_details_lines(task=task))
  File "/usr/lib/python3/dist-packages/toolforge_cli/cli.py", line 104, in _get_step_details_lines
    for step in task["status"]["steps"]:
KeyError: 'steps'

The actual status has the keys:

['completionTime', 'conditions', 'podName', 'startTime', 'taskSpec']

The conditions hold the reason why it failed:

"conditions": [
    {
        "lastTransitionTime": "2022-09-27T08:09:58Z",
        "message": "The node was low on resource: memory. Container step-export was using 7804Ki, which exceeds its request of 0. Container step-results was using 6756Ki, which exceeds its request of 0. Container step-build was using 26352Ki, which exceeds its request of 0. ",
        "reason": "Failed",
        "status": "False",
        "type": "Succeeded"
    }
]

Event Timeline

@dcaro Any suggestion on how I could reproduce this error locally?

You can still access the build that failed on that host (to try the cli itself); you can just sudo -i -u dcaro to test.

For triggering that same error on k8s I'll have to investigate xd, but probably tweaking the resource limits should give you that error message.

Let me know if you want help reproducing it from k8s itself instead of reusing the old build entry; if you want to go that way, it can be fun 👍

Let me know if you want help reproducing it from k8s itself instead of reusing the old build entry; if you want to go that way, it can be fun 👍

Yes, reproducing it locally from k8s itself would be my preferred way to go. But if you think that may be unnecessarily complicated, I can just test my code on toolsbeta. Is that doable without redeploying?

Alternatively, is there a way I could get the underlying pipeline json object to use as test data?

Yes, reproducing it locally from k8s itself would be my preferred way to go. But if you think that may be unnecessarily complicated, I can just test my code on toolsbeta. Is that doable without redeploying?

You can create a venv and clone your code with modifications there (or copy over the files).

Some hints to reproduce locally:

You will want to change the memory limit for the containers that the pipeline starts. For now, the only way to do so that is not in alpha status (according to the tekton docs) is a namespace LimitRange (documented for generic k8s, and for tekton), so you'd have to create one of those in the image-build namespace with a very low memory limit, so that the pods started by the TaskRun get killed.

It's really up to you whether to try it or not: if you are not interested in learning k8s/tekton internals it's probably not worth it; if you are, it's definitely worth it :)
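
As a rough, untested sketch of the LimitRange idea above: the snippet below builds a LimitRange manifest as JSON. The `image-build` namespace comes from this task; the object name and the memory values are placeholders picked for illustration. Whether the resulting failure matches the "low on resource: memory" eviction seen here, rather than a plain OOM kill, is an assumption that would need checking.

```python
import json

# Hypothetical values: the name and the memory amounts are placeholders,
# chosen only to be low enough that the build containers cannot fit.
limit_range = {
    "apiVersion": "v1",
    "kind": "LimitRange",
    "metadata": {"name": "low-memory-repro", "namespace": "image-build"},
    "spec": {
        "limits": [
            {
                "type": "Container",
                # Applied to containers that do not set their own limit/request.
                "default": {"memory": "8Mi"},
                "defaultRequest": {"memory": "4Mi"},
                # Upper bound for any container in the namespace.
                "max": {"memory": "8Mi"},
            }
        ]
    },
}

manifest = json.dumps(limit_range, indent=2)
print(manifest)
```

The printed JSON can be piped to `kubectl apply -f -` on a test cluster; remember to delete the LimitRange afterwards, since it affects every new pod in the namespace.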

Alternatively, is there a way I could get the underlying pipeline json object to use as test data?

For this, you can get the whole pipelinerun object:

dcaro@toolsbeta-sgebastion-05:~$ kubectl -n image-build get pipelinerun -o json test-buildpacks-pipelinerun-7h7c7

And from there, following the code a bit (that json is like an onion, lots of layers xd), you will end up at the status object that it's failing to get the steps key from:

dcaro@toolsbeta-sgebastion-05:~$ kubectl -n image-build get pipelinerun -o json test-buildpacks-pipelinerun-7h7c7 | jq '.status.taskRuns["test-buildpacks-pipelinerun-7h7c7-build-from-git"].status'

You can probably get the full PipelineRun json though, and inject it directly where the code assigns the run object, for testing.
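
A small sketch of how such a JSON dump could serve as test data, assuming the layout implied by the jq path above (`.status.taskRuns[<name>].status`). The fixture is a trimmed, hand-written stand-in with placeholder values, not the real object, and `task_runs_without_steps` is a hypothetical helper, not part of toolforge-cli.

```python
import json
from typing import Any, Dict, List

# Stand-in for the output of
# `kubectl -n image-build get pipelinerun -o json <name>`; only the status
# keys listed in this task are included, and all values are placeholders.
PIPELINERUN_FIXTURE = json.loads("""
{
  "status": {
    "taskRuns": {
      "test-buildpacks-pipelinerun-7h7c7-build-from-git": {
        "status": {
          "completionTime": "placeholder",
          "conditions": [
            {"reason": "Failed", "status": "False", "type": "Succeeded"}
          ],
          "podName": "placeholder-pod",
          "startTime": "placeholder",
          "taskSpec": {}
        }
      }
    }
  }
}
""")


def task_runs_without_steps(run: Dict[str, Any]) -> List[str]:
    """Names of the task runs whose status has no 'steps' key."""
    task_runs = run.get("status", {}).get("taskRuns", {})
    return [
        name
        for name, task in task_runs.items()
        if "steps" not in task.get("status", {})
    ]


print(task_runs_without_steps(PIPELINERUN_FIXTURE))
```

In a real unit test the fixture would be loaded from a file holding the full dump, and the assertion would be that the CLI's formatting function returns without raising for it.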

@dcaro Thank you, this is very helpful! I'm leaning towards spending some time poking around in k8s/tekton because I am interested in learning what goes on under the hood, but then also adding a few unit tests with the pipeline json as a fixture.

Remember, though, that this is going to be rewritten soon-ish.

Change 855503 had a related patch set uploaded (by Slavina Stefanova; author: Slavina Stefanova):

[cloud/toolforge/toolforge-cli@main] cli: build-show fails for tasks with no steps

https://gerrit.wikimedia.org/r/855503

Slst2020 changed the task status from In Progress to Open. Nov 10 2022, 10:20 AM

Change 855503 merged by jenkins-bot:

[cloud/toolforge/toolforge-cli@main] cli: build-show fails for tasks with no steps

https://gerrit.wikimedia.org/r/855503