stewardbot k8s pod fails to restart due to an internal server error
Closed, Resolved, Public

Description

When trying to restart StewardBot's k8s pod, I received the following internal server error:

tools.stewardbots@tools-sgebastion-10:~/stewardbots/StewardBot$ ./manage.sh restart
Restarting StewardBot pod...
ERROR: An internal error occured while executing this command.
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 71, in _make_request
    response.raise_for_status()
  File "/usr/lib/python3/dist-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: INTERNAL SERVER ERROR for url: https://api.svc.tools.eqiad1.wikimedia.cloud:30003/jobs/api/v1/restart/stewardbot

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 712, in main
    run_subcommand(args=args, api=api)
  File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 659, in run_subcommand
    op_restart(api, args.name)
  File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 586, in op_restart
    api.post(f"/restart/{name}")
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 95, in post
    return self._make_request("POST", url, **kwargs).json()
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 75, in _make_request
    raise self.exception_handler(e)
tjf_cli.api.TjfCliHttpError: Internal Server Error
ERROR: Please report this issue to the Toolforge admins: https://w.wiki/6Zuu
tools.stewardbots@tools-sgebastion-10:~/stewardbots/StewardBot$ cat manage.sh 
#!/usr/bin/env bash
# Management script for <del>stashbot</del> StewardBot kubernetes processes
# https://github.com/wikimedia/stashbot/blob/master/bin/stashbot.sh

set -e

TOOL_DIR=/data/project/stewardbots/stewardbots/StewardBot
JOB_NAME=stewardbot
JOB_FILE="${TOOL_DIR}/jobs.yaml"
LOG_FILE="/data/project/stewardbots/logs/stewardbot.log"
VENV=/data/project/stewardbots/venv-k8s-py39

case "$1" in
    start)
        echo "Starting StewardBot k8s deployment..."
        toolforge-jobs load "${JOB_FILE}" --job "${JOB_NAME}"
        ;;
    run)
        date +%Y-%m-%dT%H:%M:%S
        echo "Starting StewardBot..."
        source ${VENV}/bin/activate
        cd ${TOOL_DIR}
        exec python StewardBot.py
        ;;
    stop)
        echo "Stopping StewardBot k8s deployment..."
        toolforge-jobs delete "${JOB_NAME}"
        # FIXME: wait for the pods to stop
        ;;
    restart)
        echo "Restarting StewardBot pod..."
        toolforge-jobs restart "${JOB_NAME}"
        ;;
    status)
        toolforge-jobs show "${JOB_NAME}"
        ;;
    tail)
        exec tail -f "${LOG_FILE}"
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|status|tail}"
        exit 1
        ;;
esac

exit 0
# vim:ft=sh:sw=4:ts=4:sts=4:et:
tools.stewardbots@tools-sgebastion-10:~/stewardbots/StewardBot$

Reporting this to the Toolforge admins, as the error message asked. FWIW, ./manage.sh stop && ./manage.sh start seems to have worked properly.
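For anyone hitting the same error, the stop-then-start workaround could be folded into the script as a fallback. The following is only a minimal sketch, not part of the actual manage.sh: the function name `restart_with_fallback` and the `JOBS_CLI` override variable are hypothetical, and it uses only the `restart`, `delete`, and `load` subcommands already shown above. It does not wait for the old pod to stop, so the FIXME in the `stop)` branch still applies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart a job, falling back to delete + load when
# the restart call fails (for example with an internal server error).
# JOBS_CLI is an assumed override hook so the helper can be exercised
# without the real toolforge-jobs binary.
JOBS_CLI="${JOBS_CLI:-toolforge-jobs}"

restart_with_fallback() {
    local job_name="$1" job_file="$2"

    # Try the normal restart first; using it as an `if` condition keeps
    # a failure from aborting the script under `set -e`.
    if "${JOBS_CLI}" restart "${job_name}"; then
        return 0
    fi

    echo "Restart failed; falling back to delete + load..." >&2
    # Ignore a delete failure (the job may already be gone), then reload.
    "${JOBS_CLI}" delete "${job_name}" || true
    "${JOBS_CLI}" load "${job_file}" --job "${job_name}"
}
```

In the `restart)` branch this would be called as `restart_with_fallback "${JOB_NAME}" "${JOB_FILE}"` in place of the bare `toolforge-jobs restart` line.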

Event Timeline

This is an unattributed fork of the problem I reported HERE

<syntaxhighlight lang=shell-session>
tools.billsbots@tools-sgebastion-10:~$ toolforge-jobs restart refreshlinks
ERROR: An internal error occured while executing this command.
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 71, in _make_request
    response.raise_for_status()
  File "/usr/lib/python3/dist-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: INTERNAL SERVER ERROR for url: https://api.svc.tools.eqiad1.wikimedia.cloud:30003/jobs/api/v1/restart/refreshlinks

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 712, in main
    run_subcommand(args=args, api=api)
  File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 659, in run_subcommand
    op_restart(api, args.name)
  File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 586, in op_restart
    api.post(f"/restart/{name}")
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 95, in post
    return self._make_request("POST", url, **kwargs).json()
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 75, in _make_request
    raise self.exception_handler(e)
tjf_cli.api.TjfCliHttpError: Internal Server Error
ERROR: Please report this issue to the Toolforge admins: https://w.wiki/6Zuu
tools.billsbots@tools-sgebastion-10:~$
</syntaxhighlight>

ERROR: Please report this issue to the Toolforge admins

Who are the Toolforge admins? How many of them are there? Are they all volunteers or is at least one a WMF employee?

Can one of them at least acknowledge their awareness of this issue so I know I'm not still shouting into the wind? Thanks.

Hi! I'm a Toolforge admin, a WMF employee. This problem is being looked into.

Sorry for the inconvenience.

Mentioned in SAL (#wikimedia-cloud) [2023-06-30T18:21:19Z] <taavi> deploy new jobs-api release to fix T340829

taavi claimed this task.

Sorry, it's too much trouble for me to spend an hour trying to figure out how to open a new Phab, so I'm just gonna reuse this one.

Please see my posted console log

That's mostly Greek to me but hopefully someone can figure it out.

JJMC89 subscribed.

I'm just gonna reuse this one.

Don't do that. The issues are unrelated.