mwscript-k8s --attach error: TypeError: 'NoneType' object is not iterable
Closed, Resolved, Public

Description

Happened during deployment window today:

lucaswerkmeister-wmde@deploy1002 /srv/mediawiki-staging $ mwscript-k8s --attach namespaceDupes mswikisource
⏳ Starting namespaceDupes on Kubernetes...
skipping missing values file matching "/etc/helmfile-defaults/private/main_services/mw-script/eqiad.yaml"
skipping missing values file matching "values-eqiad.yaml"
skipping missing values file matching "/etc/helmfile-defaults/private/main_services/mw-script/eqiad.yaml"
skipping missing values file matching "values-eqiad.yaml"
Release "lgp1hy2b" does not exist. Installing it now.
NAME: lgp1hy2b
LAST DEPLOYED: Wed Jul  3 13:12:48 2024
NAMESPACE: mw-script
STATUS: deployed
REVISION: 1
NOTES:


lgp1hy2b        mw-script       1               2024-07-03 13:12:48.088005727 +0000 UTC deployed        mediawiki-0.6.35

Traceback (most recent call last):
  File "/usr/local/bin/mwscript-k8s", line 264, in <module>
    sys.exit(main())
  File "/usr/local/bin/mwscript-k8s", line 237, in main
    wait_until_started(env_vars, job, container)
  File "/usr/local/bin/mwscript-k8s", line 111, in wait_until_started
    if pod_list.items and is_started(pod_list.items[0], container):
  File "/usr/local/bin/mwscript-k8s", line 100, in is_started
    for container_status in pod.status.container_statuses:
TypeError: 'NoneType' object is not iterable

As there were a lot of changes to deploy, I didn’t investigate yet, but just ran the script on mwmaint1002 instead.

Event Timeline

And by now it looks like everything was cleaned up in the meantime. But the issue is still reproducible:

lucaswerkmeister-wmde@deploy1002 ~ $ mwscript-k8s --attach namespaceDupes mswikisource
⏳ Starting namespaceDupes on Kubernetes...
skipping missing values file matching "/etc/helmfile-defaults/private/main_services/mw-script/eqiad.yaml"
skipping missing values file matching "values-eqiad.yaml"
skipping missing values file matching "/etc/helmfile-defaults/private/main_services/mw-script/eqiad.yaml"
skipping missing values file matching "values-eqiad.yaml"
Release "yizd4gwh" does not exist. Installing it now.
NAME: yizd4gwh
LAST DEPLOYED: Wed Jul  3 14:33:51 2024
NAMESPACE: mw-script
STATUS: deployed
REVISION: 1
NOTES:


yizd4gwh	mw-script	1       	2024-07-03 14:33:51.116552689 +0000 UTC	deployed	mediawiki-0.6.35	           

Traceback (most recent call last):
  File "/usr/local/bin/mwscript-k8s", line 264, in <module>
    sys.exit(main())
  File "/usr/local/bin/mwscript-k8s", line 237, in main
    wait_until_started(env_vars, job, container)
  File "/usr/local/bin/mwscript-k8s", line 111, in wait_until_started
    if pod_list.items and is_started(pod_list.items[0], container):
  File "/usr/local/bin/mwscript-k8s", line 100, in is_started
    for container_status in pod.status.container_statuses:
TypeError: 'NoneType' object is not iterable
lucaswerkmeister-wmde@deploy1002 ~ $ kubectl get pods
NAME                             READY   STATUS      RESTARTS   AGE
mw-script.eqiad.yizd4gwh-twtj5   0/4     Completed   0          6s
lucaswerkmeister-wmde@deploy1002 ~ $ kubectl logs mw-script.eqiad.yizd4gwh-twtj5
error: a container name must be specified for pod mw-script.eqiad.yizd4gwh-twtj5, choose one of: [mediawiki-yizd4gwh-app mediawiki-yizd4gwh-mcrouter mediawiki-yizd4gwh-tls-proxy mediawiki-yizd4gwh-rsyslog]
lucaswerkmeister-wmde@deploy1002 ~ $ kubectl logs mw-script.eqiad.yizd4gwh-twtj5 mediawiki-yizd4gwh-app
0 pages to fix, 0 were resolvable.

0 links to fix, 0 were resolvable, 0 were deleted.

Looks good!

It looks like the problem here is just that namespaceDupes exits (successfully, in this case) before mwscript-k8s can attach to it, and that confuses mwscript-k8s.

namespaceDupes is a dry-run by default, by the way, so if you’re working on this issue, you can try this out as often as you want so long as you don’t add --fix to the options. (namespaceDupes with --fix is still pretty harmless, but it should at least be logged.)

RLazarus subscribed.

Thanks for the report! It's actually not because of the successful exit; the script handles that.

Rather, it turns out the pod was Pending when we first checked on it, and container_statuses was None (rather than an empty list, as the API docs led me to believe). This didn't come up in my own testing, I appreciate you finding it!
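
For illustration, here is a minimal sketch of the kind of guard that avoids the TypeError. It is not the merged patch: only the names is_started, pod.status.container_statuses and the container name come from the traceback and logs above. The rest of the function body (matching on the container name, treating running or terminated as "started") and the SimpleNamespace stand-ins for the Kubernetes API objects are assumptions.

```python
from types import SimpleNamespace


def is_started(pod, container):
    """Return True once the named container is running or has terminated.

    Assumed behaviour, reconstructed from the traceback; the real
    mwscript-k8s function may differ in its details.
    """
    # container_statuses can be None while the pod is still Pending,
    # so fall back to an empty list before iterating.
    for container_status in pod.status.container_statuses or []:
        if container_status.name != container:
            continue
        state = container_status.state
        return state.running is not None or state.terminated is not None
    return False


# Hypothetical stand-ins for the API objects, so the sketch runs
# without a cluster or the kubernetes client library.
def fake_pod(container_statuses):
    return SimpleNamespace(status=SimpleNamespace(container_statuses=container_statuses))


def fake_status(name, running=None, terminated=None, waiting=None):
    state = SimpleNamespace(running=running, terminated=terminated, waiting=waiting)
    return SimpleNamespace(name=name, state=state)


APP = "mediawiki-yizd4gwh-app"

pending_pod = fake_pod(None)  # the case that raised TypeError
waiting_pod = fake_pod([fake_status(APP, waiting=SimpleNamespace())])
running_pod = fake_pod([fake_status(APP, running=SimpleNamespace())])
finished_pod = fake_pod([fake_status(APP, terminated=SimpleNamespace())])

for label, pod in [
    ("pending", pending_pod),
    ("waiting", waiting_pod),
    ("running", running_pod),
    ("finished", finished_pod),
]:
    print(label, is_started(pod, APP))
```

With the `or []` fallback, a Pending pod is simply reported as not started yet, so a polling loop like wait_until_started can check again instead of crashing.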

I'll have a fix out shortly.

Change #1051855 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/puppet@production] deployment_server: Handle None container_statuses in mwscript-k8s

https://gerrit.wikimedia.org/r/1051855

Change #1051855 merged by RLazarus:

[operations/puppet@production] deployment_server: Handle None container_statuses in mwscript-k8s

https://gerrit.wikimedia.org/r/1051855

This is fixed, thanks again for testing!