We've recently had two incidents where a large number of CI envs were left behind and caused an outage:
- https://wikitech.wikimedia.org/wiki/Catalyst/Incidents/2026-01-29
- https://wikitech.wikimedia.org/wiki/Catalyst/Incidents/2026-02-10
For example, the pods left behind by the Gerrit/Jenkins pipelines during the 2026-02-10 incident looked like this:
$ kubectl get pods -n cat-env | grep mw-ext-wl-ci
mw-ext-wl-ci-1237677-66594-3942-mediawiki-66fcc5c64d-frm7l 0/4 Init:0/1 1 (33m ago) 74m
mw-ext-wl-ci-1237681-33021-3941-mediawiki-b676cb884-rlhlc 0/4 Init:0/1 1 (34m ago) 74m
mw-ext-wl-ci-1235081-88125-3806-py-evaluator-bf9468d78-ndkk6 1/1 Running 0 11d
mw-ext-wl-ci-1235083-22643-3925-py-evaluator-79c7db8b6f-mftg5 1/1 Running 0 16h
mw-ext-wl-ci-1235081-88125-3806-js-evaluator-76b79cb465-z4675 1/1 Running 0 11d
mw-ext-wl-ci-1235081-88125-3806-artifact-warehouse 1/1 Running 0 11d
mw-ext-wl-ci-1235082-33777-3924-js-evaluator-5846788778-b7lph 1/1 Running 0 16h
mw-ext-wl-ci-1237878-46429-3926-py-evaluator-6c957c7b74-2cf9q 1/1 Running 0 16h
mw-ext-wl-ci-1235082-33777-3924-artifact-warehouse 1/1 Running 0 16h
mw-ext-wl-ci-1237677-28866-3921-artifact-warehouse 1/1 Running 0 17h
mw-ext-wl-ci-1235081-21893-3923-js-evaluator-759d7958db-v2xtw 1/1 Running 0 17h
mw-ext-wl-ci-1235082-33777-3924-mediawiki-b99f8864f-x9jjq 4/4 Running 0 16h
mw-ext-wl-ci-1235081-88125-3806-mediawiki-664cd6dd9c-gmzhp 4/4 Running 0 11d
mw-ext-wl-ci-1237681-22970-3922-mediawiki-6b6c6f8fbf-9kq2w 4/4 Running 0 17h
mw-ext-wl-ci-1237677-28866-3921-py-evaluator-676b88795f-xk95z 1/1 Running 0 17h
mw-ext-wl-ci-1235082-33777-3924-py-evaluator-54c8b78c4f-gwzlp 1/1 Running 0 16h
mw-ext-wl-ci-1235083-22643-3925-artifact-warehouse 1/1 Running 0 16h
mw-ext-wl-ci-1237681-33021-3941-artifact-warehouse 1/1 Running 0 74m
mw-ext-wl-ci-1237677-28866-3921-js-evaluator-5f54ff6956-8xc25 1/1 Running 0 17h
mw-ext-wl-ci-1235083-22643-3925-mediawiki-85679fff4-96wds 4/4 Running 0 16h
mw-ext-wl-ci-1235081-21893-3923-artifact-warehouse 1/1 Running 0 17h
mw-ext-wl-ci-1237677-28866-3921-mediawiki-84584cf6-pb5sc 4/4 Running 0 17h
mw-ext-wl-ci-1237878-46429-3926-artifact-warehouse 1/1 Running 0 16h
mw-ext-wl-ci-1237681-22970-3922-js-evaluator-68bfb6748d-kn95c 1/1 Running 0 17h
mw-ext-wl-ci-1237878-46429-3926-mediawiki-764ddff94b-5xxnr 4/4 Running 0 16h
mw-ext-wl-ci-1237681-22970-3922-artifact-warehouse 1/1 Running 0 17h
mw-ext-wl-ci-1237677-66594-3942-artifact-warehouse 1/1 Running 0 74m
mw-ext-wl-ci-1235081-88125-3806-mariadb-b95b69b6f-7r2xk 1/1 Running 0 11d
mw-ext-wl-ci-1235081-21893-3923-mediawiki-76994c857d-jlfwf 4/4 Running 0 17h
mw-ext-wl-ci-1237681-22970-3922-py-evaluator-798d459cd7-ssdp8 1/1 Running 0 17h
mw-ext-wl-ci-1235081-21893-3923-py-evaluator-6ff54f78f-v2w6v 1/1 Running 0 17h
mw-ext-wl-ci-1237878-46429-3926-py-evaluator-6c957c7b74-5c5jp 0/1 ContainerStatusUnknown 1 16h
mw-ext-wl-ci-1237878-46429-3926-mariadb-584f7b9db9-c6pxp 1/1 Running 1 (4m3s ago) 16h
mw-ext-wl-ci-1235082-33777-3924-mariadb-84df66bb57-x6774 1/1 Running 1 (4m3s ago) 16h
mw-ext-wl-ci-1237681-33021-3941-mariadb-855f6dc585-2gczb 1/1 Running 2 (4m3s ago) 74m
mw-ext-wl-ci-1235081-21893-3923-mariadb-7d486f7b4f-dt66v 1/1 Running 1 (4m3s ago) 17h
mw-ext-wl-ci-1237677-28866-3921-mariadb-6679bd67fc-kpg28 1/1 Running 1 (4m3s ago) 17h
mw-ext-wl-ci-1237681-22970-3922-mariadb-57466d6cfc-v8dnl 1/1 Running 1 (4m3s ago) 17h
mw-ext-wl-ci-1235083-22643-3925-mariadb-f6c6dd965-c4h8k 1/1 Running 1 (4m3s ago) 16h
mw-ext-wl-ci-1237677-66594-3942-mariadb-866fb5fb7c-csqrx 1/1 Running 2 (4m3s ago) 74m
mw-ext-wl-ci-1235083-22643-3925-js-evaluator-67f8859f5d-dp6db 1/1 Running 1 (4m3s ago) 16h
mw-ext-wl-ci-1237681-33021-3941-py-evaluator-588c987569-86fvn 1/1 Running 1 (4m3s ago) 74m
mw-ext-wl-ci-1237878-46429-3926-js-evaluator-6c9d55ff65-2rqv8 1/1 Running 1 (4m3s ago) 16h
mw-ext-wl-ci-1237677-66594-3942-js-evaluator-74f9b596d9-6ffr2 1/1 Running 1 (4m3s ago) 74m
mw-ext-wl-ci-1237677-66594-3942-py-evaluator-7686c6978b-47b98 1/1 Running 1 (4m3s ago) 74m
mw-ext-wl-ci-1237681-33021-3941-js-evaluator-65b75678c4-d7q9t 1/1 Running 1 (4m3s ago) 74m
$ kubectl get pods -n cat-env | grep mw-ext-wl-ci | wc -l
46
Now, the odd thing is that those environments are configured to always be removed regardless of the result of the pipeline, see here.
We should check the actual behavior and ensure the wikis are indeed being reaped unconditionally.
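As a starting point for that check, here is a rough sketch of a script that groups the pod listing by environment and flags environments whose oldest pod exceeds an age threshold. It is not part of Catalyst or the pipeline. It assumes that the environment name is the `mw-ext-wl-ci-<number>-<number>-<number>` prefix of each pod name (inferred only from the listing above), that the last column is the pod age in kubectl's short duration format, and that two hours is a reasonable cutoff for a CI env. The names `stale_envs`, `age_seconds` and `SAMPLE` are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: the env name is the pod-name prefix with three numeric fields.
ENV_RE = re.compile(r"^(mw-ext-wl-ci-\d+-\d+-\d+)-")
# kubectl prints ages such as "74m", "16h", "11d" or "4m3s".
AGE_RE = re.compile(r"(\d+)([ydhms])")
UNIT_SECONDS = {"y": 365 * 86400, "d": 86400, "h": 3600, "m": 60, "s": 1}


def age_seconds(age):
    """Convert a kubectl age string such as '4m3s' to seconds."""
    return sum(int(n) * UNIT_SECONDS[u] for n, u in AGE_RE.findall(age))


def stale_envs(kubectl_output, max_age_seconds):
    """Map each env name to its oldest pod age (seconds), keeping only
    envs whose oldest pod is older than max_age_seconds."""
    oldest = defaultdict(int)
    for line in kubectl_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        match = ENV_RE.match(fields[0])
        if not match:
            continue  # header line or a pod that is not a CI env
        env = match.group(1)
        # The AGE column is always the last whitespace-separated field.
        oldest[env] = max(oldest[env], age_seconds(fields[-1]))
    return {env: age for env, age in oldest.items() if age > max_age_seconds}


# Three rows copied from the 2026-02-10 listing.
SAMPLE = """\
mw-ext-wl-ci-1237677-66594-3942-mediawiki-66fcc5c64d-frm7l 0/4 Init:0/1 1 (33m ago) 74m
mw-ext-wl-ci-1235081-88125-3806-artifact-warehouse 1/1 Running 0 11d
mw-ext-wl-ci-1235083-22643-3925-mariadb-f6c6dd965-c4h8k 1/1 Running 1 (4m3s ago) 16h
"""

if __name__ == "__main__":
    # Assumed threshold: a CI env should not outlive its pipeline by 2 hours.
    for env, age in sorted(stale_envs(SAMPLE, 2 * 3600).items()):
        print(f"{env}: oldest pod is {age // 3600}h old")
```

To run it against the cluster, the output of `kubectl get pods -n cat-env` could be passed to `stale_envs` instead of `SAMPLE`. Comparing the flagged environments with the Jenkins builds that created them should show whether the cleanup step ran at all or ran and failed.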