
Check CI envs cleanup
Closed, Duplicate
Public

Description

We've recently had two incidents where a large number of CI environments were left behind and caused an outage.

For example, during the 2026-02-10 incident, the pods left behind by the Gerrit/Jenkins pipelines looked like this:

$ kubectl get pods -n cat-env | grep mw-ext-wl-ci
mw-ext-wl-ci-1237677-66594-3942-mediawiki-66fcc5c64d-frm7l      0/4     Init:0/1                 1 (33m ago)    74m
mw-ext-wl-ci-1237681-33021-3941-mediawiki-b676cb884-rlhlc       0/4     Init:0/1                 1 (34m ago)    74m
mw-ext-wl-ci-1235081-88125-3806-py-evaluator-bf9468d78-ndkk6    1/1     Running                  0              11d
mw-ext-wl-ci-1235083-22643-3925-py-evaluator-79c7db8b6f-mftg5   1/1     Running                  0              16h
mw-ext-wl-ci-1235081-88125-3806-js-evaluator-76b79cb465-z4675   1/1     Running                  0              11d
mw-ext-wl-ci-1235081-88125-3806-artifact-warehouse              1/1     Running                  0              11d
mw-ext-wl-ci-1235082-33777-3924-js-evaluator-5846788778-b7lph   1/1     Running                  0              16h
mw-ext-wl-ci-1237878-46429-3926-py-evaluator-6c957c7b74-2cf9q   1/1     Running                  0              16h
mw-ext-wl-ci-1235082-33777-3924-artifact-warehouse              1/1     Running                  0              16h
mw-ext-wl-ci-1237677-28866-3921-artifact-warehouse              1/1     Running                  0              17h
mw-ext-wl-ci-1235081-21893-3923-js-evaluator-759d7958db-v2xtw   1/1     Running                  0              17h
mw-ext-wl-ci-1235082-33777-3924-mediawiki-b99f8864f-x9jjq       4/4     Running                  0              16h
mw-ext-wl-ci-1235081-88125-3806-mediawiki-664cd6dd9c-gmzhp      4/4     Running                  0              11d
mw-ext-wl-ci-1237681-22970-3922-mediawiki-6b6c6f8fbf-9kq2w      4/4     Running                  0              17h
mw-ext-wl-ci-1237677-28866-3921-py-evaluator-676b88795f-xk95z   1/1     Running                  0              17h
mw-ext-wl-ci-1235082-33777-3924-py-evaluator-54c8b78c4f-gwzlp   1/1     Running                  0              16h
mw-ext-wl-ci-1235083-22643-3925-artifact-warehouse              1/1     Running                  0              16h
mw-ext-wl-ci-1237681-33021-3941-artifact-warehouse              1/1     Running                  0              74m
mw-ext-wl-ci-1237677-28866-3921-js-evaluator-5f54ff6956-8xc25   1/1     Running                  0              17h
mw-ext-wl-ci-1235083-22643-3925-mediawiki-85679fff4-96wds       4/4     Running                  0              16h
mw-ext-wl-ci-1235081-21893-3923-artifact-warehouse              1/1     Running                  0              17h
mw-ext-wl-ci-1237677-28866-3921-mediawiki-84584cf6-pb5sc        4/4     Running                  0              17h
mw-ext-wl-ci-1237878-46429-3926-artifact-warehouse              1/1     Running                  0              16h
mw-ext-wl-ci-1237681-22970-3922-js-evaluator-68bfb6748d-kn95c   1/1     Running                  0              17h
mw-ext-wl-ci-1237878-46429-3926-mediawiki-764ddff94b-5xxnr      4/4     Running                  0              16h
mw-ext-wl-ci-1237681-22970-3922-artifact-warehouse              1/1     Running                  0              17h
mw-ext-wl-ci-1237677-66594-3942-artifact-warehouse              1/1     Running                  0              74m
mw-ext-wl-ci-1235081-88125-3806-mariadb-b95b69b6f-7r2xk         1/1     Running                  0              11d
mw-ext-wl-ci-1235081-21893-3923-mediawiki-76994c857d-jlfwf      4/4     Running                  0              17h
mw-ext-wl-ci-1237681-22970-3922-py-evaluator-798d459cd7-ssdp8   1/1     Running                  0              17h
mw-ext-wl-ci-1235081-21893-3923-py-evaluator-6ff54f78f-v2w6v    1/1     Running                  0              17h
mw-ext-wl-ci-1237878-46429-3926-py-evaluator-6c957c7b74-5c5jp   0/1     ContainerStatusUnknown   1              16h
mw-ext-wl-ci-1237878-46429-3926-mariadb-584f7b9db9-c6pxp        1/1     Running                  1 (4m3s ago)   16h
mw-ext-wl-ci-1235082-33777-3924-mariadb-84df66bb57-x6774        1/1     Running                  1 (4m3s ago)   16h
mw-ext-wl-ci-1237681-33021-3941-mariadb-855f6dc585-2gczb        1/1     Running                  2 (4m3s ago)   74m
mw-ext-wl-ci-1235081-21893-3923-mariadb-7d486f7b4f-dt66v        1/1     Running                  1 (4m3s ago)   17h
mw-ext-wl-ci-1237677-28866-3921-mariadb-6679bd67fc-kpg28        1/1     Running                  1 (4m3s ago)   17h
mw-ext-wl-ci-1237681-22970-3922-mariadb-57466d6cfc-v8dnl        1/1     Running                  1 (4m3s ago)   17h
mw-ext-wl-ci-1235083-22643-3925-mariadb-f6c6dd965-c4h8k         1/1     Running                  1 (4m3s ago)   16h
mw-ext-wl-ci-1237677-66594-3942-mariadb-866fb5fb7c-csqrx        1/1     Running                  2 (4m3s ago)   74m
mw-ext-wl-ci-1235083-22643-3925-js-evaluator-67f8859f5d-dp6db   1/1     Running                  1 (4m3s ago)   16h
mw-ext-wl-ci-1237681-33021-3941-py-evaluator-588c987569-86fvn   1/1     Running                  1 (4m3s ago)   74m
mw-ext-wl-ci-1237878-46429-3926-js-evaluator-6c9d55ff65-2rqv8   1/1     Running                  1 (4m3s ago)   16h
mw-ext-wl-ci-1237677-66594-3942-js-evaluator-74f9b596d9-6ffr2   1/1     Running                  1 (4m3s ago)   74m
mw-ext-wl-ci-1237677-66594-3942-py-evaluator-7686c6978b-47b98   1/1     Running                  1 (4m3s ago)   74m
mw-ext-wl-ci-1237681-33021-3941-js-evaluator-65b75678c4-d7q9t   1/1     Running                  1 (4m3s ago)   74m
$ kubectl get pods -n cat-env | grep mw-ext-wl-ci | wc -l
46
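Each CI environment runs several pods (mediawiki, mariadb, py-evaluator, js-evaluator, artifact-warehouse), so the raw pod count overstates how many environments actually leaked. Below is a rough sketch for grouping pods per environment. It assumes the first three numeric fields after `mw-ext-wl-ci-` identify a single environment. That assumption is inferred from the pod names above, not from the pipeline configuration. `ci_envs` is a hypothetical helper name.

```shell
# Hypothetical helper: read pod names (one per line) on stdin and print how
# many pods belong to each CI environment.
# ASSUMPTION: names follow mw-ext-wl-ci-<n>-<n>-<n>-<component>..., where the
# three numeric fields identify one environment.
ci_envs() {
  grep -E '^mw-ext-wl-ci-[0-9]+-[0-9]+-[0-9]+-' \
    | sed -E 's/^(mw-ext-wl-ci-[0-9]+-[0-9]+-[0-9]+)-.*/\1/' \
    | sort | uniq -c
}

# Usage against the cluster (prints only pod names, without a header row):
# kubectl get pods -n cat-env --no-headers -o custom-columns=NAME:.metadata.name | ci_envs
```

Applied by hand to the listing above, this grouping gives nine environments rather than 46 independent leftovers. Eight environments have five pods each. The ninth, 1237878-46429-3926, has an extra py-evaluator pod in `ContainerStatusUnknown`. One environment (1235081-88125-3806) is 11 days old, so the leak is not limited to the incident window.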

The odd thing is that these environments are configured to always be removed, regardless of the pipeline result (see here).

We should check the actual behavior and ensure the wikis are indeed being reaped unconditionally.
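One way to check the actual behavior is to periodically list environments that outlive any reasonable CI run. The sketch below does this and rests on several assumptions:

* The helper name `stale_ci_envs` is hypothetical.
* The six-hour threshold in the usage line is an arbitrary example, not a documented CI limit.
* The `date -d` syntax is GNU-specific.

The filter relies on Kubernetes reporting `creationTimestamp` in RFC 3339 UTC form, which sorts lexicographically in chronological order.

```shell
# Hypothetical helper: read "<pod-name> <creationTimestamp>" lines on stdin
# and print each CI environment that has at least one pod created before
# the cutoff passed as $1 (an RFC 3339 UTC timestamp).
# ASSUMPTION: same mw-ext-wl-ci-<n>-<n>-<n>-<component> naming as above.
stale_ci_envs() {
  grep -E '^mw-ext-wl-ci-[0-9]+-[0-9]+-[0-9]+-' \
    | sed -E 's/^(mw-ext-wl-ci-[0-9]+-[0-9]+-[0-9]+)-[^ ]* /\1 /' \
    | awk -v cutoff="$1" '$2 < cutoff { print $1 }' \
    | sort -u
}

# Usage (GNU date; 6 hours is an assumed threshold):
# kubectl get pods -n cat-env --no-headers \
#   -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp \
#   | stale_ci_envs "$(date -u -d '6 hours ago' +%Y-%m-%dT%H:%M:%SZ)"
```

A non-empty result would indicate that the "always remove" step did not run, or did not succeed, for those environments. This complements checking the pipeline logs of the corresponding builds.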