
[cert-manager] Pods are not being restarted after the certificate renewal
Closed, Resolved (Public)

Description

I've seen this in both toolsbeta and tools: when the certs get renewed, the pods for the builds-api and envvars-api don't get restarted.

That ends up with the gateway returning a 502:

<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.21.0</center>
</body>
</html>

This happens because the gateway gets an expired cert from the upstream:

2023/09/12 08:54:04 [error] 23#23: *134120 upstream SSL certificate verify error: (10:certificate has expired) while SSL handshaking to upstream, client: 192.168.247.64, server: , request: "POST /builds/v1/build HTTP/1.1", upstream: "https://10.98.19.138:8443/v1/build", host: "api.svc.tools.eqiad1.wikimedia.cloud:30003"
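
To spot this failure in the gateway logs without reading every line, something like the following could pull out the verify error and the affected upstream. It is only a sketch: the function name `parse_upstream_cert_error` and the regular expressions are assumptions based on the single log line quoted above, not an existing tool, and other nginx versions may word the message differently.

```python
import re
from typing import Optional

# Matches the fragment nginx logged above when the upstream certificate failed verification.
VERIFY_ERROR = re.compile(
    r"upstream SSL certificate verify error: \((?P<code>\d+):(?P<reason>[^)]*)\)"
)
# Matches the quoted upstream URL field of the same log line.
UPSTREAM = re.compile(r'upstream: "(?P<upstream>[^"]+)"')


def parse_upstream_cert_error(line: str) -> Optional[dict]:
    """Return the verify code, reason and upstream URL, or None for unrelated lines."""
    error = VERIFY_ERROR.search(line)
    if error is None:
        return None
    upstream = UPSTREAM.search(line)
    return {
        "code": int(error.group("code")),
        "reason": error.group("reason"),
        "upstream": upstream.group("upstream") if upstream else None,
    }


# The log line from this task, split only for readability.
SAMPLE = (
    "2023/09/12 08:54:04 [error] 23#23: *134120 upstream SSL certificate verify error: "
    "(10:certificate has expired) while SSL handshaking to upstream, "
    'client: 192.168.247.64, server: , request: "POST /builds/v1/build HTTP/1.1", '
    'upstream: "https://10.98.19.138:8443/v1/build", '
    'host: "api.svc.tools.eqiad1.wikimedia.cloud:30003"'
)

result = parse_upstream_cert_error(SAMPLE)
print(result)
```

Running it over the gateway log and grouping by `upstream` would show which backend is still serving the old certificate.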

There might be something wrong in the way we set the certs or the labels for the autorestart or something :/

Details

Title | Reference | Author | Source Branch | Dest Branch
envvars-api: bump to 0.0.30-20231004072407-3896abb0 | repos/cloud/toolforge/toolforge-deploy!106 | dcaro | bump_envvars-api | main
helm: delete unused certificate | repos/cloud/toolforge/envvars-api!14 | dcaro | remove_duplicated_cert | main
certs: remove unused cert | repos/cloud/toolforge/builds-api!49 | dcaro | remove_unused_cert | main

Event Timeline

dcaro triaged this task as High priority. Sep 12 2023, 9:04 AM
dcaro created this task.
dcaro added a project: Toolforge.
dcaro edited projects, added Toolforge (Toolforge iteration 00); removed Toolforge.

This is because we have two certs in each of the envvars-api and builds-api charts, but only trigger a reload for one of them.
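
The mismatch described in this comment can be pictured as a set difference between the certs a chart defines and the certs whose renewal triggers a pod reload. The sketch below is purely illustrative: the function `certs_without_reload` and the secret names are hypothetical, and the task does not say how the autorestart is wired up or which of the two certs the pods actually use.

```python
from typing import Iterable, List


def certs_without_reload(chart_certs: Iterable[str], reload_watched: Iterable[str]) -> List[str]:
    """Return certs defined by the chart that no reload trigger is watching."""
    return sorted(set(chart_certs) - set(reload_watched))


# Hypothetical names, not taken from the real charts.
chart_certs = ["example-api-cert", "example-api-server-cert"]
reload_watched = ["example-api-cert"]

missing = certs_without_reload(chart_certs, reload_watched)
print(missing)
```

Any name in the result is a cert that can be renewed without the pods noticing. Keeping a single cert per chart, as the linked merge requests do by removing the unused one, makes that list empty by construction.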

dcaro changed the task status from Open to In Progress. Sep 26 2023, 12:34 PM
dcaro moved this task from Next Up to In Progress on the Toolforge (Toolforge iteration 00) board.
dcaro moved this task from In Progress to Done on the Toolforge (Toolforge iteration 00) board.