`webservice restart` isn't actually restarting the pods
Closed, Resolved, Public

Authored By: Legoktm, Nov 3 2021, 6:34 AM

Description

tools.shorturls@tools-sgebastion-08:~/www/rust$ webservice restart
Your job is not running, starting...............
tools.shorturls@tools-sgebastion-08:~/www/rust$ webservice status
Your webservice of type golang111 is running on backend kubernetes
tools.shorturls@tools-sgebastion-08:~/www/rust$ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
shorturls-768f6874cb-kgjvl   1/1     Running   0          21d

The job was definitely running before (the pod above is 21 days old and was never restarted), even though `webservice restart` said it wasn't running.

Related Objects

Mentioned Here: T140415: `webservice restart` does not always wait for service to stop before trying to start again

Event Timeline

Legoktm created this task. Nov 3 2021, 6:34 AM

Restricted Application added a subscriber: Aklapper. Nov 3 2021, 6:34 AM

bd808 added a project: cloud-services-team (Kanban). Nov 3 2021, 2:33 PM

On IRC @LucasWerkmeister reminded me about https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/6Z3EXK3OOGZPHJZ6ZBSBYIYCVDGVMYDP/; I'll try that later today to see whether that was the issue.

Looking at https://k8s-status.toolforge.org/namespaces/tool-shorturls/, the deployment is 1y15w6d old. The pod is 18h39m28s old but it only has the tools.wmflabs.org tags, so this is indeed the problem mentioned in cloud-announce.
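The mismatch comes down to how Kubernetes equality-based label selectors work: a pod is selected only if it carries every key/value pair in the selector, and any extra labels on the pod are ignored. The sketch below models that rule in plain Python to show why a pod carrying only old-style labels is invisible to a client that selects on new-style labels. The label keys and values are hypothetical placeholders, not the exact labels webservice or Toolforge use.

```python
from typing import Dict


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """A pod matches if every selector key is present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


# HYPOTHETICAL label keys, chosen only to echo the old/new domain names.
# Pod created from an old Deployment: carries only the old-style labels.
old_pod = {"tools.wmflabs.org/webservice": "true", "name": "shorturls"}

# Pod created after the label migration: carries the new-style labels.
new_pod = {"toolforge.org/webservice": "true", "name": "shorturls"}

# A client that only knows the new labels selects on them.
new_selector = {"toolforge.org/webservice": "true", "name": "shorturls"}

print(matches(new_selector, old_pod))  # False: the old pod is never found
print(matches(new_selector, new_pod))  # True
```

From the client's point of view, a tool whose pods only have the old labels looks as if nothing is running, which is consistent with the "Your job is not running" message in the description.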

Yep:

tools.shorturls@tools-sgebastion-08:~$ webservice stop
Stopping webservice
tools.shorturls@tools-sgebastion-08:~$ webservice start
Starting webservice....
tools.shorturls@tools-sgebastion-08:~$ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
shorturls-7b7858d794-jzxvw   1/1     Running   0          37s
tools.shorturls@tools-sgebastion-08:~$ kubectl get deployments
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
shorturls   1/1     1            1           45s
tools.shorturls@tools-sgebastion-08:~$ webservice restart
Restarting...
tools.shorturls@tools-sgebastion-08:~$ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
shorturls-7b7858d794-lw6rj   1/1     Running   0          41s

Sorry about the trouble, thanks everyone.

Should we make a note on Wikitech somewhere about this lurking issue? I'd bet that Kunal will not be the last person to stumble over it. I vaguely remembered that changing labels would be an issue with Kubernetes selectors, but had already forgotten about this known issue.

Ideally we'd just patch webservice to realize this and automatically fully migrate to the new labels.

I'm not sure how good of an idea it would be to have webservice restart sometimes perform an (unexpected) stop/start. I do think some sort of warning message would be appropriate when the backend is in STATE_PENDING. See also the discussion in https://wm-bot.wmcloud.org/logs/%23wikimedia-cloud/20211013.txt

It might be useful to have a webservice restart --hard that just performs a stop/start in one command.
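The two proposals above (warn when the backend reports STATE_PENDING, and a one-shot hard restart) could be combined into a restart planner along these lines. This is a hypothetical sketch: `State`, `plan_restart`, the action strings, and the `--hard` flag are illustrative names, not part of the real webservice code or CLI.

```python
from enum import Enum
from typing import List, Optional, Tuple


class State(Enum):
    # Hypothetical stand-in for the backend states discussed in this thread.
    RUNNING = "running"
    STOPPED = "stopped"
    PENDING = "pending"


def plan_restart(state: State, hard: bool = False) -> Tuple[List[str], Optional[str]]:
    """Return (actions, warning) for a restart under a hypothetical policy."""
    if hard:
        # Proposed `--hard`: same effect as `webservice stop && webservice start`.
        return ["stop", "start"], None
    if state is State.STOPPED:
        return ["start"], None
    warning = None
    if state is State.PENDING:
        # Warn instead of silently switching to a stop/start.
        warning = ("webservice is in an unexpected state; if the restart does "
                   "not take effect, run `webservice stop` and then "
                   "`webservice start`")
    # Default lightweight path: delete pods and let the ReplicaSet recreate them.
    return ["delete-pods"], warning


print(plan_restart(State.RUNNING, hard=True))  # (['stop', 'start'], None)
```

Keeping the lightweight pod deletion as the default and making the stop/start opt-in avoids the "unexpected stop/start" concern, while the warning still tells the user when the lightweight path may not be enough.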

I think it is unfortunate that `webservice restart` has different semantics than `webservice stop && webservice start`; maybe that should be its own task.

I agree with either having webservice emit some help message or fixing it itself when it encounters this state.

In T294888#7482071, @AntiCompositeNumber wrote:

I'm not sure how good of an idea it would be to have webservice restart sometimes perform an (unexpected) stop/start.

At least IMO, I'd prefer that over having webservice restart not actually restart the service :S

In T294888#7482530, @Legoktm wrote:

I think it is unfortunate that `webservice restart` has different semantics than `webservice stop && webservice start`; maybe that should be its own task.

Giving them the same semantics would risk unnecessary reloads of the ingress layer (which would be a Problem) and make restarts uglier for scaled-up apps. Ideally, restarts would move to doing a rolling deployment instead in the future, don't you think? Unfortunately, the assumption that all labels would be stable only holds if we never actually change them. I just wish I'd known they couldn't be changed on the fly in the pod templates. Sorry!
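The rolling deployment idea can be illustrated with a small model of the Deployment RollingUpdate strategy with `maxSurge: 1` and `maxUnavailable: 0`: start a replacement pod first, and only then remove one old pod, so capacity never drops below the desired count. On a real cluster, `kubectl rollout restart deployment/<name>` triggers this kind of rollout. The function below is only a simulation that assumes every new pod becomes ready immediately; it is not webservice's code.

```python
import itertools
from typing import List, Tuple


def rolling_restart(old_pods: List[str], prefix: str) -> Tuple[List[str], List[int]]:
    """Replace pods one at a time with surge 1 and no unavailability.

    Returns the final pod list and the number of running pods observed
    after each step, to show capacity never drops below the original.
    """
    pods = list(old_pods)
    counts = [len(pods)]
    suffix = itertools.count()
    for old in list(old_pods):
        # 1. Surge: start a replacement first (assumed ready immediately).
        pods.append(f"{prefix}-{next(suffix)}")
        counts.append(len(pods))
        # 2. Only then remove one old pod.
        pods.remove(old)
        counts.append(len(pods))
    return pods, counts


final, counts = rolling_restart(["a", "b", "c"], "new")
print(final)        # ['new-0', 'new-1', 'new-2']
print(min(counts))  # 3
```

By contrast, deleting every pod at once (or a full stop/start) briefly leaves zero running pods, which is what makes restarts of scaled-up apps uglier.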

`restart` originally did exactly the same as `stop && start`, but on the Kubernetes backend it was changed to the current behavior of deleting the pod and letting the replica set recreate it, because that is much lighter-weight and also less buggy (T140415).
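A toy model may help show what "deleting the pod and letting the replica set recreate it" means, and why that kind of restart cannot migrate labels: the controller rebuilds each pod from its existing template, so a replacement pod carries exactly the same labels as the one it replaced. Only recreating the Deployment (as `webservice stop` followed by `webservice start` did above) produces pods with new labels. The classes and names below are illustrative only; this is not the real ReplicaSet controller or webservice code, and reconciliation here is synchronous whereas Kubernetes does it asynchronously.

```python
import itertools
from typing import Dict


class ReplicaSet:
    """Toy model of a ReplicaSet reconcile loop (not the real controller)."""

    def __init__(self, name: str, replicas: int, template_labels: Dict[str, str]) -> None:
        self.name = name
        self.replicas = replicas
        self.template_labels = dict(template_labels)
        self.pods: Dict[str, Dict[str, str]] = {}
        self._suffix = itertools.count()
        self.reconcile()

    def reconcile(self) -> None:
        # Create pods from the unchanged template until observed == desired.
        while len(self.pods) < self.replicas:
            name = f"{self.name}-{next(self._suffix)}"
            self.pods[name] = dict(self.template_labels)

    def delete_pod(self, name: str) -> None:
        del self.pods[name]
        self.reconcile()


def soft_restart(rs: ReplicaSet) -> None:
    """Restart by deleting each existing pod; the controller replaces it."""
    for name in list(rs.pods):
        rs.delete_pod(name)


rs = ReplicaSet("shorturls-7b7858d794", replicas=1, template_labels={"name": "shorturls"})
print(list(rs.pods))  # ['shorturls-7b7858d794-0']
soft_restart(rs)
print(list(rs.pods))  # ['shorturls-7b7858d794-1']
```

The pod name changes but its labels do not, which matches the transcript above: after `webservice restart` the new pod still has the same `7b7858d794` template hash.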

fnegri edited projects, added cloud-services-team; removed cloud-services-team (Kanban). Jan 18 2023, 6:45 PM

fnegri moved this task from Kanban to Inbox on the cloud-services-team board.

I believe this was fixed in https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/commit/06e418c6952fcee64d2e63e6d70bdf0dd0c2cad5 and its follow-ups.
