
Ensure webservice plays nicely with Toolforge tools using buildpack images
Open, Medium, Public

Description

At a very minimum, we need to ensure that running the (eventually legacy) webservice start ... commands doesn't clobber a buildpack-based deployment. It should check whether the managed-by label is "tfb" and, if so, refuse to take action.
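A minimal sketch of that check, written in Python because tools-webservice is a Python tool. It assumes the Kubernetes objects are available as plain dicts (the shape the API returns) and that the label key is the conventional app.kubernetes.io/managed-by. The task only says the managed-by value is "tfb", so the key, the function name and the exception name are illustrative, not webservice's actual code.

```python
# Assumed label key; the task only states that the managed-by value is "tfb".
MANAGED_BY_KEY = "app.kubernetes.io/managed-by"
BUILDPACK_MANAGER = "tfb"


class BuildpackDeploymentError(RuntimeError):
    """Raised when webservice would clobber a buildpack-managed object (hypothetical)."""


def ensure_not_buildpack(obj: dict) -> None:
    """Refuse to act on a Kubernetes object (as a plain dict) owned by buildpacks."""
    meta = obj.get("metadata", {})
    labels = meta.get("labels") or {}  # labels may be missing or null
    if labels.get(MANAGED_BY_KEY) == BUILDPACK_MANAGER:
        name = meta.get("name", "<unnamed>")
        raise BuildpackDeploymentError(
            f"{name} is managed by buildpacks ({MANAGED_BY_KEY}={BUILDPACK_MANAGER}); "
            "refusing to run webservice start"
        )
```

webservice start would call this on the existing Deployment/Service before creating or replacing anything, and exit with the error message instead of proceeding.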

It would be good if webservice status reported that a buildpack web server is running instead of saying there's no running webservice.
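One way status could make that distinction, sketched under the assumption that the tool's Deployments are available as plain dicts and carry an app.kubernetes.io/managed-by label. The "webservice" label value and the status messages below are hypothetical placeholders, not webservice's real output.

```python
from __future__ import annotations

MANAGED_BY_KEY = "app.kubernetes.io/managed-by"  # assumed label key


def describe_status(deployments: list[dict]) -> str:
    """Summarise which workflow, if any, is running the tool's web server."""
    for dep in deployments:
        meta = dep.get("metadata", {})
        manager = (meta.get("labels") or {}).get(MANAGED_BY_KEY)
        name = meta.get("name", "<unnamed>")
        if manager == "tfb":
            return f"Your buildpack web server ({name}) is running"
        if manager == "webservice":  # hypothetical label value
            return f"Your webservice ({name}) is running"
    # Nothing labelled by either workflow was found.
    return "Your webservice is not running"
```

The point of the sketch is only that status looks at the managed-by label rather than at webservice's own labels alone, so a buildpack deployment is no longer invisible to it.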

Open question: we probably need webservice restart/stop CLI functionality for buildpack web servers. Should we just reuse webservice for that or have a different set of CLI tooling?

Event Timeline

I suppose if we have a CLI method to stop a buildpack web server, then we should also have one to start a buildpack web server. I don't really want to duplicate the k8s deployment config/logic into a second tool, though. Maybe webservice start can hit the CD service?

The goal down the road is to make it so there is no need for, or even access to, a bastion for login... maybe with some way for a person to interact with a web UI or a CLI that doesn't have to run on a bastion. (It's a nice dream.)

That said, the main thing to worry about, I think, is the main ClusterIP service object and the Ingress that talks to it. By default, your quota allows one service object, and if you use webservice, it has webservice labels on it. The labels prevent clobbering since webservice deletes and manages whatever includes its labels.

We might need to bake in a way to allow people to switch *back* to using webservice... like letting webservice know about the buildpack labels and how to completely remove those objects when given a particular, deliberate set of args? I'm really iffy about letting webservice talk to CI at all, but if webservice could delete the things that CI creates, then you would just need to trigger a new build/deploy (which may not need to include the buildpack phase if there were no changes to ____) to redeploy from scratch. You can do all that with kubectl, but then everyone needs to know k8s details, which is the opposite of good imho 😛

> The goal down the road is to make it so there is no need for, or even access to, a bastion for login... maybe with some way for a person to interact with a web UI or a CLI that doesn't have to run on a bastion. (It's a nice dream.)

Gotcha. I'll move having manual start/restart/stop functionality to a separate ticket.

> That said, the main thing to worry about, I think, is the main ClusterIP service object and the Ingress that talks to it. By default, your quota allows one service object, and if you use webservice, it has webservice labels on it. The labels prevent clobbering since webservice deletes and manages whatever includes its labels.

So I think both buildpacks CD and webservice should check to see if an object already exists with a different managed-by label and then refuse to continue if that's the case.
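A sketch of a shared pre-flight check that either tool could run before creating anything. It assumes each tool knows its own managed-by value ("tfb" for buildpacks; "webservice" is a hypothetical value here) and that the label key is app.kubernetes.io/managed-by. The Conflict type and the find_conflicts name are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass

MANAGED_BY_KEY = "app.kubernetes.io/managed-by"  # assumed label key


@dataclass(frozen=True)
class Conflict:
    kind: str
    name: str
    owner: str


def find_conflicts(existing: list[dict], me: str) -> list[Conflict]:
    """Return existing objects whose managed-by label names a different tool.

    `existing` is the tool's current Service/Ingress/Deployment objects as
    plain dicts; `me` is the caller's own managed-by value, e.g. "tfb".
    """
    conflicts = []
    for obj in existing:
        meta = obj.get("metadata", {})
        owner = (meta.get("labels") or {}).get(MANAGED_BY_KEY)
        # Unlabelled objects are skipped; only a *different* managed-by value counts.
        if owner is not None and owner != me:
            conflicts.append(Conflict(obj.get("kind", "?"), meta.get("name", "?"), owner))
    return conflicts
```

The caller refuses to continue whenever the returned list is non-empty. Skipping unlabelled objects matches the "different managed-by label" rule; a stricter variant could treat them as conflicts too, since an object created by hand with kubectl would otherwise be overwritten.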

> We might need to bake in a way to allow people to switch *back* to using webservice... like letting webservice know about the buildpack labels and how to completely remove those objects when given a particular, deliberate set of args? I'm really iffy about letting webservice talk to CI at all, but if webservice could delete the things that CI creates, then you would just need to trigger a new build/deploy (which may not need to include the buildpack phase if there were no changes to ____) to redeploy from scratch. You can do all that with kubectl, but then everyone needs to know k8s details, which is the opposite of good imho 😛

So once we have a "stop a buildpack web server" button/CLI option, I think for migration purposes it's fine to ask people to manually stop their webservice before switching to buildpacks, and to manually stop their buildpack web server before switching back to webservice. I think if we try to implement logic that lets one tool trump the other, it's just going to get confusing.

> So I think both buildpacks CD and webservice should check to see if an object already exists with a different managed-by label and then refuse to continue if that's the case.

Yeah, that sounds like a good default.

> So once we have a "stop a buildpack web server" button/CLI option, I think for migration purposes it's fine to ask people to manually stop their webservice before switching to buildpacks, and to manually stop their buildpack web server before switching back to webservice. I think if we try to implement logic that lets one tool trump the other, it's just going to get confusing.

+1 on people manually cleaning up their webservices. Manually stopping a buildpack web server, on the other hand, would mean deleting a collection of at least four objects (accepting that the pods will cascade-delete). I think it might be reasonable to include some option to use webservice to clean it all up before we have a better way (like a web thingy or something), but it should require a --force-clean type CLI argument and otherwise fail if the service object is managed by the other workflow. Something like how ceph guards some options behind --i-really-mean-it :)
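A rough sketch of how such a guard could look in an argparse-based CLI. --force-clean is the hypothetical flag name floated in this comment, not an existing webservice option; the app.kubernetes.io/managed-by key and the "webservice" label value are likewise assumptions, and only "tfb" comes from the task.

```python
from __future__ import annotations

import argparse

MANAGED_BY_KEY = "app.kubernetes.io/managed-by"  # assumed label key
WEBSERVICE = "webservice"  # hypothetical managed-by value for webservice objects
BUILDPACKS = "tfb"         # managed-by value named in the task


class RefusingToDelete(RuntimeError):
    """Raised when buildpack objects exist and --force-clean was not given."""


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="webservice")
    sub = parser.add_subparsers(dest="action", required=True)
    stop = sub.add_parser("stop", help="stop the tool's web server")
    # Deliberately explicit opt-in, in the spirit of ceph's --i-really-mean-it.
    stop.add_argument(
        "--force-clean",
        action="store_true",
        help="also delete objects created by the buildpack workflow",
    )
    return parser


def objects_to_delete(objects: list[dict], force_clean: bool) -> list[dict]:
    """Select the objects `webservice stop` may delete.

    Objects labelled for neither workflow are never touched.
    """
    mine, theirs = [], []
    for obj in objects:
        owner = (obj.get("metadata", {}).get("labels") or {}).get(MANAGED_BY_KEY)
        if owner == WEBSERVICE:
            mine.append(obj)
        elif owner == BUILDPACKS:
            theirs.append(obj)
    if theirs and not force_clean:
        raise RefusingToDelete(
            "web server is managed by buildpacks; rerun with --force-clean to remove it"
        )
    # At this point `theirs` is either empty or explicitly opted into.
    return mine + theirs
```

Usage would be along the lines of `args = build_parser().parse_args(["stop", "--force-clean"])` followed by `objects_to_delete(current_objects, args.force_clean)`. Only the top-level objects need to be listed; the pods go away through cascading deletion.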

Beyond that, it would be best to let something other than a bona fide push trigger a full redeploy, kind of like how Jenkins responds to "recheck" comments in Gerrit by running a build. It's hard to reason about that overall because we don't have the CI setup in place yet. Many CI systems can easily do that from a web UI, but the workflows need to be predefined somehow if we expose such a thing to users. We are coming around to the point where figuring out what that is going to be might be important.

Change 638208 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/software/tools-webservice@master] Prevent webservice from doing anything if buildpacks are being used

https://gerrit.wikimedia.org/r/638208

Andrew triaged this task as Medium priority. Dec 8 2020, 5:52 PM
Andrew moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.