
Make startup script
Closed, Declined · Public

Description

Currently PAWS is started with a handful of commands described in the README.md. Investigate moving to a startup script that will do this for us.

This should largely be focused on the dev environment, to simplify the deploy process for new contributors. However, it could be expanded to deploy to prod as well with some detection mechanism, probably shaped like detecting minikube and assuming prod if not. That part may need some bits that allow us to point at other envs. Then again, perhaps not: we don't do that often, and anyone who does is likely familiar enough to tinker with the script itself to suit their purpose.
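The detection mechanism described above could be sketched roughly like this; the function name, the context matching, and the "dev"/"prod" labels are assumptions for illustration, not an actual implementation:

```shell
# Hypothetical sketch: guess the target environment from the current
# kubectl context, assuming prod when minikube is not detected.
detect_environment() {
    # caller passes the current context, e.g. the output of:
    #   kubectl config current-context
    local context="${1:-}"
    if [[ "$context" == *minikube* ]]; then
        echo "dev"
    else
        echo "prod"
    fi
}
```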

Event Timeline

In other projects we are using the snippet:

# default to prod, avoid deploying dev in prod if there's any issue
# (this runs inside a function, hence "local")
local environment="tools"
if [[ "${1:-}" == "" ]]; then
    # no environment passed: read the project name from the VM, if present
    if [[ -f /etc/wmcs-project ]]; then
        environment="$(cat /etc/wmcs-project)"
    fi
else
    environment="$1"
fi

That allows getting the current project from /etc/wmcs-project (available on any cloud VM) if none is passed, and defaulting to prod ("tools" in that case) if that file is not available, though that might not work in future magnum-generated VMs, as the file is added by Puppet.
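For reference, the snippet can be wrapped into a self-contained, sourceable function; the function name and the second file-path parameter are assumptions added here so the logic can be exercised without a real /etc/wmcs-project:

```shell
# Sketch only: the same logic as a function. The optional second
# argument (path of the project file) is an illustration aid.
get_environment() {
    local project_file="${2:-/etc/wmcs-project}"
    # default to prod, avoid deploying dev in prod if there's any issue
    local environment="tools"
    if [[ "${1:-}" == "" ]]; then
        if [[ -f "$project_file" ]]; then
            environment="$(cat "$project_file")"
        fi
    else
        environment="$1"
    fi
    echo "$environment"
}
```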

That allows getting the current project from /etc/wmcs-project (available on any cloud VM) if none is passed, and defaulting to prod ("tools" in that case) if that file is not available, though that might not work in future magnum-generated VMs, as the file is added by Puppet.

In this case I think we should seek a different method. Ideally we can work to move away from having special k8s nodes, and towards a more cloud native approach of deployment, regardless of whether Magnum is used or not.

Ideally we can work to move away from having special k8s nodes, and towards a more cloud native approach of deployment

I'm not sure what you mean by this, can you elaborate?

I'm not sure what you mean by this, can you elaborate?

We make our deployments more rigid and complex, and thus more prone to errors and less portable, as we add custom things to them. While this is true in almost all contexts, in the context of k8s you make trouble for yourself when you make custom k8s nodes. It increases the difficulty of doing things like horizontal autoscaling, as we can no longer have k8s request "another node" from wherever it requests them; it now has to go through a routine to get one built just for this particular cluster. This is not ideal, as it creates extra work and more parts that can break.

Additionally, it is not portable. As it stands today we cannot pull up PAWS and put it on some other arbitrary k8s cluster: it would require the worker nodes to mount NFS (being resolved in T321886), alongside a few other things. The ingress is also kind of weird, though less critically for this purpose, because it is outside of k8s; that is the weird part, as it shouldn't be separate from the cluster.

Cloud native approaches have different meanings in different contexts, but in a k8s context part of what we want is for our application to be droppable on any k8s deploy and just work. The k8s cluster is just a compute resource for our application; beyond that, the application deals with itself.

In essence a k8s application should not care how the underlying node that it is running on is configured.

In essence a k8s application should not care how the underlying node that it is running on is configured.

I think I might be missing the link between this and the deploy script itself (if there's one).

Is your point that we should not depend on that file (/etc/wmcs-project) being there?

If so, I agree :), that's why there's the option to override the environment manually in that snippet, and to default to the prod environment no matter what (effectively making it independent from that file, though it will use it if it's there).

That does not invalidate the standard itself though (https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Decision_record_T303931_k8s_standard_deployment_code_pattern), as it does not define the implementation, but just the API for the deploy script itself.

Well, there's also the idea of trying to use helmfile; that might help in this case too.
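As a sketch of what that could look like, the deploy entry point could simply delegate environment selection to helmfile's real -e/--environment flag; the function name, default environment, and helmfile layout here are assumptions (the command is only built, not run, so the logic is easy to inspect):

```shell
# Hypothetical sketch: build the helmfile invocation for a given
# environment. A real script would execute the command; environments
# would be defined in helmfile.yaml.
build_deploy_command() {
    local environment="${1:-tools}"
    echo "helmfile --environment $environment apply"
}
```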

Is your point that we should not depend on that file (/etc/wmcs-project) being there?

yes

That does not invalidate the standard itself though (https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Decision_record_T303931_k8s_standard_deployment_code_pattern), as it does not define the implementation, but just the API for the deploy script itself.

I don't understand how the standard is an API, but my lack of comprehension seems immaterial in this matter. A startup.sh script is the focus of this ticket.

I don't understand how the standard is an API, but my lack of comprehension seems immaterial in this matter.

API, as in being the specification that the standard deploy.sh script offers to other scripts/software/users to execute a deployment of that k8s application.

Bug: T322303

This is largely driven by https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Decision_record_T303931_k8s_standard_deployment_code_pattern

In this case this is looking like a solution in search of a problem. I suspect that the above decision was made with tools that have a more specific build environment in mind. PAWS is set up to be run from someone's own system, which could be set up in any way they deem fit. This large unknown makes it difficult to make much sense of what we should and should not do in a startup script, as I wouldn't want to erode user trust by building a script that, for example, would reconstruct their minikube env, potentially breaking all kinds of things for them. As such, that was left out of this script. Beyond that, it remains questionable to me how much this could provide that the current instructions in the README don't already.

The primary benefit appears to be that this could limit the number of copy-and-paste actions. Beyond that it doesn't seem like something you would really want to use. The script won't fix much for you: if you aren't familiar with kubectl or helm and they give an error, you would still be stuck figuring out what the problem with those services is. Doing the additional work to make this script more useful on that front seems like a silly task considering how much effort it would be to, at best, save copying and pasting a few commands out of the README.