
Improve deployment-charts documentation for non-specialist Kubernetes users
Open, Needs Triage, Public

Description

As someone who doesn't use Kubernetes day-to-day but is on a team maintaining a service (Add-Link) deployed via Kubernetes, I find a few things about working with WMF's stack confusing:

  1. Should I be using local-charts, or minikube directly with the deployment-charts repo?
  2. If I'm working on something that doesn't have a published Docker image yet (e.g. an unmerged patch), IIRC I need to point Docker at minikube's environment so the image is built within minikube, but I'm not sure this is documented.
  3. Several wikitech docs refer to commands that don't work with Helm version 3 (IIRC, "helm init" is one of them). In general I have found the wikitech docs difficult from my perspective as a non-SRE person trying to use these tools to ship code for my team's service. I know, it's a wiki and I can edit it, but without expertise here I'm probably not the best person to do that. Maybe canonical, up-to-date documentation on how to test a patch locally with minikube could be added to the operations/deployment-charts repo?
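For item 2, a minimal sketch, assuming the usual minikube approach of building against minikube's own Docker daemon (what "eval $(minikube docker-env)" does in an interactive shell). It parses the bash-style export lines printed by "minikube docker-env --shell bash" and passes them to "docker build", so the resulting image exists inside minikube without being published anywhere. The helper names are illustrative, not part of local-charts or deployment-charts.

```python
import os
import subprocess


def parse_docker_env(output: str) -> dict[str, str]:
    """Turn the `export KEY="value"` lines printed by `minikube docker-env --shell bash` into a dict."""
    env: dict[str, str] = {}
    for raw in output.splitlines():
        line = raw.strip()
        if not line.startswith("export "):
            continue  # skip blank lines and the trailing "# To point your shell..." hints
        key, _, value = line[len("export "):].partition("=")
        env[key] = value.strip('"')
    return env


def build_in_minikube(context_dir: str, tag: str) -> None:
    """Build `context_dir` with minikube's Docker daemon so the image exists inside the cluster."""
    printed = subprocess.run(
        ["minikube", "docker-env", "--shell", "bash"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Same effect as `eval $(minikube docker-env)`, but scoped to this one docker call.
    env = {**os.environ, **parse_docker_env(printed)}
    subprocess.run(["docker", "build", "-t", tag, context_dir], check=True, env=env)
```

For example, build_in_minikube(".", "add-link:dev") from the service checkout (the tag is hypothetical). Using a tag other than "latest" means Kubernetes' default pull policy will use the local image instead of pulling, unless the chart sets the pull policy explicitly; the chart's image values then need to reference that tag, and their exact names depend on the chart.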

Event Timeline

I may have contributed to your confusion by adding local-charts to the Deployment Pipeline Migration Tutorial. local-charts is useful if you need a database like MariaDB while testing your k8s service via its Helm chart, or need to integrate with other charts, but it isn't necessary for working on your service. I will update the tutorial to say that more directly.

> There are several wikitech docs that refer to commands that don't work with helm version 3 (IIRC, "helm init" is one of them)

Could you please link to them? I only found 1 (https://wikitech.wikimedia.org/wiki/User:Alexandros_Kosiaris/Benchmarking_kubernetes_apps).

Btw, Helm 3 isn't supported in the pipeline yet, which is why those docs use Helm 2 commands. I've updated that page to point that out, but it needs a general refresh and a move to the main namespace. Chances are that revamping that page and merging it with https://wikitech.wikimedia.org/wiki/Deployment_pipeline/Migration/Tutorial would help a lot.

I am not against making the docs better (on the contrary), but since you say they feel tailored to SREs (we've gone to great lengths NOT to do that; docs aimed at SREs would look very different), it might mean we need a fresh pair of eyes. If you can point us to the things you found problematic, we can work together to get them fixed.

Change 661175 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[releng/local-charts@master] Install Helm v2

https://gerrit.wikimedia.org/r/661175

>> There are several wikitech docs that refer to commands that don't work with helm version 3 (IIRC, "helm init" is one of them)

> Could you please link to them? I only found 1 (https://wikitech.wikimedia.org/wiki/User:Alexandros_Kosiaris/Benchmarking_kubernetes_apps).

Um, you're right, sorry for not being precise here. As you've pointed out, the underlying issue is that Helm 3 is incompatible in various ways, and that's the root of my confusion. I made things worse for myself by following the link from your benchmarking page to https://github.com/thesocialdev/mediawiki-services-profiler; that repo also references helm init but doesn't say to install Helm 2 rather than Helm 3.
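Since Helm 3 removed "helm init" (and Tiller) entirely, a quick guard before following Helm 2 instructions can save some confusion. This is a sketch that assumes the "helm version --short" output formats noted in the docstring. With Helm 2 and no reachable Tiller the command can exit non-zero after printing the client line, so the exit code is deliberately ignored.

```python
import re
import subprocess


def helm_major_version(version_output: str) -> int:
    """Return the major version from `helm version --short` output.

    Helm 2 prints a line like "Client: v2.17.0+g<commit>" (plus a Server line if Tiller answers);
    Helm 3 prints a single line like "v3.5.0+g<commit>".
    """
    match = re.search(r"v(\d+)\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"unrecognised helm version output: {version_output!r}")
    return int(match.group(1))


def require_helm2() -> None:
    """Stop early instead of failing later on Helm-2-only commands such as `helm init`."""
    # No check=True: Helm 2 can exit non-zero when Tiller isn't reachable,
    # but it still prints the client version line to stdout first.
    result = subprocess.run(["helm", "version", "--short"], capture_output=True, text=True)
    major = helm_major_version(result.stdout)
    if major != 2:
        raise SystemExit(f"found Helm {major}; these instructions assume Helm 2")
```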

> Btw, helm 3 isn't yet supported in the pipeline, that's why. I've updated that page to point that out, but it is in need of a general refresh and a move under the main namespace. Chances are that a revamp of that page and a merge with https://wikitech.wikimedia.org/wiki/Deployment_pipeline/Migration/Tutorial could help a lot.

+1

> I am not against making the docs better (on the contrary), but since you say that it feels to you like it is tailored for SREs (we've gone to great lengths to NOT have that, SRE-destined docs would be way different) it might mean we need a fresh eye. If you can point us to the things you felt were problematic we can work together to get them fixed.

Having seen the comments here and thought about my own experience, maybe the fix is just to point more directly to local-charts as the tool to use for testing out deployments.

The specific example of what I'm trying to do: test out this change, so I have some confidence that what I'm asking SRE to merge will actually work the way I intend. I'll look through the documentation again as I figure out the process (getting local-charts set up with the database, injecting the correct environment variables, building the latest Docker image from the patch that the deployment-charts patch depends on, and finally deploying the cron job in my minikube environment), and see what, if anything, I can contribute or suggest.
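A sketch of the final deploy step, assuming Helm 2 (which local-charts now installs) and that environment variables are injected through chart values. "helm upgrade --install" with "--set" works in both Helm 2 and 3, and adding "--dry-run --debug" prints the rendered templates without installing anything, which is a cheap way to sanity-check a deployment-charts patch before asking for a merge. The function name, release name, chart path and value keys are hypothetical; the real keys are in the chart's values.yaml.

```python
def helm_upgrade_cmd(
    release: str, chart: str, overrides: dict[str, str], dry_run: bool = False
) -> list[str]:
    """Build a `helm upgrade --install` argv with one --set flag per value override.

    Values containing commas would need escaping for --set; that isn't handled here.
    """
    cmd = ["helm", "upgrade", "--install", release, chart]
    for key, value in sorted(overrides.items()):  # sorted for a stable, diffable command line
        cmd += ["--set", f"{key}={value}"]
    if dry_run:
        # Render the templates and print them instead of installing the release.
        cmd += ["--dry-run", "--debug"]
    return cmd
```

For example, subprocess.run(helm_upgrade_cmd("linkrecommendation", "./charts/linkrecommendation", {"main_app.version": "dev"}, dry_run=True), check=True), where every name is a placeholder. Building an argv list rather than a shell string avoids quoting problems with values that contain spaces.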

Change 661175 merged by Jeena Huneidi:
[releng/local-charts@master] Install Helm v2

https://gerrit.wikimedia.org/r/661175

> The specific example of what I am trying to do is test out this change so I have some confidence that what I am asking SRE to merge might actually work in the way I intend it to. I'll have a look through again at the documentation as I figure out the process to do that (figure out how to get local-charts set up with the database, the correct environment variables injected, the latest Docker image built from the patch which the deployment-charts patch depends on, and finally deploying the cron job in my minikube env), and see what if anything I can contribute or suggest.

Cool! Thanks for helping with that!!!

First roadblock, not specific to link recommendation, but in case others run across it:

When I run minikube start, I get:

    ❗ Enabling 'default-storageclass' returned an error: running callbacks: [Error making standard the default storage class: Error listing StorageClasses: Get "https://192.168.49.2:8443/apis/storage.k8s.io/v1/storageclasses": dial tcp 192.168.49.2:8443: i/o timeout]
    🌟 Enabled addons: storage-provisioner

    ❌ Exiting due to GUEST_START: wait 6m0s for node: wait for healthy API server: apiserver healthz never reported healthy: timed out waiting for the condition
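The "dial tcp 192.168.49.2:8443: i/o timeout" part means the host could not even open a TCP connection to the API server. A quick reachability check (a sketch using only the standard library; the function name is illustrative) helps tell an unreachable minikube node apart from an API server that is up but unhealthy. If the port is unreachable, "minikube delete" followed by "minikube start" recreates the cluster from scratch, discarding its state.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused, unreachable network and socket timeouts
        return False
```

For the log above that would be port_reachable("192.168.49.2", 8443).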

I shut down Minikube and I'm now attempting to use the Kubernetes support in Docker for Mac.

I'm using the config that @longma so helpfully created for me a few months ago (thank you!) at https://gerrit.wikimedia.org/r/c/releng/local-charts/+/639347

When I run helm repo update I get:

    ❯ helm repo update
    WARNING: "kubernetes-charts.storage.googleapis.com" is deprecated for "stable" and will be deleted Nov. 13, 2020.
    WARNING: You should switch to "https://charts.helm.sh/stable"
    Hang tight while we grab the latest from your chart repositories...
    ...Skip local chart repository
    ...Unable to get an update from the "stable" chart repository (https://kubernetes-charts.storage.googleapis.com):
    	Failed to fetch https://kubernetes-charts.storage.googleapis.com/index.yaml : 403 Forbidden
    Update Complete.
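The warning names the replacement URL, so the local fix is to re-point the "stable" repo. A small sketch that reads "helm repo list" output (assumed to be the usual two-column NAME/URL table) and returns "helm repo remove"/"helm repo add" commands for any repo still on a retired Google-hosted URL. The incubator mapping is included on the assumption that it may be configured too; the function name is illustrative.

```python
# Retired Google-hosted chart repos and their replacements on charts.helm.sh.
DEPRECATED_REPOS = {
    "https://kubernetes-charts.storage.googleapis.com": "https://charts.helm.sh/stable",
    "https://kubernetes-charts-incubator.storage.googleapis.com": "https://charts.helm.sh/incubator",
}


def repo_fix_commands(repo_list_output: str) -> list[list[str]]:
    """Return helm commands that re-point any repo still using a retired URL."""
    commands: list[list[str]] = []
    for line in repo_list_output.splitlines()[1:]:  # first line is the NAME/URL header
        fields = line.split()
        if len(fields) < 2:
            continue
        name, url = fields[0], fields[1]
        new_url = DEPRECATED_REPOS.get(url.rstrip("/"))
        if new_url is not None:
            commands.append(["helm", "repo", "remove", name])
            commands.append(["helm", "repo", "add", name, new_url])
    return commands
```

Running the returned commands and then "helm repo update" again should make the 403 from the old stable URL go away.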

OK. Looking at helm/requirements.yaml, I see that mariadb uses that repo, so I switch it to the recommended stable chart repo, but then:

    make deploy
    helm dependency update ./helm
    Hang tight while we grab the latest from your chart repositories...
    ...Unable to get an update from the "local" chart repository (http://127.0.0.1:8879/charts):
    	Get "http://127.0.0.1:8879/charts/index.yaml": dial tcp 127.0.0.1:8879: connect: connection refused
    ...Successfully got an update from the "wikimedia" chart repository
    ...Successfully got an update from the "stable" chart repository
    Update Complete.
    Saving 5 charts
    Downloading mariadb from repo https://kubernetes-charts.storage.googleapis.com/
    Save error occurred:  could not find : chart mariadb not found in https://kubernetes-charts.storage.googleapis.com/
    Deleting newly downloaded charts, restoring pre-update state
    Error: could not find : chart mariadb not found in https://kubernetes-charts.storage.googleapis.com/
    make: *** [deploy] Error 1
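The log shows helm still downloading mariadb from the old URL after the switch, so it's worth checking that no requirements.yaml under the chart directory still references it. (The "local" repo error is separate: in Helm 2 that repo is only served while "helm serve" is running.) This sketch rewrites repository: lines that point at the retired Google-hosted URLs; it is a plain text substitution rather than a YAML parse, so comments and formatting survive. It's an illustration, not the fix that went into local-charts, and "helm dependency update ./helm" needs to be re-run afterwards.

```python
from pathlib import Path

# Retired Google-hosted chart repos and their replacements on charts.helm.sh.
DEPRECATED_REPOS = {
    "https://kubernetes-charts.storage.googleapis.com": "https://charts.helm.sh/stable",
    "https://kubernetes-charts-incubator.storage.googleapis.com": "https://charts.helm.sh/incubator",
}


def rewrite_repositories(text: str) -> str:
    """Swap retired URLs on `repository:` lines, leaving every other line as it was."""
    lines = []
    for line in text.splitlines(keepends=True):
        prefix, found, rest = line.partition("repository:")
        url = rest.strip().strip("\"'").rstrip("/")
        if found and url in DEPRECATED_REPOS:
            newline = "\n" if line.endswith("\n") else ""
            line = f"{prefix}repository: {DEPRECATED_REPOS[url]}{newline}"
        lines.append(line)
    return "".join(lines)


def fix_chart_dir(chart_dir: str) -> list[Path]:
    """Rewrite every requirements.yaml under chart_dir in place; return the files changed."""
    changed = []
    for path in sorted(Path(chart_dir).rglob("requirements.yaml")):
        old = path.read_text()
        new = rewrite_repositories(old)
        if new != old:
            path.write_text(new)
            changed.append(path)
    return changed
```

For this repo that would be fix_chart_dir("helm"), which also covers any subchart requirements files below it.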

I ended up figuring out the issues with the charts, I'll submit patches to local-charts to update some of the defaults to fix it.

As for testing out link recommendation service patches with local-charts, I created https://wikitech.wikimedia.org/wiki/User:Kosta_Harlan/Add_Link_Deployment_Charts_Notes

@akosiaris @thcipriani @dduvall @JMeybohm

As we start annual planning, can we work on a timeline for an official process to build and deploy a new k8s-based service using the Deployment Pipeline and deployment-charts? There are Gerrit-specific docs, but their Getting Deployed to Production section is empty.

And/or maybe this is all planned as part of some RelEng work?