Helm test failing for CI namespace
Closed, Resolved (Public)

Description

I'm trying to get rid of minikube and move to the CI namespace setup in k8s (T196654).

I can now deploy to the CI namespace via the pipeline; however, helm test fails:

$ KUBECONFIG=/etc/kubernetes/ci-staging.config helm --tiller-namespace=ci test --cleanup mathoid-20180712215637-candidate
RUNNING: mathoid-mathoid-20180712215637-candidate-service-checker
FAILED: mathoid-mathoid-20180712215637-candidate-service-checker, run `kubectl logs mathoid-mathoid-20180712215637-candidate-service-checker --namespace ci` for more info
Error: 1 test(s) failed

But I don't seem to have permission to check the logs:

$ kubectl --kubeconfig /etc/kubernetes/ci-staging.config logs mathoid-mathoid-20180712215637-candidate-service-checker --namespace ci
Error from server (Forbidden): pods "mathoid-mathoid-20180712215637-candidate-service-checker" is forbidden: User "jenkins" cannot get pods in the namespace "ci"
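
One way to confirm which permission is missing is `kubectl auth can-i`, which asks the API server whether the current user may perform a verb on a resource. This is only a sketch: it assumes the `jenkins` user is allowed to create self-subject access reviews (the Kubernetes default), and the helper name `can_read_pod_logs` is hypothetical.

```shell
# Hypothetical helper: ask the API server whether the user in the given
# kubeconfig may read pod logs in a namespace. kubectl prints "yes" or "no".
# $1 = path to the kubeconfig, $2 = namespace
can_read_pod_logs() {
  kubectl --kubeconfig "$1" auth can-i get pods --subresource=log --namespace "$2"
}

# Example invocation (not run here):
#   can_read_pod_logs /etc/kubernetes/ci-staging.config ci
```

If this prints "no", the user's Role probably lacks the `get` verb on `pods` and `pods/log`.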

Event Timeline

I did some manual testing, by the way. I'm guessing this is the error:

servicechecker.CheckError: Generic connection error: HTTPConnectionPool(host='mathoid-mathoid-alex-test', port=10044): Max retries exceeded with url: /?spec (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff5024be490>: Failed to establish a new connection: [Errno -2] Name or service not known',))

And it fails because it tries to connect to

http://{{ template "wmf.releasename" . }}:{{ .Values.main_app.port }}

per

{{- define "wmf.appbaseurl" -}}
http://{{ template "wmf.releasename" . }}:{{ .Values.main_app.port }}
{{- end -}}

That hostname is not published in DNS anywhere in our infrastructure (we currently have no DNS integration, as the entire setup is in flux).
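
To reproduce the diagnosis by hand, something like the following could be run from inside a pod in the namespace. It is a sketch, not part of the chart: `app_base_url` and `name_resolves` are hypothetical helper names, and `getent` is assumed to be present in the image.

```shell
# Build the URL the same way the wmf.appbaseurl template does: http://<name>:<port>
# $1 = rendered release name, $2 = main_app port
app_base_url() {
  printf 'http://%s:%s\n' "$1" "$2"
}

# Return 0 if the host name resolves through the system resolver,
# non-zero otherwise (including when getent is not installed).
# $1 = host name
name_resolves() {
  getent hosts "$1" >/dev/null 2>&1
}

# Example invocation (not run here):
#   app_base_url mathoid-mathoid-alex-test 10044
#   name_resolves mathoid-mathoid-alex-test || echo "name does not resolve"
```

A non-zero status from `name_resolves` corresponds to the "Name or service not known" error that service-checker reports.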

Ah ha! Thanks for the explanation. That makes sense since minikube uses kube-dns out of the box. Are we waiting for CoreDNS or something else?

How do we get these errors to surface correctly to the user when running helm test?

That should be addressed by https://github.com/helm/helm/issues/1957, although that issue doesn't see much activity.

Change 450201 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] Allow the deploy user to get pod logs

https://gerrit.wikimedia.org/r/450201

Change 450215 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] Helm test: Use environment variables for service-checker

https://gerrit.wikimedia.org/r/450215

Change 450201 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] Allow the deploy user to get pod logs

https://gerrit.wikimedia.org/r/450201

Change 450215 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] Helm test: Use environment variables for service-checker

https://gerrit.wikimedia.org/r/450215

akosiaris claimed this task.

Aside from the RBAC rights fix, I've also made a small change to the helm test, and now it passes:

akosiaris@contint1001:~$ sudo HELM_HOME=/etc/helm KUBECONFIG=/etc/kubernetes/ci-staging.config helm --tiller-namespace=ci test mathoid-alex-test 
RUNNING: mathoid-mathoid-alex-test-service-checker
PASSED: mathoid-mathoid-alex-test-service-checker
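
The patch itself is only described by its title here, so the following is a sketch of the general technique rather than the merged change. Kubernetes injects `<NAME>_SERVICE_HOST` and `<NAME>_SERVICE_PORT` variables into a pod for every Service that already exists in the namespace when the pod starts, with the Service name upper-cased and dashes replaced by underscores. That lets a test pod find the service without DNS. The helper names `service_env_prefix` and `service_base_url` are hypothetical.

```shell
# Convert a Service name to the prefix Kubernetes uses for its injected
# environment variables: upper-cased, with dashes replaced by underscores.
# $1 = Service name
service_env_prefix() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | tr '-' '_'
}

# Print the base URL for a Service using the injected variables.
# Returns non-zero if either variable is missing.
# The body runs in a subshell so the temporary variables do not leak.
# $1 = Service name (a DNS label, so it is safe to pass through eval)
service_base_url() (
  prefix="$(service_env_prefix "$1")"
  eval "host=\${${prefix}_SERVICE_HOST:-}"
  eval "port=\${${prefix}_SERVICE_PORT:-}"
  [ -n "$host" ] && [ -n "$port" ] || return 1
  printf 'http://%s:%s\n' "$host" "$port"
)

# Example invocation inside a pod (not run here):
#   service_base_url mathoid-mathoid-alex-test
```

The trade-off is ordering: the variables are only set for Services created before the pod, which holds for a helm test pod because it runs after the release is deployed.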

I think this concludes the task. I'll resolve it; feel free to reopen.