
echostore helm test service checker failing in staging cluster
Closed, Resolved (Public)

Description

Running helm test on echostore in the staging cluster fails.

UPDATED RELEASES:
NAME      CHART         VERSION
staging   stable/kask    0.0.11

jhuneidi@deploy1001:/srv/deployment-charts/helmfile.d/services/staging/echostore$ kubectl get pods
NAME                             READY   STATUS    RESTARTS   AGE
kask-staging-778ddf46b-gt72p     1/1     Running   0          68s
tiller-deploy-6754cdc78b-945nj   1/1     Running   0          57d
jhuneidi@deploy1001:/srv/deployment-charts/helmfile.d/services/staging/echostore$ helm test staging
RUNNING: kask-staging-service-checker
FAILED: kask-staging-service-checker, run `kubectl logs kask-staging-service-checker --namespace echostore` for more info
Error: 1 test(s) failed
jhuneidi@deploy1001:/srv/deployment-charts/helmfile.d/services/staging/echostore$ kubectl logs kask-staging-service-checker --namespace echostore
Traceback (most recent call last):
  File "/usr/bin/service-checker-swagger", line 11, in <module>
    load_entry_point('servicechecker==0.1.4', 'console_scripts', 'service-checker-swagger')()
  File "/usr/lib/python2.7/dist-packages/servicechecker/swagger.py", line 515, in main
    checker.run()
  File "/usr/lib/python2.7/dist-packages/servicechecker/swagger.py", line 149, in run
    for ep, data in self.get_endpoints()]
  File "/usr/lib/python2.7/dist-packages/servicechecker/swagger.py", line 103, in get_endpoints
    raise ValueError("No valid spec found")
ValueError: No valid spec found

This does not happen while testing locally using the v.1.0.5 kask image specified in the echostore chart.

Event Timeline

That not-very-helpful message was due to a pretty old service-checker image being used. I've updated it, and the message is now a bit different:

WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', BadStatusLine('\x15\x03\x01\x00\x02\x02\n',))': /openapi
WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', BadStatusLine('\x15\x03\x01\x00\x02\x02\n',))': /openapi
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', BadStatusLine('\x15\x03\x01\x00\x02\x02\n',))': /openapi
Traceback (most recent call last):

servicechecker.CheckError: Generic connection error: HTTPConnectionPool(host='10.64.76.121', port=8082): Max retries exceeded with url: /openapi (Caused by ProtocolError('Connection aborted.', BadStatusLine('\x15\x03\x01\x00\x02\x02\n',)))

That's because it tries to connect to an HTTPS port talking HTTP.
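
For reference, the bytes inside the BadStatusLine can be read as a TLS record rather than as an HTTP status line. The snippet below is only an illustrative sketch in Python 3: `decode_tls_alert` and the lookup tables are hypothetical helpers written for this comment, not part of service-checker, and the tables cover just a few registry values.

```python
import struct

# Bytes copied verbatim from the BadStatusLine in the log above.
RAW = b"\x15\x03\x01\x00\x02\x02\n"

# Small subsets of the TLS registries, enough for this example.
CONTENT_TYPES = {20: "change_cipher_spec", 21: "alert", 22: "handshake", 23: "application_data"}
ALERT_LEVELS = {1: "warning", 2: "fatal"}
ALERT_DESCRIPTIONS = {0: "close_notify", 10: "unexpected_message", 40: "handshake_failure"}


def decode_tls_alert(data):
    """Decode a single TLS record carrying an alert (hypothetical helper)."""
    # Record header: 1 byte content type, 2 bytes version, 2 bytes payload length.
    content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
    payload = data[5:5 + length]
    if CONTENT_TYPES.get(content_type) != "alert" or len(payload) != 2:
        raise ValueError("not a TLS alert record")
    # Alert payload: 1 byte level, 1 byte description.
    level, description = payload[0], payload[1]
    return {
        "record_version": (major, minor),
        "level": ALERT_LEVELS.get(level, level),
        "description": ALERT_DESCRIPTIONS.get(description, description),
    }


if __name__ == "__main__":
    print(decode_tls_alert(RAW))
```

The record decodes to a fatal `unexpected_message` alert, which is what a TLS endpoint sends back when it receives a plain-text HTTP request instead of a handshake.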

After switching to HTTPS, it does indeed say something more useful:

ssl.CertificateError: hostname 'kask-staging' doesn't match either of 'echostore.discovery.wmnet', 'echostore.svc.codfw.wmnet', 'echostore.svc.eqiad.wmnet'

This will require somewhat more involved changes. Possibly instructing/altering service-checker to ignore mismatched certificates.
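
As a rough illustration of what "ignore mismatched certificates" means at the Python level, here is a minimal sketch using only the standard-library `ssl` module. It is not the service-checker change itself; `build_client_context` and its `insecure` parameter are hypothetical names used only for this example.

```python
import ssl


def build_client_context(insecure=False):
    """Return a client-side SSLContext (hypothetical helper).

    With insecure=True the context skips both hostname matching and
    certificate chain verification, so it should only be used against
    test endpoints such as a staging pod.
    """
    ctx = ssl.create_default_context()
    if insecure:
        # Order matters: check_hostname must be turned off before
        # verify_mode can be set to CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


if __name__ == "__main__":
    strict = build_client_context()
    loose = build_client_context(insecure=True)
    print(strict.check_hostname, strict.verify_mode)
    print(loose.check_hostname, loose.verify_mode)
```

The default context would reject a connection to 'kask-staging' with the same kind of hostname mismatch error as above, while the insecure one would accept the certificate regardless of the names it lists.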

Thanks for taking a look. I should have confirmed the service-checker version!

Change 641790 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/software/service-checker@master] Allow skipping cert verification

https://gerrit.wikimedia.org/r/641790

Change 641790 merged by jenkins-bot:
[operations/software/service-checker@master] Allow skipping cert verification

https://gerrit.wikimedia.org/r/641790

@jeena: With https://gerrit.wikimedia.org/r/641790 reviewed and merged, I just released 0.2.1, and the relevant image was built. That allows amending the chart to add the --insecure flag to service-checker and talk over HTTPS, so that we can ignore mismatched certs in CI.

This open task is only tagged with the RelEng Q2 project tag which recently has been archived. Please add an active project tag to this task so this task is discoverable. Thanks!

thcipriani assigned this task to akosiaris.
thcipriani subscribed.

No reply; removing archived project tag and adding Release-Engineering-Team-TODO

Thanks! This has been resolved, just not closed.