mholloway-shell@deploy1001:/srv/deployment/recommendation-api/deploy$ scap deploy "`git log --pretty=format:'%s' -n 1`"
14:43:17 Started deploy [recommendation-api/deploy@c39d567]
14:43:17 Deploying Rev: HEAD = c39d56753f4e695fd0d6c8ed1785f78385cfbce7
14:43:17 Started deploy [recommendation-api/deploy@c39d567]: Update recommendation-api to db97742
14:43:17 == CANARY ==
:* scb2001.codfw.wmnet
recommendation-api/deploy: fetch stage(s): 100% (ok: 1; fail: 0; left: 0)
recommendation-api/deploy: config_deploy stage(s): 100% (ok: 1; fail: 0; left: 0)
14:43:46 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'recommendation-api/deploy', '-g', 'canary', 'promote', '--refresh-config'] on scb2001.codfw.wmnet returned [1]: Linking config files at: /srv/deployment/recommendation-api/deploy-cache/revs/c39d56753f4e695fd0d6c8ed1785f78385cfbce7/.git/config-files
Executing check 'depool'
Check 'depool' completed, output: 2020-06-17 14:43:29,368 [INFO] Depooling currently pooled services
2020-06-17 14:43:29,470 [WARNING] LB lvs2009:9090 reports pool recommendation-api_9632 as enabled/up/pooled, should be disabled/*/not pooled
Restarting service 'recommendation_api'
Port 9632 not up. Waiting 3.00s
Port 9632 up in 3.00s
Executing check 'endpoints'
Check 'endpoints' failed: /{domain}/v1/description/addition/{target} (Caption addition suggestions beta cluster) is CRITICAL: Test Caption addition suggestions beta cluster returned the unexpected status 504 (expecting: 200)
Executing check 'repool'
Check 'repool' completed, output: 2020-06-17 14:43:40,951 [INFO] Pooling currently depooled services
2020-06-17 14:43:41,043 [WARNING] LB lvs2009:9090 reports pool recommendation-api_9632 as disabled/up/not pooled, should be enabled/up/pooled
recommendation-api/deploy: promote and restart_service stage(s): 100% (ok: 0; fail: 1; left: 0)
14:43:46 1 targets had deploy errors
14:43:46 1 targets failed
14:43:46 1 of 1 canary targets failed, exceeding limit
Rollback all deployed groups? [Y/n]: Y
14:44:25 == CANARY ==
:* scb2001.codfw.wmnet
recommendation-api/deploy: rollback stage(s): 100% (ok: 1; fail: 0; left: 0)
14:44:33 Finished deploy [recommendation-api/deploy@c39d567]: Update recommendation-api to db97742 (duration: 01m 16s)
14:44:33 Finished deploy [recommendation-api/deploy@c39d567] (duration: 01m 16s)
Event Timeline
Mentioned in SAL (#wikimedia-operations) [2020-06-17T14:49:18Z] <mdholloway> rolled back recommendation-api deployment due to canary endpoint check failure (T255683)
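The underlying issue appears to be a TLS certificate hostname mismatch: the service requests en.wikipedia.beta.wmflabs.org, but the certificate it receives lists no beta.wmflabs.org host among its altnames: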
mholloway-shell@mwmaint1002:~$ curl -s -X GET --header 'Accept: application/json; charset=utf-8' 'http://recommendation-api.discovery.wmnet:9632/wikidata.beta.wmflabs.org/v1/description/addition/en' | jq .
{
  "status": 504,
  "type": "internal_http_error",
  "detail": "Hostname/IP doesn't match certificate's altnames: \"Host: en.wikipedia.beta.wmflabs.org. is not in the cert's altnames: DNS:*.wikipedia.org, DNS:*.m.mediawiki.org, DNS:*.m.wikibooks.org, DNS:*.m.wikidata.org, DNS:*.m.wikimedia.org, DNS:*.m.wikimediafoundation.org, DNS:*.m.wikinews.org, DNS:*.m.wikipedia.org, DNS:*.m.wikiquote.org, DNS:*.m.wikisource.org, DNS:*.m.wikiversity.org, DNS:*.m.wikivoyage.org, DNS:*.m.wiktionary.org, DNS:*.mediawiki.org, DNS:*.planet.wikimedia.org, DNS:*.wikibooks.org, DNS:*.wikidata.org, DNS:*.wikimedia.org, DNS:*.wikimediafoundation.org, DNS:*.wikinews.org, DNS:*.wikiquote.org, DNS:*.wikisource.org, DNS:*.wikiversity.org, DNS:*.wikivoyage.org, DNS:*.wiktionary.org, DNS:*.wmfusercontent.org, DNS:*.zero.wikipedia.org, DNS:mediawiki.org, DNS:w.wiki, DNS:wikibooks.org, DNS:wikidata.org, DNS:wikimedia.org, DNS:wikimediafoundation.org, DNS:wikinews.org, DNS:wikiquote.org, DNS:wikisource.org, DNS:wikiversity.org, DNS:wikivoyage.org, DNS:wiktionary.org, DNS:wmfusercontent.org, DNS:wikipedia.org, DNS:api-ro.discovery.wmnet, DNS:api-rw.discovery.wmnet, DNS:api.svc.eqiad.wmnet\"",
  "method": "GET",
  "uri": "/wikidata.beta.wmflabs.org/v1/description/addition/en"
}
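For anyone wondering why the wildcard entries in that list don't cover the Beta Cluster host, here is a rough Python sketch of the usual altname matching rule (a wildcard is honored only as the whole left-most label and stands for exactly one label). The function names and the shortened ALTNAMES list are made up for illustration; this is a simplification and not the actual check performed by the service's TLS library.

```python
def matches_altname(hostname: str, pattern: str) -> bool:
    """Simplified DNS altname match: '*' may only replace the whole left-most label."""
    host_labels = hostname.rstrip(".").lower().split(".")
    pattern_labels = pattern.rstrip(".").lower().split(".")
    # A wildcard stands for exactly one label, so label counts must agree.
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


def covered_by(hostname: str, altnames: list[str]) -> bool:
    """True if any altname entry matches the hostname."""
    return any(matches_altname(hostname, name) for name in altnames)


# Hypothetical subset of the altnames from the error message above.
ALTNAMES = ["*.wikipedia.org", "*.m.wikipedia.org", "*.wikidata.org", "wikipedia.org"]

print(covered_by("en.wikipedia.org", ALTNAMES))
print(covered_by("en.wikipedia.beta.wmflabs.org", ALTNAMES))
```

Under this rule en.wikipedia.beta.wmflabs.org has five labels ending in wmflabs.org, so no entry in the list can match it, which is consistent with the 504 returned by the endpoint check.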
I think the easiest fix here is just to take out the new endpoint check making the Beta Cluster request.
Change 606210 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/services/recommendation-api@master] Remove new Beta Cluster endpoint check
I see the recommendation-api is in deployment-charts. Are you sure we should still deploy this using scap?
It's not yet being deployed on the pipeline; https://gerrit.wikimedia.org/r/#/c/operations/deployment-charts/+/602527/ is still in review.
Change 606481 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/services/recommendation-api@master] SE endpoints: Refactor for easier testing and add tests
Change 606210 merged by jenkins-bot:
[mediawiki/services/recommendation-api@master] Remove new Beta Cluster endpoint check
Change 606481 merged by jenkins-bot:
[mediawiki/services/recommendation-api@master] SE endpoints: Refactor for easier testing and add tests