Thumbor keeps losing Swift auth on beta
Closed, Resolved (Public)

Description

2016-11-14 10:56:17,673 8802 urllib3.connectionpool:DEBUG "GET /auth/v1.0 HTTP/1.1" 401 131
2016-11-14 10:56:17,681 8802 swiftclient:INFO REQ: curl -i http://deployment-ms-fe01.deployment-prep.eqiad.wmflabs/auth/v1.0 -X GET
2016-11-14 10:56:17,682 8802 swiftclient:INFO RESP STATUS: 401 Unauthorized
2016-11-14 10:56:17,682 8802 swiftclient:INFO RESP HEADERS: {u'date': u'Mon, 14 Nov 2016 10:56:17 GMT', u'content-length': u'131', u'content-type': u'text/html; charset=UTF-8', u'www-authenticate': u'Swift realm="unknown"', u'x-trans-id': u'tx7460f1dc76ba4dc084487-0058299851'}
2016-11-14 10:56:17,682 8802 swiftclient:INFO RESP BODY: <html><h1>Unauthorized</h1><p>This server could not verify that you are authorized to access the document you requested.</p></html>
2016-11-14 10:56:17,682 8802 swiftclient:ERROR Auth GET failed: http://deployment-ms-fe01.deployment-prep.eqiad.wmflabs/auth/v1.0 401 Unauthorized
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1553, in _retry
    self.url, self.token = self.get_auth()
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1507, in get_auth
    timeout=self.timeout)
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 593, in get_auth
    timeout=timeout)
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 468, in get_auth_1_0
    http_status=resp.status, http_reason=resp.reason)
ClientException: Auth GET failed: http://deployment-ms-fe01.deployment-prep.eqiad.wmflabs/auth/v1.0 401 Unauthorized
2016-11-14 10:56:17,683 8802 thumbor:ERROR [Swift] put exception: ClientException('Auth GET failed',)

Event Timeline

To be clear, this isn't set up in production, right? It's Beta-Cluster-reproducible?

Only Beta keeps losing the Swift auth; the Thumbor production machines aren't affected by this issue.

Does prod also show u'www-authenticate': u'Swift realm="unknown"'?

hashar triaged this task as Medium priority. Nov 16 2016, 9:05 AM
hashar moved this task from To Triage to Backlog on the Beta-Cluster-Infrastructure board.

@Krenair where are you seeing that btw?

The issue, afaics, is that Swift on deployment-ms-fe01 doesn't have the password for mw:thumbor in /etc/swift/proxy-server.conf, but the configuration seems the same in Hiera:Deployment-prep.

Yeah, the key in the swift::params::accounts hieradata was wrong: https://wikitech.wikimedia.org/w/index.php?title=Hiera:Deployment-prep&diff=984990&oldid=949676

--- /etc/swift/proxy-server.conf	2016-10-13 16:02:15.611999042 +0000
+++ /tmp/puppet-file20161117-12279-tdjaor	2016-11-17 01:42:18.392173554 +0000
@@ -21,7 +21,7 @@
 use = egg:swift#tempauth
 token_life = 604800
 user_mw_media = abracadabra .admin http://deployment-ms-fe01.deployment-prep.eqiad.wmflabs/v1/AUTH_mw
-user_mw_thumbor =   http://deployment-ms-fe01.deployment-prep.eqiad.wmflabs/v1/AUTH_mw
+user_mw_thumbor = OhdeaYuu4reoziichug6  http://deployment-ms-fe01.deployment-prep.eqiad.wmflabs/v1/AUTH_mw
 
 [filter:container_sync]
 use = egg:swift#container_sync
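
The diff above shows the failure mode: the rendered `user_mw_thumbor` line had no key before the URL. A minimal sketch of a check that would flag such entries follows. It assumes the options live in a `[filter:tempauth]` section (the section header is not visible in the diff) and that tempauth treats the first whitespace-separated token as the key; `find_bad_tempauth_users` is a hypothetical helper, not part of Swift or Puppet.

```python
import configparser


def find_bad_tempauth_users(conf_text, section="filter:tempauth"):
    """Return tempauth user options whose key (password) looks missing.

    The section name is an assumption; adjust it to match the real file.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    bad = []
    if not parser.has_section(section):
        return bad
    for option, value in parser.items(section):
        if not option.startswith("user_"):
            continue
        tokens = value.split()
        # An empty value, or a value that starts with the storage URL,
        # means no real key was rendered into the file.
        if not tokens or "://" in tokens[0]:
            bad.append(option)
    return bad
```

Running it against the old file contents would be expected to report `user_mw_thumbor` and leave `user_mw_media` alone.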

Does that solve your problem, @Gilles?

Nope. Thumbor's config has these values:

SWIFT_HOST = 'http://deployment-ms-fe01.deployment-prep.eqiad.wmflabs'
SWIFT_API_PATH = '/v1/AUTH_mw/'
SWIFT_AUTH_PATH = '/auth/v1.0'
SWIFT_USER = 'mw:thumbor'
SWIFT_KEY = 'OhdeaYuu4reoziichug6'

But it still 401s:

2016-11-17 13:27:44,742 8802 urllib3.connectionpool:INFO Starting new HTTP connection (1): deployment-ms-fe01.deployment-prep.eqiad.wmflabs
2016-11-17 13:27:44,752 8802 urllib3.connectionpool:DEBUG "GET /auth/v1.0 HTTP/1.1" 401 131
2016-11-17 13:27:44,752 8802 swiftclient:INFO REQ: curl -i http://deployment-ms-fe01.deployment-prep.eqiad.wmflabs/auth/v1.0 -X GET
2016-11-17 13:27:44,753 8802 swiftclient:INFO RESP STATUS: 401 Unauthorized
2016-11-17 13:27:44,754 8802 swiftclient:INFO RESP HEADERS: {u'date': u'Thu, 17 Nov 2016 13:27:44 GMT', u'content-length': u'131', u'content-type': u'text/html; charset=UTF-8', u'www-authenticate': u'Swift realm="unknown"', u'x-trans-id': u'tx7a53f36767464d19a62a9-00582db050'}
2016-11-17 13:27:44,754 8802 swiftclient:INFO RESP BODY: <html><h1>Unauthorized</h1><p>This server could not verify that you are authorized to access the document you requested.</p></html>
2016-11-17 13:27:44,754 8802 swiftclient:ERROR Auth GET failed: http://deployment-ms-fe01.deployment-prep.eqiad.wmflabs/auth/v1.0 401 Unauthorized

Does prod also show u'www-authenticate': u'Swift realm="unknown"'?

I'm not sure what you're asking. Swift auth works in production, and we don't log at debug level there. I'm also not sure that the swiftclient library would log the response headers for a successful auth anyway.

I'll write some Python code to mimic what Thumbor does and attempt to get the headers of the successful response in production.

DEBUG:swiftclient:REQ: curl -i http://ms-fe.svc.eqiad.wmnet/auth/v1.0 -X GET
DEBUG:swiftclient:RESP STATUS: 200 OK
DEBUG:swiftclient:RESP HEADERS: {u'content-length': u'0', u'x-storage-token': u'AUTH_tk...', u'x-auth-token': u'AUTH_tk...', u'x-trans-id': u'tx...', u'date': u'Thu, 17 Nov 2016 13:50:13 GMT', u'x-storage-url': u'http://ms-fe.svc.eqiad.wmnet/v1/AUTH_mw', u'content-type': u'text/html; charset=UTF-8'}
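
The script that produced the output above was not posted. As a rough sketch of an equivalent probe, the following uses only the Python standard library rather than swiftclient and sends a v1.0 auth request with the `X-Auth-User` and `X-Auth-Key` headers. The function names are hypothetical, and the parameters mirror Thumbor's `SWIFT_HOST`, `SWIFT_AUTH_PATH`, `SWIFT_USER` and `SWIFT_KEY` settings.

```python
import urllib.error
import urllib.request


def build_tempauth_request(host, auth_path, user, key):
    """Return the URL and headers for a Swift v1.0 auth request."""
    url = host.rstrip("/") + auth_path
    headers = {"X-Auth-User": user, "X-Auth-Key": key}
    return url, headers


def fetch_auth_headers(host, auth_path, user, key, timeout=10):
    """Perform the auth GET and return (status, response headers).

    A 401 is returned like any other status instead of being raised,
    so the headers of a failed and a successful auth can be compared.
    """
    url, headers = build_tempauth_request(host, auth_path, user, key)
    request = urllib.request.Request(url, headers=headers, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as err:
        return err.code, dict(err.headers.items())
```

Calling `fetch_auth_headers` once against beta and once against production would show both header sets side by side without changing the log level on either cluster.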

@Krenair I don't think the realm/headers are related.

The Swift proxy needs restarting once credentials are in place. I've done that, and Thumbor seems happy on deployment-imagescaler01! Tentatively closing.

Thanks. Puppet doesn't handle that automatically?

No, it doesn't, because in production that'd mean uncoordinated restarts of swift-proxy. I think it would probably be fine to restart anyway, since swift-proxy restarts are seamless from the user's POV. Alternatively, we could special-case the restart in labs.

It means the same thing in beta, we just care less about it in beta. I'm not wild about a special case to restart just in labs, but shouldn't we have a list of services deliberately missing restarts/reloads on config changes? This certainly wouldn't be the only one.