Enable shell access to presto from jupyter/stats machines
Description
Details
Status | Assigned | Task
---|---|---
Resolved | mforns | T243309 Add Presto to Analytics' stack
Resolved | elukey | T243312 Enable shell access to presto from jupyter/stats machines
Duplicate | None | T244505 Presto access on jupyter notebooks
Event Timeline
This is currently blocked on enabling Kerberos for Presto (https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570248/); hopefully we can unblock the task this week or next :)
Let's please have a simple wikitech page that explains how to access Presto as it is now.
Change 570899 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] presto: refactor TLS passwords parameter to be more sharable
Change 570899 merged by Elukey:
[operations/puppet@production] presto: refactor TLS passwords parameter to be more sharable
Change 570909 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Add presto client to stat and notebook hosts
Change 570909 merged by Elukey:
[operations/puppet@production] Add presto client to stat and notebook hosts
All the stat/notebook hosts now have the presto CLI working with Kerberos. I tested the following Python script and it seems to work:
    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    import prestodb
    import os

    server = {
        'host': 'an-coord1001.eqiad.wmnet',
        'port': 8281,
        'ca_bundle': './file.pem',
    }

    conn = prestodb.dbapi.connect(
        http_scheme='https',
        host=server['host'],
        port=server['port'],
        user=os.environ['USER'],
        catalog='analytics_hive',
        auth=prestodb.auth.KerberosAuthentication(
            config='/etc/krb5.conf',
            service_name='presto',
            principal='{}@WIKIMEDIA'.format(os.environ['USER']),
            ca_bundle=server['ca_bundle']
        )
    )

    cursor = conn.cursor()
    cursor.execute('SHOW TABLES from event')
    for row in cursor.fetchall():
        print(row)
(credits to https://github.com/prestodb/presto-python-client/issues/48#issuecomment-515784886)
Some notes:
- the ca_bundle file is currently a one-off that I created manually; it needs to be deployed via puppet.
- I installed presto-python-client[kerberos] in a python3 venv, and it required libkrb5-dev to install properly. libkrb5-dev needs to be deployed on the hosts.
- the Python code complains that the certificates are missing the Subject Alternative Name and falls back to the commonName. We will need to roll out new certs to avoid this annoyance.
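Since the CA bundle in the script above is a hand-made one-off whose location will change once it is deployed via puppet, a minimal sketch of keeping that path configurable could look like the following. The `presto_settings` helper and the `PRESTO_CA_BUNDLE` environment variable are hypothetical names used only for illustration; they are not something deployed on the hosts.

```python
import os


def presto_settings(env=None):
    """Collect the connection settings used by the test script.

    PRESTO_CA_BUNDLE is a hypothetical environment variable; when it is
    not set, the one-off './file.pem' from the test script is used.
    """
    env = os.environ if env is None else env
    user = env['USER']
    return {
        'host': 'an-coord1001.eqiad.wmnet',
        'port': 8281,
        'user': user,
        # Kerberos principal, built the same way as in the test script.
        'principal': '{}@WIKIMEDIA'.format(user),
        'ca_bundle': env.get('PRESTO_CA_BUNDLE', './file.pem'),
    }
```

The returned values can then be passed to `prestodb.dbapi.connect` and `prestodb.auth.KerberosAuthentication` exactly as in the script, so only the environment needs to change once the certificate has a permanent location.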
Change 571229 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::kerberos::client: add libkrb5-dev to the list of packages
Change 571229 abandoned by Elukey:
profile::kerberos::client: add libkrb5-dev to the list of packages
Change 571230 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::analytics::cluster::packages::common: add libkrb5-dev
Change 571230 merged by Elukey:
[operations/puppet@production] profile::analytics::cluster::packages::common: add libkrb5-dev
Change 571244 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::presto::client: deploy ca public crt to client nodes
Change 571244 merged by Elukey:
[operations/puppet@production] profile::presto::client: deploy ca public crt to client nodes
Updated the documentation: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Presto#Usage_on_analytics_cluster
Is prestodb installed via a Debian package, or does it need to be installed via pip in a virtual env? On stat1007 prestodb does not seem to be available.
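A quick way to check whether the module is importable by a given interpreter (system python3 or a venv) is sketched below. It uses only the standard library; `module_available` is an illustrative helper name, and the check says nothing about how the package was installed.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == '__main__':
    print('prestodb importable:', module_available('prestodb'))
```

Running it once with the system interpreter and once inside the venv shows which of the two environments can actually see `prestodb`.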