Enable shell access to presto from jupyter/stats machines
Closed, Resolved | Public | 13 Estimated Story Points

Description

Enable shell access to presto from jupyter/stats machines

Event Timeline

This is currently blocked on enabling Kerberos for Presto (https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570248/). Hopefully we will be able to unblock the task this week or next :)

Let's please have a simple Wikitech page that explains how to access Presto as it is now.

Change 570899 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] presto: refactor TLS passwords parameter to be more sharable

https://gerrit.wikimedia.org/r/570899

Change 570899 merged by Elukey:
[operations/puppet@production] presto: refactor TLS passwords parameter to be more sharable

https://gerrit.wikimedia.org/r/570899

Change 570909 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Add presto client to stat and notebook hosts

https://gerrit.wikimedia.org/r/570909

Change 570909 merged by Elukey:
[operations/puppet@production] Add presto client to stat and notebook hosts

https://gerrit.wikimedia.org/r/570909

All the stat and notebook hosts now have the Presto CLI working with Kerberos. I tested the following Python script and it seems to work:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os

import prestodb

# Presto coordinator endpoint and the CA bundle used to verify its TLS certificate.
server = {
    'host': 'an-coord1001.eqiad.wmnet',
    'port': 8281,
    'ca_bundle': './file.pem',
}

# Connect over HTTPS, authenticating with the Kerberos principal of the current shell user.
conn = prestodb.dbapi.connect(
    http_scheme='https',
    host=server['host'],
    port=server['port'],
    user=os.environ['USER'],
    catalog='analytics_hive',
    auth=prestodb.auth.KerberosAuthentication(
        config='/etc/krb5.conf',
        service_name='presto',
        principal='{}@WIKIMEDIA'.format(os.environ['USER']),
        ca_bundle=server['ca_bundle'],
    ),
)

# List the tables in the event database and print one row per table.
cursor = conn.cursor()
cursor.execute('SHOW TABLES from event')
for row in cursor.fetchall():
    print(row)

(credits to https://github.com/prestodb/presto-python-client/issues/48#issuecomment-515784886)
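
Since the script authenticates with Kerberos, it presumably only works when the user already has a valid ticket (for example after running kinit). A minimal sketch of a pre-flight check, assuming the MIT Kerberos `klist` binary is on the PATH; `klist -s` prints nothing and exits non-zero when the credentials cache is missing or expired. The helper names `has_valid_ticket` and `require_ticket` are hypothetical and not part of the tested script:

```python
import subprocess
import sys


def has_valid_ticket(klist_cmd=('klist', '-s')):
    """Return True if the given command exits with status 0.

    With the default command this means a readable, unexpired Kerberos
    credentials cache exists (assumption: MIT Kerberos klist).
    """
    try:
        result = subprocess.run(
            list(klist_cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # The klist binary is not installed on this host.
        return False
    return result.returncode == 0


def require_ticket():
    """Exit with a hint instead of failing later inside the Presto client."""
    if not has_valid_ticket():
        sys.exit('No valid Kerberos ticket found; run kinit first.')
```

Calling `require_ticket()` before `prestodb.dbapi.connect(...)` would turn a missing ticket into a short, readable message rather than an authentication error from the client library.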

Some notes:

  • the ca_bundle parameter currently points to a one-off file that I created manually; it needs to be deployed via puppet.
  • I used presto-python-client[kerberos] in a python3 venv, and it required libkrb5-dev to install properly. Need to deploy libkrb5-dev.
  • the python code complains that the certificates are missing the subject alternative name, falling back to the commonName. Will need to roll out new certs to avoid this annoyance.
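
The missing subject alternative name from the last note can be confirmed from Python's standard library alone. The sketch below is only an illustration, not what was run for this task: `has_subject_alt_name` and `fetch_peer_cert` are hypothetical helper names, and the host, port and CA bundle arguments are assumed to be the same values used in the script above.

```python
import socket
import ssl


def has_subject_alt_name(cert):
    """Return True if a getpeercert()-style dict has at least one DNS SAN entry."""
    return any(kind == 'DNS' for kind, _ in cert.get('subjectAltName', ()))


def fetch_peer_cert(host, port, ca_bundle):
    """Open a TLS connection and return the server certificate as a dict."""
    context = ssl.create_default_context(cafile=ca_bundle)
    # Hostname matching is the step affected by a missing SAN, so it is
    # disabled here; the chain is still verified against ca_bundle.
    context.check_hostname = False
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

Something like `has_subject_alt_name(fetch_peer_cert('an-coord1001.eqiad.wmnet', 8281, './file.pem'))` should then return False for the current certificates and True once new certs with a SAN are rolled out.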

Change 571229 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::kerberos::client: add libkrb5-dev to the list of packages

https://gerrit.wikimedia.org/r/571229

Change 571229 abandoned by Elukey:
profile::kerberos::client: add libkrb5-dev to the list of packages

https://gerrit.wikimedia.org/r/571229

Change 571230 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::analytics::cluster::packages::common: add libkrb5-dev

https://gerrit.wikimedia.org/r/571230

Change 571230 merged by Elukey:
[operations/puppet@production] profile::analytics::cluster::packages::common: add libkrb5-dev

https://gerrit.wikimedia.org/r/571230

Change 571244 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::presto::client: deploy ca public crt to client nodes

https://gerrit.wikimedia.org/r/571244

Change 571244 merged by Elukey:
[operations/puppet@production] profile::presto::client: deploy ca public crt to client nodes

https://gerrit.wikimedia.org/r/571244

elukey added a project: Analytics-Kanban.
elukey set the point value for this task to 8.
elukey changed the point value for this task from 8 to 13.
elukey moved this task from Next Up to Done on the Analytics-Kanban board.
elukey added a subscriber: Aklapper.
fdans moved this task from Incoming to Smart Tools for Better Data on the Analytics board.

Is prestodb installed via a Debian package, or does it need to be installed via pip in a virtual env? On stat1007 prestodb does not seem to be available.

CLI works great and it is SO very fast!