The SWAP Jupyter notebook hardware is old and out of warranty (OOW), and we need to replace it.
Perhaps along the way we should update Jupyter too? And/or consider https://github.com/jupyterlab/jupyterlab?
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Ottomata | T183145 Refresh SWAP notebook hardware
 | | | Unknown Object (Task)
Resolved | | elukey | T183935 rack/setup/install notebook100[34]
Resolved | | Cmjohnson | T192103 Decommission notebook1001
Change 419251 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/wheels/paws-internal@master] Update wheels for Debian Stretch
Change 419251 merged by Ottomata:
[operations/wheels/paws-internal@master] Update wheels for Debian Stretch
Change 419260 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/wheels/paws-internal@master] Update jupyterhub to 0.8.1 to work with newer singleuserauthenticator
Change 419260 merged by Ottomata:
[operations/wheels/paws-internal@master] Update jupyterhub to 0.8.1 to work with newer singleuserauthenticator
Change 419507 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/swap/deploy@master] Scripts to build jupyterhub based SWAP
Change 419507 merged by Ottomata:
[analytics/swap/deploy@master] Scripts to build jupyterhub based SWAP
Change 419509 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/swap/deploy@master] Add artifacts for initial build of swap
Change 419509 merged by Ottomata:
[analytics/swap/deploy@master] Add artifacts for initial build of swap
Change 419510 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/swap/deploy@master] Rename wheels_dir -> wheels
Change 419510 merged by Ottomata:
[analytics/swap/deploy@master] Rename wheels_dir -> wheels
Change 419656 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] [WIP] Puppetization for newer SWAP (JupyterHub) deployed via scap
Change 419821 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/swap/deploy@master] create_virutalenv.sh now takes the destination venv path as $1
Change 419821 merged by Ottomata:
[analytics/swap/deploy@master] create_virutalenv.sh now takes the destination venv path as $1
Change 419656 merged by Ottomata:
[operations/puppet@production] Puppetization for newer SWAP (JupyterHub)
Change 419835 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use venv instead of jupyter-venv for user venv dirs
Change 419835 merged by Ottomata:
[operations/puppet@production] Use venv instead of jupyter-venv for user venv dirs
I have rsynced over user home directories from notebook1001 -> notebook1003, and am upgrading the default notebook venv ($HOME/venv) by:
wheels_path=/srv/jupyterhub/deploy/artifacts/stretch/wheels

for u in $(ls /home); do
    venv=/home/$u/venv
    if [ -d $venv ]; then
        echo "Upgrading $venv"
        sudo -u $u python3 -m venv --upgrade /home/$u/venv
        sudo -u $u $venv/bin/pip install --upgrade --no-index --find-links=$wheels_path jupyterhub jupyter jupyterlab
    fi
done
Will this work? ¯\_(ツ)_/¯
IT DID INDEED WORK! AWESOME!
Updated JupyterHub with the JupyterLab beta is installed on notebook1003 and notebook1004. Home directories have been copied over to notebook1003.
WOoO let's try this out and test it. FYI: SPARK WORKS TOO!
Change 421298 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/jupyterhub/deploy@master] Update wheels with pyhive and impyla for default Hive access in prod
Change 421298 merged by Ottomata:
[analytics/jupyterhub/deploy@master] Update wheels with pyhive and impyla for default Hive access in prod
Change 421306 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Install python3 statistics packages; configure user venvs with packages in puppet
Change 421306 merged by Ottomata:
[operations/puppet@production] Install python3 packages; configure user venvs from requirements
Change 421320 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Fix typo in jupyterhub config
Change 421320 merged by Ottomata:
[operations/puppet@production] Fix typo in jupyterhub config
Change 421353 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Allow user venv to use system-site-packages
Change 421353 merged by Ottomata:
[operations/puppet@production] Allow user venv to use system-site-packages
Ah, ok, a better user venv upgrade is:
wheels_path=/srv/jupyterhub/deploy/artifacts/stretch/wheels

for u in $(getent passwd | awk -F ':' '{print $1}'); do
    venv=/home/$u/venv
    if [ -d $venv ]; then
        echo "Upgrading $venv"
        sudo -u $u python3 -m venv --upgrade /home/$u/venv
        # change system-site-packages to true
        test -f $venv/pyvenv.cfg && sed -i 's@include-system-site-packages = false@include-system-site-packages = true@' $venv/pyvenv.cfg
        sudo -u $u $venv/bin/pip install --upgrade --no-index --ignore-installed --find-links=$wheels_path --requirement=/srv/jupyterhub/deploy/frozen-requirements.txt
    fi
done
I've run this on all user venvs on notebook1003 and notebook1004.
Email sent (Subject: 'New SWAP (Jupyter Notebook) servers and updates!'). Timeline for notebook1001 deprecation: Monday April 2nd.
Small note for the record: I'm getting "Warning: JupyterHub seems to be served over an unsecured HTTP connection. We strongly recommend enabling HTTPS for JupyterHub" at the login screen. I guess that's rather inconsequential, considering that this goes through an SSH tunnel anyway, but I don't recall seeing the same message on notebook1001. Perhaps it's just a change in the new Jupyter version?
Yeah, if this wasn't happening before, it is almost certainly due to the JupyterHub version upgrade. Should be fine since it goes through ssh.
I have been using impyla on notebook1001 to run Hive queries, but this no longer works on notebook1003. Any ideas what might be wrong? See error message below (these two lines work without problem on notebook1001).
from impala.dbapi import connect
hive_conn = connect(host='analytics1003.eqiad.wmnet', port=10000, auth_mechanism='PLAIN')

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-bb76209539e0> in <module>()
----> 1 hive_conn = connect(host='analytics1003.eqiad.wmnet', port=10000, auth_mechanism='PLAIN')

~/venv/lib/python3.5/site-packages/impala/dbapi.py in connect(host, port, database, timeout, use_ssl, ca_cert, auth_mechanism, user, password, kerberos_service_name, use_ldap, ldap_user, ldap_password, use_kerberos, protocol)
    145         ca_cert=ca_cert, user=user, password=password,
    146         kerberos_service_name=kerberos_service_name,
--> 147         auth_mechanism=auth_mechanism)
    148     return hs2.HiveServer2Connection(service, default_db=database)
    149

~/venv/lib/python3.5/site-packages/impala/hiveserver2.py in connect(host, port, timeout, use_ssl, ca_cert, user, password, kerberos_service_name, auth_mechanism)
    756     transport = get_transport(sock, host, kerberos_service_name,
    757                               auth_mechanism, user, password)
--> 758     transport.open()
    759     protocol = TBinaryProtocol(transport)
    760     if six.PY2:

~/venv/lib/python3.5/site-packages/thrift_sasl/__init__.py in open(self)
     65
     66   def open(self):
---> 67     if not self._trans.isOpen():
     68       self._trans.open()
     69

AttributeError: 'TSocket' object has no attribute 'isOpen'
AttributeError: 'TSocket' object has no attribute 'isOpen'
This might help: https://github.com/cloudera/impyla/issues/268
Hm, in the meantime, I've also installed pyhive, which I think has a similar interface: https://github.com/dropbox/PyHive
Try that?
I am not sure the interface is fully compatible with impyla (e.g. is the cursor.description part mandatory, i.e. would it need to be added every time impyla is swapped out for pyhive in an existing notebook?).
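As a side note on that question: both impyla and pyhive aim to implement the Python DB-API (PEP 249), and under that assumption cursor.description is just read-only metadata about the result columns; fetching rows does not require it, so it should not need to be added when swapping drivers. A minimal sketch, assuming the same HiveServer2 host and port used in the other examples on this task:

# Minimal DB-API sketch; cursor.description is optional result metadata,
# not a required step before fetchall(). Host/port are assumptions taken
# from the other examples on this task.
from pyhive import hive

conn = hive.connect('analytics1003.eqiad.wmnet', 10000)
cursor = conn.cursor()
cursor.execute('SELECT page_title FROM wmf.pageview_hourly '
               'WHERE year=2017 AND month=1 AND day=1 AND hour=0 LIMIT 10')
print(cursor.description)  # optional: one (name, type, ...) tuple per column
rows = cursor.fetchall()   # works whether or not description was inspected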
But in any case I can't get pyhive to work either right now. The example code from https://wikitech.wikimedia.org/wiki/SWAP#Hive fails as follows (in a fresh notebook on notebook1003):
In [1]:
from pyhive import hive
cursor = hive.connect('analytics1003.eqiad.wmnet', 10000).cursor()
cursor.execute('SELECT page_title FROM wmf.pageview_hourly WHERE year=2017 and month=1 and day=1 and hour=0 LIMIT 10')
cursor.description
[('page_title', 'STRING_TYPE', None, None, None, None, True)]
cursor.fetchall()

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-3b4d63bb34fe> in <module>()
      1 from pyhive import hive
----> 2 cursor = hive.connect('analytics1003.eqiad.wmnet', 10000).cursor()
      3 cursor.execute('SELECT page_title FROM wmf.pageview_hourly WHERE year=2017 and month=1 and day=1 and hour=0 LIMIT 10')
      4 cursor.description
      5 [('page_title', 'STRING_TYPE', None, None, None, None, True)]

~/venv/lib/python3.5/site-packages/pyhive/hive.py in connect(*args, **kwargs)
     62     :returns: a :py:class:`Connection` object.
     63     """
---> 64     return Connection(*args, **kwargs)
     65
     66

~/venv/lib/python3.5/site-packages/pyhive/hive.py in __init__(self, host, port, username, database, auth, configuration, kerberos_service_name, password, thrift_transport)
    166             username=username,
    167         )
--> 168         response = self._client.OpenSession(open_session_req)
    169         _check_status(response)
    170         assert response.sessionHandle is not None, "Expected a session from OpenSession"

~/venv/lib/python3.5/site-packages/TCLIService/TCLIService.py in OpenSession(self, req)
    185         """
    186         self.send_OpenSession(req)
--> 187         return self.recv_OpenSession()
    188
    189     def send_OpenSession(self, req):

~/venv/lib/python3.5/site-packages/TCLIService/TCLIService.py in recv_OpenSession(self)
    197     def recv_OpenSession(self):
    198         iprot = self._iprot
--> 199         (fname, mtype, rseqid) = iprot.readMessageBegin()
    200         if mtype == TMessageType.EXCEPTION:
    201             x = TApplicationException()

~/venv/lib/python3.5/site-packages/thrift/protocol/TBinaryProtocol.py in readMessageBegin(self)
    132
    133     def readMessageBegin(self):
--> 134         sz = self.readI32()
    135         if sz < 0:
    136             version = sz & TBinaryProtocol.VERSION_MASK

~/venv/lib/python3.5/site-packages/thrift/protocol/TBinaryProtocol.py in readI32(self)
    215
    216     def readI32(self):
--> 217         buff = self.trans.readAll(4)
    218         val, = unpack('!i', buff)
    219         return val

AttributeError: 'TSaslClientTransport' object has no attribute 'readAll'
Like someone else on that ticket, it wasn't quite clear to me which exact versions (and pip commands) to use for that workaround. But after adapting the commands below from
https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Python-Error-TSaslClientTransport-object-has-no-attribute-trans/td-p/58033 (on a similar-sounding topic), impyla appears to work for me now, albeit in an outdated version:
# cf. https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Python-Error-TSaslClientTransport-object-has-no-attribute-trans/td-p/58033
!pip uninstall -y thrift
!pip uninstall -y impyla
!pip install thrift==0.9.3
!pip install impyla==0.13.8
Hmm, I'm having the same problem as @Tbayer, but that workaround isn't working for me.
> !pip show impyla
Name: impyla
Version: 0.13.8

> !pip show thrift
Name: thrift
Version: 0.9.3

> from impala.dbapi import connect
> impala_conn(host='analytics1003.eqiad.wmnet', port=10000, auth_mechanism='PLAIN')

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-16-805f90863b9a> in <module>()
      1 from impala.dbapi import connect
----> 2 impala_conn(host='analytics1003.eqiad.wmnet', port=10000, auth_mechanism='PLAIN')

~/venv/lib/python3.5/site-packages/impala/dbapi.py in connect(host, port, database, timeout, use_ssl, ca_cert, auth_mechanism, user, password, kerberos_service_name, use_ldap, ldap_user, ldap_password, use_kerberos, protocol)
    145         ca_cert=ca_cert, user=user, password=password,
    146         kerberos_service_name=kerberos_service_name,
--> 147         auth_mechanism=auth_mechanism)
    148     return hs2.HiveServer2Connection(service, default_db=database)
    149

~/venv/lib/python3.5/site-packages/impala/hiveserver2.py in connect(host, port, timeout, use_ssl, ca_cert, user, password, kerberos_service_name, auth_mechanism)
    656     transport = get_transport(sock, host, kerberos_service_name,
    657                               auth_mechanism, user, password)
--> 658     transport.open()
    659     protocol = TBinaryProtocol(transport)
    660     if six.PY2:

~/venv/lib/python3.5/site-packages/thrift_sasl/__init__.py in open(self)
     65
     66   def open(self):
---> 67     if not self._trans.isOpen():
     68       self._trans.open()
     69

AttributeError: 'TSocket' object has no attribute 'isOpen'
> But in any case I can't get pyhive to work either right now
Hm, pyhive seems to work just fine for me:
from pyhive import hive
cursor = hive.connect('analytics1003.eqiad.wmnet', 10000).cursor()
cursor.execute('SELECT page_title FROM wmf.pageview_hourly WHERE year=2017 and month=1 and day=1 and hour=0 LIMIT 10')
cursor.fetchall()

[('User:64.255.164.10',),
 ('Special:Log/!_!_!_!_!_!_!_!_!_!_!',),
 ('User:Akhil_0950',),
 ('User:82.52.37.150',),
 ('User:Daniel',),
 ('5_рашәара',),
 ('Ажьырныҳәа_5',),
 ('Алахәыла:ChuispastonBot',),
 ('Алахәыла_ахцәажәара:Oshwah',),
 ('Алахәыла_ахцәажәара:Untifler',)]
I had the same errors you guys saw with impyla. After parsing a few of those tickets you linked to, this invocation seems to work:
pip uninstall -y impyla thriftpy thrift_sasl sasl thrift
pip install thriftpy==0.3.9 thrift-sasl==0.2.1 sasl==0.2.1 six bit_array impyla
This gets you the latest impyla (0.14.1) with versions of thrift-sasl, sasl, and thriftpy that should be compatible (not thrift; under Python 3 it wants thriftpy). I think impyla upstream needs to sort out its dependencies; pip install impyla should just work.
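For anyone double-checking the workaround, a quick verification along the lines of the earlier examples might look like the following (host, port, and auth_mechanism are taken from the comments above, not newly confirmed):

# Rough verification sketch after reinstalling the pinned dependencies above;
# connection parameters are the same assumptions used in earlier comments.
from impala.dbapi import connect

conn = connect(host='analytics1003.eqiad.wmnet', port=10000, auth_mechanism='PLAIN')
cursor = conn.cursor()
cursor.execute('SELECT page_title FROM wmf.pageview_hourly '
               'WHERE year=2017 AND month=1 AND day=1 AND hour=0 LIMIT 10')
print(cursor.fetchall())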
Change 425878 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Mark notebook1001 as spare and remove unused paws_internal classes
Change 425878 merged by Ottomata:
[operations/puppet@production] Mark notebook1001 as spare and remove unused paws_internal classes
Change 427385 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Remove unused jupyterhub_old module
Change 427385 merged by Ottomata:
[operations/puppet@production] Remove unused jupyterhub_old module
Change 451060 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/swap/deploy@master] Update wheels with pyhive and impyla for default Hive access in prod
Change 451060 abandoned by Ottomata:
Update wheels with pyhive and impyla for default Hive access in prod
Reason:
wrong repo