Refresh SWAP notebook hardware
Closed, Resolved · Public · 5 Estimate Story Points

Description

The SWAP jupyter notebook hardware is old and OOW, and we need to replace it.

Perhaps along the way we should update Jupyter too? And/or consider https://github.com/jupyterlab/jupyterlab ?

Event Timeline

Ottomata created this task. Dec 18 2017, 2:25 PM
Restricted Application added a subscriber: Aklapper. Dec 18 2017, 2:25 PM
Ottomata added a subtask: Unknown Object (Task). Dec 18 2017, 2:25 PM
Ottomata renamed this task from Refresh SWAP hardware to Refresh SWAP notebook hardware. Feb 28 2018, 4:51 PM
Ottomata claimed this task. Mar 7 2018, 8:00 PM
Ottomata moved this task from Next Up to In Progress on the Analytics-Kanban board.
Ottomata set the point value for this task to 5.

Change 419251 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/wheels/paws-internal@master] Update wheels for Debian Stretch

https://gerrit.wikimedia.org/r/419251

Change 419251 merged by Ottomata:
[operations/wheels/paws-internal@master] Update wheels for Debian Stretch

https://gerrit.wikimedia.org/r/419251

Change 419260 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/wheels/paws-internal@master] Update jupyterhub to 0.8.1 to work with newer singleuserauthenticator

https://gerrit.wikimedia.org/r/419260

Change 419260 merged by Ottomata:
[operations/wheels/paws-internal@master] Update jupyterhub to 0.8.1 to work with newer singleuserauthenticator

https://gerrit.wikimedia.org/r/419260

Change 419507 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/swap/deploy@master] Scripts to build jupyterhub based SWAP

https://gerrit.wikimedia.org/r/419507

Change 419507 merged by Ottomata:
[analytics/swap/deploy@master] Scripts to build jupyterhub based SWAP

https://gerrit.wikimedia.org/r/419507

Change 419509 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/swap/deploy@master] Add artifacts for initial build of swap

https://gerrit.wikimedia.org/r/419509

Change 419509 merged by Ottomata:
[analytics/swap/deploy@master] Add artifacts for initial build of swap

https://gerrit.wikimedia.org/r/419509

Change 419510 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/swap/deploy@master] Rename wheels_dir -> wheels

https://gerrit.wikimedia.org/r/419510

Change 419510 merged by Ottomata:
[analytics/swap/deploy@master] Rename wheels_dir -> wheels

https://gerrit.wikimedia.org/r/419510

Change 419656 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] [WIP] Puppetization for newer SWAP (JupyterHub) deployed via scap

https://gerrit.wikimedia.org/r/419656

Change 419821 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/swap/deploy@master] create_virutalenv.sh now takes the destination venv path as $1

https://gerrit.wikimedia.org/r/419821

Change 419821 merged by Ottomata:
[analytics/swap/deploy@master] create_virutalenv.sh now takes the destination venv path as $1

https://gerrit.wikimedia.org/r/419821

Change 419656 merged by Ottomata:
[operations/puppet@production] Puppetization for newer SWAP (JupyterHub)

https://gerrit.wikimedia.org/r/419656

Change 419835 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use venv instead of jupyter-venv for user venv dirs

https://gerrit.wikimedia.org/r/419835

Change 419835 merged by Ottomata:
[operations/puppet@production] Use venv instead of jupyter-venv for user venv dirs

https://gerrit.wikimedia.org/r/419835

I have rsynced over user home directories from notebook1001 -> notebook1003, and am upgrading the default notebook venv ($HOME/venv) by:

wheels_path=/srv/jupyterhub/deploy/artifacts/stretch/wheels
for u in $(ls /home); do
    venv="/home/$u/venv"
    if [ -d "$venv" ]; then
        echo "Upgrading $venv"
        # Upgrade the venv in place, then reinstall Jupyter from the local wheels
        sudo -u "$u" python3 -m venv --upgrade "$venv"
        sudo -u "$u" "$venv/bin/pip" install --upgrade --no-index --find-links="$wheels_path" jupyterhub jupyter jupyterlab
    fi
done

Will this work? ¯\_(ツ)_/¯

IT DID INDEED WORK! AWESOME!
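
For anyone who wants to preview what that loop would do before running it, here is a rough dry-run sketch in Python. It only builds and prints the commands and never executes them. The function name `upgrade_commands` and the temporary directory standing in for /home are made up for illustration; the wheels path and package names are the ones from the loop above.

```python
from pathlib import Path
import tempfile

# Path and package list copied from the shell loop above.
WHEELS_PATH = "/srv/jupyterhub/deploy/artifacts/stretch/wheels"
PACKAGES = ["jupyterhub", "jupyter", "jupyterlab"]


def upgrade_commands(home_root, wheels_path=WHEELS_PATH, packages=PACKAGES):
    """Return the argv lists the loop would run, two per user that has a venv."""
    commands = []
    for home in sorted(Path(home_root).iterdir()):
        venv = home / "venv"
        if not venv.is_dir():
            continue  # this user has no default notebook venv
        user = home.name
        commands.append(
            ["sudo", "-u", user, "python3", "-m", "venv", "--upgrade", str(venv)]
        )
        commands.append([
            "sudo", "-u", user, str(venv / "bin" / "pip"), "install",
            "--upgrade", "--no-index", "--find-links=" + wheels_path, *packages,
        ])
    return commands


# Demo against a throwaway directory tree (hypothetical users alice and bob).
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "alice" / "venv").mkdir(parents=True)
    (Path(root) / "bob").mkdir()  # no venv, so bob is skipped
    for argv in upgrade_commands(root):
        print(" ".join(argv))
```

Pointing `upgrade_commands` at the real /home would list the same two commands per user as the shell loop, which may be a useful sanity check before touching anyone's venv.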

Updated JupyterHub with JupyterLab beta installed on notebook1003 and notebook1004. notebook1003 home directories have been copied over.

WOoO let's try this out and test it. FYI: SPARK WORKS TOO!

Change 421298 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/jupyterhub/deploy@master] Update wheels with pyhive and impyla for default Hive access in prod

https://gerrit.wikimedia.org/r/421298

Change 421298 merged by Ottomata:
[analytics/jupyterhub/deploy@master] Update wheels with pyhive and impyla for default Hive access in prod

https://gerrit.wikimedia.org/r/421298

Change 421306 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Install python3 statistics packages; configure user venvs with packages in puppet

https://gerrit.wikimedia.org/r/421306

Change 421306 merged by Ottomata:
[operations/puppet@production] Install python3 packages; configure user venvs from requirements

https://gerrit.wikimedia.org/r/421306

Change 421320 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Fix typo in jupyterhub config

https://gerrit.wikimedia.org/r/421320

Change 421320 merged by Ottomata:
[operations/puppet@production] Fix typo in jupyterhub config

https://gerrit.wikimedia.org/r/421320

Change 421353 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Allow user venv to use system-site-packages

https://gerrit.wikimedia.org/r/421353

Change 421353 merged by Ottomata:
[operations/puppet@production] Allow user venv to use system-site-packages

https://gerrit.wikimedia.org/r/421353

Ah, ok, a better user venv upgrade is:

wheels_path=/srv/jupyterhub/deploy/artifacts/stretch/wheels
for u in $(getent passwd | awk -F ':' '{print $1}'); do
    venv="/home/$u/venv"
    if [ -d "$venv" ]; then
        echo "Upgrading $venv"
        sudo -u "$u" python3 -m venv --upgrade "$venv"
        # change include-system-site-packages to true
        test -f "$venv/pyvenv.cfg" && sed -i 's@include-system-site-packages = false@include-system-site-packages = true@' "$venv/pyvenv.cfg"
        sudo -u "$u" "$venv/bin/pip" install --upgrade --no-index --ignore-installed --find-links="$wheels_path" --requirement=/srv/jupyterhub/deploy/frozen-requirements.txt
    fi
done
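
As an aside, the sed step in that loop can be sketched in Python as a pure text transformation, which makes it easy to check on a sample pyvenv.cfg before editing real files. This is only an illustration: the function name `enable_system_site_packages` is hypothetical, and unlike the exact-string sed above it also tolerates different spacing around the `=`.

```python
import re


def enable_system_site_packages(cfg_text):
    """Return pyvenv.cfg text with include-system-site-packages set to true.

    Lines that already say true, and all other lines, are left unchanged.
    """
    return re.sub(
        r"^(include-system-site-packages[ \t]*=[ \t]*)false[ \t]*$",
        r"\1true",
        cfg_text,
        flags=re.MULTILINE,
    )


# Sample file contents (made up for the demo).
sample = (
    "home = /usr/bin\n"
    "include-system-site-packages = false\n"
    "version = 3.5.3\n"
)
print(enable_system_site_packages(sample))
```

Running the function a second time on its own output changes nothing, so the step is safe to repeat across venvs that were already converted.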

I've run this on all user venvs on notebook1003 and notebook1004.

Email sent (Subject: 'New SWAP (Jupyter Notebook) servers and updates!'). Timeline for notebook1001 deprecation: Monday April 2nd.

Small note for the record: I'm getting "Warning: JupyterHub seems to be served over an unsecured HTTP connection. We strongly recommend enabling HTTPS for JupyterHub" at the login screen. I guess that's rather inconsequential, considering that this goes through an SSH tunnel anyway, but I don't recall seeing the same message on notebook1001. Perhaps it's just a change in the new Jupyter version?

Yeah, if this wasn't happening before, it is almost certainly due to the JupyterHub version upgrade. Should be fine since it goes through ssh.

I have been using impyla on notebook1001 to run Hive queries, but this no longer works on notebook1003. Any ideas what might be wrong? See error message below (these two lines work without problem on notebook1001).

from impala.dbapi import connect

hive_conn = connect(host='analytics1003.eqiad.wmnet', port=10000, auth_mechanism='PLAIN')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-bb76209539e0> in <module>()
----> 1 hive_conn = connect(host='analytics1003.eqiad.wmnet', port=10000, auth_mechanism='PLAIN')

~/venv/lib/python3.5/site-packages/impala/dbapi.py in connect(host, port, database, timeout, use_ssl, ca_cert, auth_mechanism, user, password, kerberos_service_name, use_ldap, ldap_user, ldap_password, use_kerberos, protocol)
    145                           ca_cert=ca_cert, user=user, password=password,
    146                           kerberos_service_name=kerberos_service_name,
--> 147                           auth_mechanism=auth_mechanism)
    148     return hs2.HiveServer2Connection(service, default_db=database)
    149 

~/venv/lib/python3.5/site-packages/impala/hiveserver2.py in connect(host, port, timeout, use_ssl, ca_cert, user, password, kerberos_service_name, auth_mechanism)
    756     transport = get_transport(sock, host, kerberos_service_name,
    757                               auth_mechanism, user, password)
--> 758     transport.open()
    759     protocol = TBinaryProtocol(transport)
    760     if six.PY2:

~/venv/lib/python3.5/site-packages/thrift_sasl/__init__.py in open(self)
     65 
     66   def open(self):
---> 67     if not self._trans.isOpen():
     68       self._trans.open()
     69 

AttributeError: 'TSocket' object has no attribute 'isOpen'
elukey added a subscriber: elukey. Apr 4 2018, 11:17 AM

I have been using impyla on notebook1001 to run Hive queries, but this no longer works on notebook1003. Any ideas what might be wrong? See error message below (these two lines work without problem on notebook1001).

AttributeError: 'TSocket' object has no attribute 'isOpen'

This might help: https://github.com/cloudera/impyla/issues/268

Hm, in the meantime, I’ve also installed pyhive, which I think has a similar interface: https://github.com/dropbox/PyHive

Try that?

Hm, in the meantime, I’ve also installed pyhive, which I think has a similar interface: https://github.com/dropbox/PyHive
Try that?

I am not sure the format is compatible with impyla (e.g. is the cursor.description part mandatory, i.e. would it need to be added every time when swapping out impyla for pyhive in an existing notebook?).
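
On the cursor.description question: both impyla and PyHive describe themselves as Python DB-API 2.0 (PEP 249) clients, and under PEP 249 `description` is just an attribute that is populated after `execute()`, so reading it is optional. A small helper written only against the DB-API cursor interface should therefore work with either library. The sketch below is an assumption-laden illustration: `fetch_dicts` is a made-up name, and sqlite3 stands in for Hive so the example runs anywhere.

```python
import sqlite3


def fetch_dicts(cursor, sql):
    """Run sql on any DB-API 2.0 cursor and return rows as dicts keyed by column name."""
    cursor.execute(sql)
    # Per PEP 249, each entry of cursor.description is a sequence whose
    # first item is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# sqlite3 is used here purely as a stand-in for a Hive connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE pageview (page_title TEXT)")
cur.execute("INSERT INTO pageview VALUES ('Main_Page')")
print(fetch_dicts(cur, "SELECT page_title FROM pageview"))
```

In a notebook, the same helper could in principle be handed a cursor from `hive.connect(...)` or from impyla's `connect(...)`; whether the two libraries return identical column names and types for a given query is something to check rather than assume.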

But in any case I can't get pyhive to work either right now. The example code from https://wikitech.wikimedia.org/wiki/SWAP#Hive fails as follows (in a fresh notebook on notebook1003):

In [1]: lang=pyhive
from pyhive import hive
cursor = hive.connect('analytics1003.eqiad.wmnet', 10000).cursor()
cursor.execute('SELECT page_title FROM wmf.pageview_hourly WHERE year=2017 and month=1 and day=1 and hour=0 LIMIT 10')
cursor.description
[('page_title', 'STRING_TYPE', None, None, None, None, True)]
cursor.fetchall()

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-3b4d63bb34fe> in <module>()
      1 from pyhive import hive
----> 2 cursor = hive.connect('analytics1003.eqiad.wmnet', 10000).cursor()
      3 cursor.execute('SELECT page_title FROM wmf.pageview_hourly WHERE year=2017 and month=1 and day=1 and hour=0 LIMIT 10')
      4 cursor.description
      5 [('page_title', 'STRING_TYPE', None, None, None, None, True)]

~/venv/lib/python3.5/site-packages/pyhive/hive.py in connect(*args, **kwargs)
     62     :returns: a :py:class:`Connection` object.
     63     """
---> 64     return Connection(*args, **kwargs)
     65 
     66 

~/venv/lib/python3.5/site-packages/pyhive/hive.py in __init__(self, host, port, username, database, auth, configuration, kerberos_service_name, password, thrift_transport)
    166                 username=username,
    167             )
--> 168             response = self._client.OpenSession(open_session_req)
    169             _check_status(response)
    170             assert response.sessionHandle is not None, "Expected a session from OpenSession"

~/venv/lib/python3.5/site-packages/TCLIService/TCLIService.py in OpenSession(self, req)
    185         """
    186         self.send_OpenSession(req)
--> 187         return self.recv_OpenSession()
    188 
    189     def send_OpenSession(self, req):

~/venv/lib/python3.5/site-packages/TCLIService/TCLIService.py in recv_OpenSession(self)
    197     def recv_OpenSession(self):
    198         iprot = self._iprot
--> 199         (fname, mtype, rseqid) = iprot.readMessageBegin()
    200         if mtype == TMessageType.EXCEPTION:
    201             x = TApplicationException()

~/venv/lib/python3.5/site-packages/thrift/protocol/TBinaryProtocol.py in readMessageBegin(self)
    132 
    133     def readMessageBegin(self):
--> 134         sz = self.readI32()
    135         if sz < 0:
    136             version = sz & TBinaryProtocol.VERSION_MASK

~/venv/lib/python3.5/site-packages/thrift/protocol/TBinaryProtocol.py in readI32(self)
    215 
    216     def readI32(self):
--> 217         buff = self.trans.readAll(4)
    218         val, = unpack('!i', buff)
    219         return val

AttributeError: 'TSaslClientTransport' object has no attribute 'readAll'

I have been using impyla on notebook1001 to run Hive queries, but this no longer works on notebook1003. Any ideas what might be wrong? See error message below (these two lines work without problem on notebook1001).
AttributeError: 'TSocket' object has no attribute 'isOpen'

This might help: https://github.com/cloudera/impyla/issues/268

Like someone else on that ticket, I wasn't quite clear on which exact versions (and pip commands) to use for that workaround. But after adapting the below from
https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Python-Error-TSaslClientTransport-object-has-no-attribute-trans/td-p/58033 (on a similar-sounding topic), impyla appears to work for me now, albeit in an outdated version:

# cf. https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Python-Error-TSaslClientTransport-object-has-no-attribute-trans/td-p/58033
!pip uninstall -y thrift
!pip uninstall -y impyla
!pip install thrift==0.9.3
!pip install impyla==0.13.8

Hmm, I'm having the same problem as @Tbayer, but that workaround isn't working for me.

> !pip show impyla
Name: impyla
Version: 0.13.8
> !pip show thrift
Name: thrift
Version: 0.9.3
> from impala.dbapi import connect
> impala_conn(host='analytics1003.eqiad.wmnet', port=10000, auth_mechanism='PLAIN')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-16-805f90863b9a> in <module>()
      1 from impala.dbapi import connect
----> 2 impala_conn(host='analytics1003.eqiad.wmnet', port=10000, auth_mechanism='PLAIN')

~/venv/lib/python3.5/site-packages/impala/dbapi.py in connect(host, port, database, timeout, use_ssl, ca_cert, auth_mechanism, user, password, kerberos_service_name, use_ldap, ldap_user, ldap_password, use_kerberos, protocol)
    145                           ca_cert=ca_cert, user=user, password=password,
    146                           kerberos_service_name=kerberos_service_name,
--> 147                           auth_mechanism=auth_mechanism)
    148     return hs2.HiveServer2Connection(service, default_db=database)
    149 

~/venv/lib/python3.5/site-packages/impala/hiveserver2.py in connect(host, port, timeout, use_ssl, ca_cert, user, password, kerberos_service_name, auth_mechanism)
    656     transport = get_transport(sock, host, kerberos_service_name,
    657                               auth_mechanism, user, password)
--> 658     transport.open()
    659     protocol = TBinaryProtocol(transport)
    660     if six.PY2:

~/venv/lib/python3.5/site-packages/thrift_sasl/__init__.py in open(self)
     65 
     66   def open(self):
---> 67     if not self._trans.isOpen():
     68       self._trans.open()
     69 

AttributeError: 'TSocket' object has no attribute 'isOpen'

But in any case I can't get pyhive to work either right now

Hm, pyhive seems to work just fine for me:

from pyhive import hive
cursor = hive.connect('analytics1003.eqiad.wmnet', 10000).cursor()
cursor.execute('SELECT page_title FROM wmf.pageview_hourly WHERE year=2017 and month=1 and day=1 and hour=0 LIMIT 10')
cursor.fetchall()
[('User:64.255.164.10',),
 ('Special:Log/!_!_!_!_!_!_!_!_!_!_!',),
 ('User:Akhil_0950',),
 ('User:82.52.37.150',),
 ('User:Daniel',),
 ('5_рашәара',),
 ('Ажьырныҳәа_5',),
 ('Алахәыла:ChuispastonBot',),
 ('Алахәыла_ахцәажәара:Oshwah',),
 ('Алахәыла_ахцәажәара:Untifler',)]

I had the same errors you guys saw with impyla. After parsing a few of those tickets you linked to, this invocation seems to work:

pip uninstall -y impyla thriftpy thrift_sasl sasl thrift
pip install thriftpy==0.3.9 thrift-sasl==0.2.1 sasl==0.2.1 six bit_array  impyla

This gets you the latest (0.14.1) impyla with versions of thrift-sasl, sasl, and thriftpy that should be compatible (not thrift, python3 wants thriftpy). I think impyla upstream needs to figure out their dependencies; pip install impyla should just work.
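
To confirm that a venv actually ended up with those pinned versions, something like the following sketch could be run in a notebook cell. Two caveats: `importlib.metadata` needs Python 3.8 or newer, whereas the tracebacks above show Python 3.5 (where `pip show` is the practical alternative), and the names `PINS`, `installed_version`, and `check_pins` are invented for this example.

```python
from importlib import metadata

# Pins taken from the pip install line above.
PINS = {"thriftpy": "0.3.9", "thrift-sasl": "0.2.1", "sasl": "0.2.1"}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {package: (wanted, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# An empty dict means every pinned package is installed at the wanted version.
print(check_pins(PINS))
```

The `lookup` parameter exists only so the comparison logic can be exercised without installing anything; by default it queries the current environment.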

Change 425878 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Mark notebook1001 as spare and remove unused paws_internal classes

https://gerrit.wikimedia.org/r/425878

Change 425878 merged by Ottomata:
[operations/puppet@production] Mark notebook1001 as spare and remove unused paws_internal classes

https://gerrit.wikimedia.org/r/425878

I had the same errors you guys saw with impyla. After parsing a few of those tickets you linked to, this invocation seems to work:

pip uninstall -y impyla thriftpy thrift_sasl sasl thrift
pip install thriftpy==0.3.9 thrift-sasl==0.2.1 sasl==0.2.1 six bit_array  impyla

This gets you the latest (0.14.1) impyla with versions of thrift-sasl, sasl, and thriftpy that should be compatible (not thrift, python3 wants thriftpy). I think impyla upstream needs to figure out their dependencies; pip install impyla should just work.

That seems to work; thank you!

Nuria closed this task as Resolved. Apr 17 2018, 3:02 AM

Change 427385 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Remove unused jupyterhub_old module

https://gerrit.wikimedia.org/r/427385

Change 427385 merged by Ottomata:
[operations/puppet@production] Remove unused jupyterhub_old module

https://gerrit.wikimedia.org/r/427385

RobH closed subtask Unknown Object (Task) as Resolved. May 31 2018, 4:29 PM

Change 451060 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/swap/deploy@master] Update wheels with pyhive and impyla for default Hive access in prod

https://gerrit.wikimedia.org/r/451060

Change 451060 abandoned by Ottomata:
Update wheels with pyhive and impyla for default Hive access in prod

Reason:
wrong repo

https://gerrit.wikimedia.org/r/451060