
Add SWAP profile to stat1005
Closed, Resolved, Public

Description

stat1005 runs Debian Buster, so additional testing is needed, plus packaging Jupyter for Buster.

Event Timeline

elukey renamed this task from "Add SWAP profile to 1005" to "Add SWAP profile to stat1005". Feb 13 2020, 5:38 PM
elukey updated the task description.
fdans moved this task from Incoming to Operational Excellence on the Analytics board.

Change 577761 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/jupyterhub/deploy@master] Add Debian Buster artifacts

https://gerrit.wikimedia.org/r/577761

Once T247055 is done, we should be able to add profile::swap to stat100[4,5] and see how it goes :)

Change 578271 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::statistics::explore: add profile::swap

https://gerrit.wikimedia.org/r/578271

Change 577761 abandoned by Elukey:
Add Debian Buster artifacts

Reason:
need to rework this a little so that Stretch and Buster can coexist.

https://gerrit.wikimedia.org/r/577761

Change 578290 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/jupyterhub/deploy@master] Upgrade dependencies for Debian Buster

https://gerrit.wikimedia.org/r/578290

Change 578290 merged by Elukey:
[analytics/jupyterhub/deploy@master] Upgrade dependencies for Debian Buster

https://gerrit.wikimedia.org/r/578290

Change 578271 merged by Elukey:
[operations/puppet@production] role::statistics::explore: add profile::swap

https://gerrit.wikimedia.org/r/578271

Change 578303 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] jupyterhub: refactor user authentication for posix groups

https://gerrit.wikimedia.org/r/578303

Change 578303 merged by Elukey:
[operations/puppet@production] jupyterhub: refactor user authentication for posix groups

https://gerrit.wikimedia.org/r/578303
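Change 578303 is described here only by its title. As a rough illustration of what a POSIX-group check involves (a hypothetical helper, not the code from the patch), the sketch below treats a user as allowed if they belong to any group in a list, counting both supplementary membership from the group database and the user's primary group from the passwd entry. JupyterHub's LocalAuthenticator also offers a built-in allowed_groups option (group_whitelist before JupyterHub 1.2); whether the patch relies on it is not stated.

```python
import grp
import os
import pwd


def user_in_any_group(username, group_names):
    """Return True if `username` belongs to at least one POSIX group in `group_names`.

    Hypothetical helper: counts supplementary members listed in the group
    database as well as the user's primary group from the passwd entry.
    """
    try:
        primary_gid = pwd.getpwnam(username).pw_gid
    except KeyError:
        return False  # unknown user
    for name in group_names:
        try:
            group = grp.getgrnam(name)
        except KeyError:
            continue  # group does not exist on this host, skip it
        if username in group.gr_mem or group.gr_gid == primary_gid:
            return True
    return False


if __name__ == "__main__":
    me = pwd.getpwuid(os.getuid()).pw_name
    primary = grp.getgrgid(pwd.getpwnam(me).pw_gid).gr_name
    print(user_in_any_group(me, [primary]))  # the primary group always matches
```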

Change 578311 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] jupyterhub: fix link to frozen-requirements.txt for user creation

https://gerrit.wikimedia.org/r/578311

Change 578311 merged by Elukey:
[operations/puppet@production] jupyterhub: fix link to frozen-requirements.txt for user creation

https://gerrit.wikimedia.org/r/578311

Change 578313 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] jupyterhub: fix reference to distro variable in jupyterhub_config.py

https://gerrit.wikimedia.org/r/578313

Change 578313 merged by Elukey:
[operations/puppet@production] jupyterhub: fix reference to distro variable in jupyterhub_config.py

https://gerrit.wikimedia.org/r/578313
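The title of change 578313 suggests that jupyterhub_config.py selects distro-specific (Stretch vs Buster) artifacts through a distro variable; how that variable gets its value is not shown in this task. A minimal sketch of deriving it at runtime instead, assuming the host's /etc/os-release sets VERSION_CODENAME (Debian Buster's does) and a hypothetical layout with one artifacts directory per codename:

```python
import os

ARTIFACTS_ROOT = "/srv/jupyterhub/deploy/artifacts"  # hypothetical layout


def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict (simple quoting only)."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def artifacts_dir(os_release_text, root=ARTIFACTS_ROOT):
    """Return the distro-specific artifacts directory, e.g. <root>/buster."""
    codename = parse_os_release(os_release_text).get("VERSION_CODENAME")
    if not codename:
        raise ValueError("os-release has no VERSION_CODENAME")
    return os.path.join(root, codename)
```

On a Buster host, passing the contents of /etc/os-release to artifacts_dir() would yield <root>/buster.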

Deployed on stat1004, and everything seems to be working fine. On stat1005 (running Buster) I see:

Mar 09 11:56:23 stat1005 jupyterhub[38874]: [C 2020-03-09 11:56:23.782 JupyterHub app:2517] Failed to start proxy
Mar 09 11:56:23 stat1005 jupyterhub[38874]:     Traceback (most recent call last):
Mar 09 11:56:23 stat1005 jupyterhub[38874]:       File "/srv/jupyterhub/venv/lib/python3.7/site-packages/jupyterhub/app.py", line 2515, in start
Mar 09 11:56:23 stat1005 jupyterhub[38874]:         await self.proxy.start()
Mar 09 11:56:23 stat1005 jupyterhub[38874]:       File "/srv/jupyterhub/venv/lib/python3.7/site-packages/jupyterhub/proxy.py", line 663, in start
Mar 09 11:56:23 stat1005 jupyterhub[38874]:         self._write_pid_file()
Mar 09 11:56:23 stat1005 jupyterhub[38874]:       File "/srv/jupyterhub/venv/lib/python3.7/site-packages/jupyterhub/proxy.py", line 563, in _write_pid_file
Mar 09 11:56:23 stat1005 jupyterhub[38874]:         with open(self.pid_file, "w") as f:
Mar 09 11:56:23 stat1005 jupyterhub[38874]:     OSError: [Errno 30] Read-only file system: 'jupyterhub-proxy.pid'

Change 578325 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] jupyterhub: use http_proxy pid file for Buster

https://gerrit.wikimedia.org/r/578325

Change 578325 merged by Elukey:
[operations/puppet@production] jupyterhub: use http_proxy pid file for Buster

https://gerrit.wikimedia.org/r/578325
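In JupyterHub, ConfigurableHTTPProxy.pid_file defaults to the relative path jupyterhub-proxy.pid, so the file is written in the hub's working directory, which the traceback shows was on a read-only file system on stat1005. The path chosen by the merged change is not shown here. A sketch, assuming an absolute path in a writable directory is sufficient; the candidate directories are hypothetical, and the get_config() guard lets the same file also run as a plain script.

```python
import os
import tempfile

PID_FILENAME = "jupyterhub-proxy.pid"


def choose_pid_file(candidate_dirs, filename=PID_FILENAME):
    """Return an absolute pid-file path in the first existing, writable directory.

    os.access(..., os.W_OK) returns False on a read-only file system (EROFS),
    the failure seen in the stat1005 traceback.
    """
    for directory in candidate_dirs:
        if os.path.isdir(directory) and os.access(directory, os.W_OK):
            return os.path.join(os.path.abspath(directory), filename)
    raise RuntimeError("no writable directory for " + filename)


# get_config() only exists when traitlets loads this file as jupyterhub_config.py.
if "get_config" in globals():
    c = get_config()  # noqa: F821
    c.ConfigurableHTTPProxy.pid_file = choose_pid_file(
        ["/run/jupyterhub", tempfile.gettempdir()]  # hypothetical candidates
    )
```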

Change 579522 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/jupyterhub/deploy@master] Downgrade toree to 0.2.0 for Buster

https://gerrit.wikimedia.org/r/579522

Change 579522 merged by Elukey:
[analytics/jupyterhub/deploy@master] Downgrade toree to 0.2.0 for Buster

https://gerrit.wikimedia.org/r/579522

Marcel ran into a problem with the Spark YARN kernel, the same one reported in https://issues.apache.org/jira/browse/TOREE-485. It seems to be an incompatibility between Toree 0.2.0 and JupyterLab >= 0.34. On Stretch we run JupyterLab 0.32, so we didn't observe the issue, but for Buster we upgraded to 1.2.0, which exposed it.
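To make the constraint explicit, a tiny check for the combination described above (Toree 0.2.0 with JupyterLab >= 0.34). The function names are hypothetical, and the version parser only handles purely numeric dotted versions:

```python
def parse_version(text):
    """Turn '1.2.0' into (1, 2, 0); purely numeric dotted versions only."""
    return tuple(int(part) for part in text.split("."))


def has_toree_485_risk(toree_version, jupyterlab_version):
    """Flag the pairing reported in TOREE-485: Toree 0.2.0 with JupyterLab >= 0.34."""
    toree = parse_version(toree_version)
    lab = parse_version(jupyterlab_version)
    # Tuples compare element by element, so (1, 2, 0) >= (0, 34) is True.
    return toree == (0, 2, 0) and lab >= (0, 34)
```

With these rules the Stretch pairing (0.2.0, 0.32) passes and the Buster pairing (0.2.0, 1.2.0) is flagged.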

Change 579956 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/jupyterhub/deploy@master] Update kernel's README to match last changes in Toree kernels

https://gerrit.wikimedia.org/r/579956

Change 579956 merged by Elukey:
[analytics/jupyterhub/deploy@master] Make kernel README and definition consistent

https://gerrit.wikimedia.org/r/579956

Interesting: https://github.com/apache/incubator-toree/blob/master/RELEASE_NOTES.md

0.3.0

Removed support for PySpark and Spark R in Toree (use specific kernels)

Change 583935 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/jupyterhub/deploy@master] Use toree 0.3.0 on buster

https://gerrit.wikimedia.org/r/583935

Change 583935 merged by Elukey:
[analytics/jupyterhub/deploy@master] Use toree 0.3.0 on buster

https://gerrit.wikimedia.org/r/583935

Change 583971 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/jupyterhub/deploy@master] Add two kernels to Buster

https://gerrit.wikimedia.org/r/583971

Change 583971 abandoned by Elukey:
Add two kernels to Buster

https://gerrit.wikimedia.org/r/583971

Change 583972 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/jupyterhub/deploy@master] Add spark_yarn_pyspark_large to Buster's kernels

https://gerrit.wikimedia.org/r/583972

Change 583972 merged by Elukey:
[analytics/jupyterhub/deploy@master] Add spark_yarn_pyspark_large to Buster's kernels

https://gerrit.wikimedia.org/r/583972
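Since Toree 0.3.0 dropped PySpark support, a PySpark kernel on Buster has to be a plain Python kernel that starts Spark itself. A hedged sketch of such a kernel.json, built as a Python dict: it assumes pyspark is importable by the kernel's interpreter and relies on PySpark reading PYSPARK_SUBMIT_ARGS when the notebook creates a SparkSession. The paths, display name and executor sizing are hypothetical and are not taken from the real spark_yarn_pyspark_large definition.

```python
import json


def pyspark_yarn_kernelspec(display_name, python, spark_home, submit_args):
    """Build a kernel.json dict for an IPython kernel that launches Spark via PySpark.

    PySpark reads PYSPARK_SUBMIT_ARGS when it starts the JVM gateway from a
    plain Python process; the value must end with 'pyspark-shell'.
    """
    return {
        "argv": [python, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
        "env": {
            "SPARK_HOME": spark_home,
            "PYSPARK_PYTHON": python,
            "PYSPARK_SUBMIT_ARGS": " ".join(list(submit_args) + ["pyspark-shell"]),
        },
    }


if __name__ == "__main__":
    spec = pyspark_yarn_kernelspec(
        display_name="PySpark - YARN (large)",  # hypothetical
        python="/usr/bin/python3",  # hypothetical
        spark_home="/usr/lib/spark2",  # hypothetical
        submit_args=["--master", "yarn", "--executor-memory", "4g"],  # hypothetical sizing
    )
    print(json.dumps(spec, indent=2))
```

Writing the printed JSON to <kernel_dir>/kernel.json would register the kernel with Jupyter.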

Change 583981 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/jupyterhub/deploy@master] Rename Spark Buster kernels

https://gerrit.wikimedia.org/r/583981

Change 583981 merged by Elukey:
[analytics/jupyterhub/deploy@master] Rename Spark Buster kernels

https://gerrit.wikimedia.org/r/583981

Change 583983 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/jupyterhub/deploy@master] Fix display name of the spark_yarn_scala_large kernel

https://gerrit.wikimedia.org/r/583983

Change 583983 merged by Elukey:
[analytics/jupyterhub/deploy@master] Fix display name of the spark_yarn_scala_large kernel

https://gerrit.wikimedia.org/r/583983

Change 583991 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/jupyterhub/deploy@master] Fold the kernel's README into the main one and add documentation

https://gerrit.wikimedia.org/r/583991

Change 583991 merged by Elukey:
[analytics/jupyterhub/deploy@master] Fold the kernel's README into the main one and add documentation

https://gerrit.wikimedia.org/r/583991

Change 584018 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/jupyterhub/deploy@master] Fix kernel json paths after directory rename

https://gerrit.wikimedia.org/r/584018

Change 584018 merged by Elukey:
[analytics/jupyterhub/deploy@master] Fix kernel json paths after directory rename

https://gerrit.wikimedia.org/r/584018
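Renaming kernel directories can silently break kernel.json files that point into them, for example a launcher script such as Toree's run.sh living inside the kernel directory. A hypothetical consistency check that could catch this before deploying: it loads every <kernels_dir>/<name>/kernel.json, reports missing core fields (argv, display_name, language), and flags an absolute argv[0] that does not exist.

```python
import json
import os


def check_kernelspecs(kernels_dir):
    """Return a list of problems found in <kernels_dir>/*/kernel.json files."""
    problems = []
    for name in sorted(os.listdir(kernels_dir)):
        spec_path = os.path.join(kernels_dir, name, "kernel.json")
        if not os.path.isfile(spec_path):
            continue  # not a kernel directory
        try:
            with open(spec_path) as f:
                spec = json.load(f)
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            problems.append(f"{name}: invalid JSON ({exc})")
            continue
        for key in ("argv", "display_name", "language"):
            if key not in spec:
                problems.append(f"{name}: missing '{key}'")
        argv = spec.get("argv") or []
        # A stale absolute launcher path is the typical symptom of a directory rename.
        if argv and os.path.isabs(argv[0]) and not os.path.exists(argv[0]):
            problems.append(f"{name}: argv[0] not found: {argv[0]}")
    return problems
```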

OK, so the next steps are:

  1. test SWAP more on stat1005/stat1008
  2. add a SparkR kernel on Buster

I'm removing the Research tag. Please ping us if we can help in any way.

@mpopov hi! Are you using the SparkR kernels that we have on SWAP by any chance? I am asking because on stat100[5,8], our Buster nodes, we have upgraded to Apache Toree 0.3.0, which no longer supports SparkR. Upstream suggests using "specific kernels" (see https://github.com/apache/incubator-toree/blob/master/RELEASE_NOTES.md), which I guess means (in our use case) pip installing a SparkR kernel and using that. So if nobody really needs a specific solution, I'd just remove the ad-hoc SparkR kernel currently available on the Stretch nodes (stat1004/6, notebook100[3,4]) when migrating to Buster. What do you think?

I occasionally use SparkR kernels when I want faster query execution, but I wouldn't miss them if they disappeared. There's an official R package from the Presto team (https://cran.r-project.org/package=RPresto), so that might actually be the way to go instead of Spark.

Super, so I am declaring this task done then, thanks for the feedback!

elukey moved this task from In Progress to Done on the Analytics-Kanban board.