
Add support for jupyterhub on conda-analytics
Closed, Resolved | Public | 1 Estimated Story Points

Description

We have been using the conda-analytics Miniconda-based environment to run Spark3 Airflow jobs and ad hoc Spark commands for a while now.

We now want to extend Spark3 support to include the users of JupyterHub, and by extension to the users of wmfdata.

Thus in this task we want to:

  • Modify conda-analytics to support jupyterhub.
  • Do the necessary puppet changes so that stat machines start pointing to conda-analytics instead of anaconda-wmf.

This work is a prerequisite to T318587.

Event Timeline


Change 843959 had a related patch set uploaded (by Xcollazo; author: Xcollazo):

[operations/puppet@production] Modify jupyterhub files to point to conda-analytics instead of anaconda-wmf.

https://gerrit.wikimedia.org/r/843959

mpopov subscribed.

Will this break https://gerrit.wikimedia.org/r/plugins/gitiles/analytics/wmf-product/jobs/+/refs/heads/master/movement_metrics/main.sh where we're activating anaconda-wmf env?

Is it possible to stage this change and test that it doesn't break anything?

> Will this break https://gerrit.wikimedia.org/r/plugins/gitiles/analytics/wmf-product/jobs/+/refs/heads/master/movement_metrics/main.sh where we're activating anaconda-wmf env?

In this task we are deprecating anaconda-wmf, but we are not removing it. Your script will continue to work as is.

> Is it possible to stage this change and test that it doesn't break anything?

We will be testing this on the analytics test cluster first, yes. Happy to let you know when it is available in the test cluster, in case you want to test as well.

> In this task we are deprecating anaconda-wmf, but we are not removing it. Your script will continue to work as is.

Fantastic, thank you!

> We will be testing this on the analytics test cluster first, yes. Happy to let you know when it is available in the test cluster, in case you want to test as well.

Oh that is cool! Hm… I think we're OK for now, thanks! I'll also check in with my team and let you know if anyone requests a chance to test.

Once we are ready, I think we can also deploy in just one of the stat boxes for testing on prod data.

(I think that is possible, yes @Ottomata ?)

Attached gerrit patch was tested and debugged today on stat1007. Thanks @Ottomata for the help!

The patch is now ready for reviews.

xcollazo changed the task status from Open to In Progress. Oct 21 2022, 6:58 PM
xcollazo moved this task from Backlog to Sprint 03 on the Data Pipelines board.
xcollazo edited projects, added Data Pipelines (Sprint 03); removed Data Pipelines.
xcollazo moved this task from Ready to In Progress on the Data Pipelines (Sprint 03) board.
xcollazo renamed this task from Change puppet jupyterhub module to point to conda-analytics to Add support for jupyterhub on conda-analytics. Oct 21 2022, 8:08 PM
xcollazo updated the task description.

conda-analytics MR has been merged: https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics/-/merge_requests/5

puppet gerrit patch has been refined and code reviewed: https://gerrit.wikimedia.org/r/c/operations/puppet/+/843959

Next steps:

EChetty set the point value for this task to 1. Nov 2 2022, 4:54 PM
EChetty moved this task from Ready to Deploy to Done on the Data Pipelines (Sprint 03) board.
EChetty moved this task from Done to Ready to Deploy on the Data Pipelines (Sprint 03) board.

Release of conda-analytics done via T321736.

For testing on the analytics test cluster, I've discussed this with @BTullis and he suggests the following plan:

  • disable puppet on an-test-coord1001
  • manually install the new conda-analytics package to an-test-client1001
  • manually modify the three (?) files that get modified by your puppet CR
  • try to start up a jupyterhub server on an-test-client1001
  • whoop and cheer when it works
  • re-enable puppet, which will revert the changes to the files
  • manually downgrade the conda-analytics package (if required)
  • plan the actual upgrade in production and announce a maintenance window or something

We have time scheduled this coming Thursday to do this.

We deployed the changes manually on Friday Nov 4 to an-test-coord1001.

On the first deploy, our changes were incomplete: for some reason, the ppc diff did not show two files that were also modified:
modules/jupyterhub/files/config/spawners.py
modules/jupyterhub/files/config/jupyterhub-singleuser-conda-env.sh

Ben graciously helped to re-deploy these two files.
(Will open a ticket to investigate ppc utility behavior)

Installing latest wmfdata (1.4.0) fails on an-test-client1001 while it succeeds on stat1007.eqiad.wmnet.

AFAIK, this is a package set issue, since the packages do differ:

xcollazo@an-test-client1001:/etc/mysql/conf.d$ hostname -f
an-test-client1001.eqiad.wmnet
xcollazo@an-test-client1001:/etc/mysql/conf.d$ dpkg -l | grep mariadb
ii  libmariadb-java                       2.3.0-1                      all          Java database driver for MariaDB and MySQL
ii  libmariadb3:amd64                     1:10.3.34-0+deb10u1          amd64        MariaDB database client library
ii  mariadb-client-10.3                   1:10.3.34-0+deb10u1          amd64        MariaDB database client binaries
ii  mariadb-client-core-10.3              1:10.3.34-0+deb10u1          amd64        MariaDB database core client binaries
ii  mariadb-common                        1:10.3.34-0+deb10u1          all          MariaDB common metapackage
ii  wmf-mariadb104-client                 10.4.22-1                    amd64        MariaDB 10.4 client only with Wikimedia-specific patches.
xcollazo@stat1007:/etc/mysql/conf.d$ hostname -f
stat1007.eqiad.wmnet
xcollazo@stat1007:/etc/mysql/conf.d$ dpkg -l | grep mariadb
ii  libmariadb-dev                        1:10.3.34-0+deb10u1                                   amd64        MariaDB database development files
ii  libmariadb-dev-compat:amd64           1:10.3.34-0+deb10u1                                   amd64        MariaDB Connector/C, compatibility symlinks
ii  libmariadb3:amd64                     1:10.3.34-0+deb10u1                                   amd64        MariaDB database client library
ii  mariadb-client-10.3                   1:10.3.34-0+deb10u1                                   amd64        MariaDB database client binaries
ii  mariadb-client-core-10.3              1:10.3.34-0+deb10u1                                   amd64        MariaDB database core client binaries
ii  mariadb-common                        1:10.3.34-0+deb10u1                                   all          MariaDB common metapackage
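
To see exactly which packages differ, the two listings can be diffed by package name. This is only a minimal sketch and was not run as part of this task; the function name is hypothetical, and the input is the `dpkg -l | grep mariadb` output pasted above, trimmed to the status, name and version columns.

```python
def installed_packages(dpkg_output):
    """Return the set of installed package names from `dpkg -l` output."""
    names = set()
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Installed packages carry the status "ii" in the first column.
        if len(fields) >= 2 and fields[0] == "ii":
            # Drop an architecture qualifier such as ":amd64".
            names.add(fields[1].split(":")[0])
    return names


AN_TEST_CLIENT1001 = """\
ii  libmariadb-java            2.3.0-1
ii  libmariadb3:amd64          1:10.3.34-0+deb10u1
ii  mariadb-client-10.3        1:10.3.34-0+deb10u1
ii  mariadb-client-core-10.3   1:10.3.34-0+deb10u1
ii  mariadb-common             1:10.3.34-0+deb10u1
ii  wmf-mariadb104-client      10.4.22-1
"""

STAT1007 = """\
ii  libmariadb-dev               1:10.3.34-0+deb10u1
ii  libmariadb-dev-compat:amd64  1:10.3.34-0+deb10u1
ii  libmariadb3:amd64            1:10.3.34-0+deb10u1
ii  mariadb-client-10.3          1:10.3.34-0+deb10u1
ii  mariadb-client-core-10.3     1:10.3.34-0+deb10u1
ii  mariadb-common               1:10.3.34-0+deb10u1
"""

test_host = installed_packages(AN_TEST_CLIENT1001)
stat_host = installed_packages(STAT1007)

only_on_stat1007 = sorted(stat_host - test_host)
only_on_test_client = sorted(test_host - stat_host)

print("only on stat1007:", only_on_stat1007)
print("only on an-test-client1001:", only_on_test_client)
```

The development packages (libmariadb-dev and libmariadb-dev-compat) show up only on stat1007.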

To work around this and test on an-test-client1001, I had to git clone wmfdata, remove the mariadb dependency, and then install via:
pip install --ignore-installed --upgrade -e .

Todo:
I'd like to fix an-test-client1001 and test again before we move this into production.
I also realized that for this to go more smoothly, we need to cut a wmfdata release with the changes from T318587 and include them on conda-analytics.

@BTullis whenever you have some time, please look at T321088#8375212.

If you need a separate ticket let me know, Thanks!

I managed to fix this on an-test-client1001 by running the following manually.

apt install libmariadb-dev

After this, the following command worked:

pip install --upgrade git+https://github.com/wikimedia/wmfdata-python.git@release --ignore-installed

The reason seems to be that the mariadb package installed by pip requires the /usr/bin/mysql_config binary, which is included in this package.
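
Before attempting the pip install, a host can be checked for the config helper on PATH. This is a minimal sketch under assumptions: the function name is hypothetical, and besides `mysql_config` it also looks for `mariadb_config`, the helper shipped with MariaDB Connector/C. It does not establish which of the two the pip package actually calls.

```python
import shutil


def find_config_helpers(names=("mysql_config", "mariadb_config")):
    """Map each helper name to its path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


for helper, path in find_config_helpers().items():
    print(f"{helper}: {path or 'not found'}")
```

If both helpers come back as not found, a build failure like the one on an-test-client1001 would be the likely outcome.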

Perhaps we should add libmariadb-dev as a requirement in puppet on the stat boxes?

Thanks for fixing this Ben!

> Perhaps we should add libmariadb-dev as a requirement in puppet on the stat boxes?

I had added the wrong dependency to the deb package control file. Will fix.

xcollazo renamed this task from Add support for jupyterhub on conda-analytics to Add support for jupyterlab on conda-analytics. Nov 18 2022, 2:51 PM
xcollazo updated the task description.

We have been testing these changes on an-test-client1001 for a while now. Here are the remaining steps:

  • Wait until wmfdata 2.0 is released (T300442). (Target is Wed Nov 23)
  • Incorporate wmfdata 2.0 (a one liner), and release next version of conda-analytics. (Target is Wed Nov 23)
  • Do sanity tests on an-test-client1001. (Target is Mon Nov 28)
  • Deploy on stat machines. (Target is Wed Nov 30)
  • Monitor user feedback (Ongoing from Wed Nov 30)

@xcollazo a month ago, I suggested changing the default source of Conda packages in conda-analytics. Let me re-up this here so you can consider doing this before the migration. For context, I think this would be a minor improvement, so it's fine to ignore if you think it's not worth the effort.

@Ottomata offhand suggestion: what if the new environments are configured to default to installing Conda packages from conda-forge instead of the default channel?

Most packages are on the default, but occasionally I run across one that isn't or that has an old version. In that case, I pretty much always find it on conda-forge instead (for example, currently the default channel has r-tidyverse 1.2.1, while conda-forge has 1.3.2). Using conda-forge like this works fine, but it often causes a whole bunch of dependencies to switch channels and get reinstalled, which is annoying.

Once I tried switching my default channel to conda-forge and setting channel priority to strict, but, if I remember correctly, a bunch of things broke the first time I upgraded an important package.

Maybe it makes sense just to have everyone using conda-forge from the start?

Andrew didn't have any objections:

Sounds fine to me! I don't have much of an opinion, so if @aqu and @xcollazo are good with that it should be fine! Sounds like we do want a puppet managed global condarc then.

+1 to switching to conda-forge as the default source

@nshahquinn-wmf and @mpopov :

The way channels are set up right now is as follows:

  • When creating the base conda-analytics environment, we only consider packages from conda-forge. We do this to avoid conflicts, and because all the current base packages are available on conda-forge.
  • We do have a conda-analytics global condarc file, but we currently make no further channel choices for you.

I could add the following for you on the global condarc:

# With strict channel priority, packages in lower priority channels are not considered
# if a package with the same name appears in a higher priority channel.
channel_priority: strict

channels:
  - conda-forge
  - defaults

With this in place, conda will effectively always try conda-forge first, which I think is what you are suggesting above.

(Note that you could always override this on your own ~/.condarc file.)
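
To make the effect of `channel_priority: strict` concrete, here is a toy model of the rule described in the condarc comment. It is a simplified sketch, not conda's solver: the real solver also weighs dependencies, builds and pins. The function names and the two `example-*` packages are hypothetical; the r-tidyverse versions are the ones quoted earlier in this thread.

```python
def version_key(version):
    """Turn '1.3.2' into (1, 3, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_strict(package, channels, index):
    """Return (channel, version) from the first channel that carries `package`.

    Under strict priority, lower-priority channels are not considered once a
    higher-priority channel has a package with the same name.
    """
    for channel in channels:
        versions = index.get(channel, {}).get(package)
        if versions:
            return channel, max(versions, key=version_key)
    return None


# Channel order as in the proposed global condarc.
CHANNELS = ["conda-forge", "defaults"]

INDEX = {
    "conda-forge": {
        "r-tidyverse": ["1.3.2"],
        "example-pkg": ["1.0.0"],        # hypothetical
    },
    "defaults": {
        "r-tidyverse": ["1.2.1"],
        "example-pkg": ["2.0.0"],        # hypothetical, newer than conda-forge
        "example-defaults-only": ["0.1.0"],  # hypothetical
    },
}

for name in ("r-tidyverse", "example-pkg", "example-defaults-only"):
    print(name, "->", pick_strict(name, CHANNELS, INDEX))
```

Note the second case: in this model, strict priority picks the conda-forge build even though the lower-priority channel has a newer version. Packages that exist only on defaults are still found.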

> I could add the following for you on the global condarc:
>
> # With strict channel priority, packages in lower priority channels are not considered
> # if a package with the same name appears in a higher priority channel.
> channel_priority: strict
>
> channels:
>   - conda-forge
>   - defaults

Yes, that's exactly what I was thinking! It seems especially sensible since you are building conda-analytics just from Conda-Forge, which I didn't know 😊

>   • Wait until wmfdata 2.0 is released (T300442). (Target is Wed Nov 23)
>   • Incorporate wmfdata 2.0 (a one liner), and release next version of conda-analytics. (Target is Wed Nov 23)

Next steps:

  • Do sanity tests on an-test-client1001. (Target is Mon Nov 28)
  • Deploy on stat machines. (Target is Wed Nov 30)
  • Monitor user feedback (Ongoing from Wed Nov 30)

@BTullis : when you have some time, can you please install the latest conda deb package on an-test-client1001? Here is the link: conda-analytics-0.0.12_amd64.deb.

> can you please install the latest conda deb package on an-test-client1001

@xcollazo, done.

Proceeding with the deployment. The plan is as follows:

btullis@cumin1001:~$ sudo cumin O:statistics::explorer 'puppet agent --disable'

Checking version in apt repo:

btullis@apt1001:~$ sudo -i reprepro ls conda-analytics
conda-analytics | 0.0.10 | buster-wikimedia | amd64

Adding new version to apt and checking the new version

btullis@apt1001:~$ sudo -i reprepro includedeb buster-wikimedia `pwd`/conda-analytics-0.0.12_amd64.deb
Exporting indices...
Deleting files no longer referenced...
btullis@apt1001:~$ sudo -i reprepro ls conda-analytics
conda-analytics | 0.0.12 | buster-wikimedia | amd64

Change 843959 merged by Btullis:

[operations/puppet@production] Modify jupyterhub config to point to conda-analytics instead of anaconda-wmf.

https://gerrit.wikimedia.org/r/843959

Puppet change merged and deployed.

This is our debdeploy spec for this change:

btullis@cumin1001:~$ cat 2022-11-30-conda-analytics.yaml 
comment: T321088
fixes:
  bullseye: 0.0.12
  buster: 0.0.12
  stretch: ''
libraries: []
source: conda-analytics
transitions: {}
update_type: tool

Mentioned in SAL (#wikimedia-analytics) [2022-11-30T13:02:50Z] <btullis> deploying conda-analytics 0.0.12 to stat boxes for T321088

Pushing out the update to the stat boxes with:

btullis@cumin1001:~$ sudo debdeploy deploy -u 2022-11-30-conda-analytics.yaml -Q O:statistics::explorer
Rolling out conda-analytics:
Non-daemon update, no service restart needed

Tests are OK, with the exception that all users appear to be able to stop other users' sessions.

Rolling out the update to all other stat boxes and we will return to the permissions issue.

It turns out that all analytics-admins can now access the Admin tab. This was already in the config, but may not have been working in a previous version.

# If set, JupyterHub admin access will be enabled for users in these groups.
'admin_posix_groups': ['ops', 'analytics-admins'],
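
The behaviour follows directly from a membership rule like the one the config comment describes. The sketch below is not the code from the puppet jupyterhub module; it only illustrates the check under the assumption that admin access is granted to any user who is in at least one listed group. The function names and `example-users` are hypothetical.

```python
import grp
import os
import pwd

# Copied from the config excerpt above.
ADMIN_POSIX_GROUPS = ["ops", "analytics-admins"]


def is_admin(user_group_names, admin_posix_groups=ADMIN_POSIX_GROUPS):
    """True if the user belongs to at least one of the admin groups."""
    return not set(user_group_names).isdisjoint(admin_posix_groups)


def posix_groups_of(username):
    """Look up a local user's group names (hypothetical helper, Unix only)."""
    primary_gid = pwd.getpwnam(username).pw_gid
    names = []
    for gid in os.getgrouplist(username, primary_gid):
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            pass  # gid with no name entry in the group database
    return names


print(is_admin(["example-users"]))
print(is_admin(["example-users", "analytics-admins"]))
```

On a real host, `is_admin(posix_groups_of("someuser"))` would combine the two. Under this rule, anyone in analytics-admins sees the Admin tab and can stop other users' servers.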

We will double check to see if this level of access is correct, or whether it should be modified.

Sanity tests done on stat1007:

  • Create a conda-analytics environment thru JupyterHub, and use Spark3 and wmfdata 2.0.0 to run a job.
  • Launch an existing anaconda-wmf environment thru JupyterHub, and use Spark2 and wmfdata 1.4.0 to run a job.
  • Launch an existing anaconda-wmf environment thru JupyterHub, and use Spark2 and wmfdata 2.0.0 to run a job.

Pushed out the updated conda-analytics package to all remaining servers with:

btullis@cumin1001:~$ sudo debdeploy deploy -u 2022-11-30-conda-analytics.yaml -Q P:analytics::conda_analytics

As per debmonitor, the conda-analytics 0.0.12 package has reached all stat and worker nodes: https://debmonitor.wikimedia.org/packages/conda-analytics

This was the output from the debdeploy command. Notice that libmariadb3 was also updated on a number of hosts, but I believe that was expected. @xcollazo can you confirm that?

conda-analytics was updated: 0.0.10 -> 0.0.12
  an-launcher1002.eqiad.wmnet (1 hosts)

These hosts are already up-to-date:
  stat[1004-1008].eqiad.wmnet (5 hosts)

conda-analytics was updated: 0.0.10 -> 0.0.12
  an-airflow1001.eqiad.wmnet,an-coord1001.eqiad.wmnet,an-test-coord1001.eqiad.wmnet,an-test-worker[1001,1003].eqiad.wmnet,an-worker[1078-1080,1084,1086-1087,1089,1091-1095,1097,1100-1101,1103-1106,1109-1111,1114,1116-1119,1124-1125,1128,1131-1133,1137,1141,1144-1146,1148].eqiad.wmnet,analytics[1059,1062-1063,1065,1067,1069,1071,1074-1077].eqiad.wmnet (55 hosts)

libmariadb3 was updated: 1:10.3.34-0+deb10u1 -> 1:10.3.36-0+deb10u2
  an-airflow1001.eqiad.wmnet,an-coord1001.eqiad.wmnet,an-test-coord1001.eqiad.wmnet,an-test-worker[1001,1003].eqiad.wmnet,an-worker[1078-1080,1084,1086-1087,1089,1091-1095,1097,1100-1101,1103-1106,1109-1111,1114,1116-1119,1124-1125,1128,1131-1133,1137,1141,1144-1146,1148].eqiad.wmnet,analytics[1059,1062-1063,1065,1067,1069,1071,1074-1077].eqiad.wmnet (55 hosts)

libmariadb3 was updated: 1:10.3.34-0+deb10u1 -> 1:10.3.36-0+deb10u2
  an-coord1002.eqiad.wmnet,an-test-worker1002.eqiad.wmnet,an-worker[1081-1083,1085,1088,1090,1096,1098-1099,1102,1107-1108,1112-1113,1115,1120-1123,1126-1127,1129-1130,1134-1136,1138-1140,1142-1143,1147].eqiad.wmnet,analytics[1058,1060-1061,1064,1066,1068,1070,1072-1073].eqiad.wmnet (43 hosts)

conda-analytics was updated: 0.0.10 -> 0.0.12
  an-coord1002.eqiad.wmnet,an-test-worker1002.eqiad.wmnet,an-worker[1081-1083,1085,1088,1090,1096,1098-1099,1102,1107-1108,1112-1113,1115,1120-1123,1126-1127,1129-1130,1134-1136,1138-1140,1142-1143,1147].eqiad.wmnet,analytics[1058,1060-1061,1064,1066,1068,1070,1072-1073].eqiad.wmnet (43 hosts)
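
The bracketed host ranges in the output above can be expanded to double-check the host counts. This is a minimal sketch that handles only the forms seen in this output (one bracket group per name, with comma-separated numbers and ranges); it is not the parser that debdeploy or cumin use, and the function names are hypothetical.

```python
import re


def split_top_level(hostlist):
    """Split on commas that are not inside square brackets."""
    parts, current, depth = [], [], 0
    for char in hostlist:
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        if char == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(char)
    if current:
        parts.append("".join(current))
    return parts


def expand(pattern):
    """Expand one pattern such as 'stat[1004-1008].eqiad.wmnet'."""
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", pattern)
    if match is None:
        return [pattern]  # plain hostname, nothing to expand
    prefix, body, suffix = match.groups()
    hosts = []
    for item in body.split(","):
        start, _, end = item.partition("-")
        end = end or start  # a single number is a range of one
        width = len(start)  # keep any zero padding
        for number in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{number:0{width}d}{suffix}")
    return hosts


def expand_hostlist(hostlist):
    """Expand a comma-separated list of hostnames and bracketed ranges."""
    return [host for part in split_top_level(hostlist) for host in expand(part)]


stat_hosts = expand_hostlist("stat[1004-1008].eqiad.wmnet")
print(len(stat_hosts), stat_hosts)
```

Running the same function over the two longer lists gives 55 and 43 hosts, matching the counts debdeploy printed.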

> Notice that libmariadb3 was also updated on a number of hosts, but I believe that was expected. @xcollazo can you confirm that?

We do have a dependency on libmariadb-dev in the package definition, and this package includes libmariadb3 as a transitive dependency. We are good.

Phew! I think we are finally done with this ticket!

Thank you *so* much for all the help @BTullis, @Ottomata and @nshahquinn-wmf !

Will be monitoring user feedback.

Opened tickets for follow up issues:

Investigate whether admin privileges on Jupyter are correct - T324126

Reimage an-test-client1001.eqiad.wmnet - T324127

Haven't received any bug reports yet, which is good! Closing!

Ottomata renamed this task from Add support for jupyterlab on conda-analytics to Add support for jupyterhub on conda-analytics. Dec 12 2022, 5:25 PM
Ottomata updated the task description.
Ottomata updated the task description.