
Decommission notebook hosts
Open, Medium, Public

Description

Now that the stat machines also serve Jupyter notebooks, we can decommission the smaller notebook nodes.

For users:

You can rsync your entire home directory from a notebook host to a stat box, but you should explicitly avoid rsyncing the venv directory. This directory is specific to the install, the OS and the Python version, so it cannot be ported between hosts. To rsync your notebook1003 home without venv, run this from a stat box:

rsync -av --exclude venv notebook1003.eqiad.wmnet::home/$USER/ /home/$USER/

Replace notebook1003 with notebook1004 if you prefer.
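
Before copying for real, it may help to preview what would be transferred. The following is only a sketch, assuming a standard rsync where --dry-run is available; the function name sync_home_without_venv is hypothetical and not part of the original instructions.

```shell
# Hypothetical helper: preview or perform a sync that skips venv.
# Usage: sync_home_without_venv SRC DEST [--dry-run]
sync_home_without_venv() {
    src="$1"
    dest="$2"
    # -a preserves permissions and timestamps, -v lists each file,
    # --exclude venv skips any file or directory named "venv".
    if [ "${3:-}" = "--dry-run" ]; then
        # Only report what would be copied; nothing is written.
        rsync -av --dry-run --exclude venv "$src" "$dest"
    else
        rsync -av --exclude venv "$src" "$dest"
    fi
}

# Example, run from a stat box (nothing is copied with --dry-run):
# sync_home_without_venv "notebook1003.eqiad.wmnet::home/$USER/" "/home/$USER/" --dry-run
```

If the dry-run listing looks right, running the same call without --dry-run performs the actual copy.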

Please also connect to the Jupyter UI and shut down your notebook :)

Deprecation deadline: June 2020

Event Timeline

Nuria created this task.Apr 8 2020, 6:35 PM
Milimetric triaged this task as Medium priority.Apr 13 2020, 3:48 PM
Milimetric moved this task from Incoming to Operational Excellence on the Analytics board.

I had some thoughts about this, and here's my plan:

  • complete the work on stat1007, so all stat boxes will have jupyterhub.
  • add documentation about how to transfer files between stat and notebook hosts via rsync (and its limitations)
  • add a motd to notebook100[3,4] to alert people about this deprecation, linking the above docs.
  • send an email to everybody announcing a month of time to migrate
  • track notebook usage on 1003/1004 over time and follow up with people who do not read emails or the MOTD :)

Change 591336 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::swap: add deprecation notice motd on notebook100[3,4]

https://gerrit.wikimedia.org/r/591336

Change 591336 merged by Elukey:
[operations/puppet@production] role::swap: add deprecation notice motd on notebook100[3,4]

https://gerrit.wikimedia.org/r/591336

elukey changed the task status from Open to Stalled.Apr 29 2020, 9:11 AM

Will declare the official decom period when stat1007 moves to role::statistics::explorer (hopefully soon).

elukey changed the task status from Stalled to Open.May 7 2020, 2:54 PM

Will there be any automatic rsync / backup from the notebook hosts for all users?
Or is that something I'll have to take care of myself?

elukey added a comment.May 7 2020, 4:06 PM

Will there be any automatic rsync / backup from the notebook hosts for all users?
Or is that something I'll have to take care of myself?

Managed by users, since it requires a little bit of coordination with Jupyter (namely shutting everything down first, saving, etc.).

Users still having notebooks on 1003:

addshore
andrew
andyrussg
awight
bearloga
conniecc1
dcausse
dr0ptp4kt
dsaez
ebernhardson
eyener
fdans
fsalutari
gilles
halfak
iflorez
isaacj
jdl
jiawang
jkumalah
joal
kartik
ladsgroup
mayakpwiki
mepps
mgerlach
mirrys
mmiller
mneisler
musikanimal
neilpquinn
nettrom
nuria
otto
piccardi
ryanmax
snowick
tjones

And 1004:

andyrussg
bearloga
conniecc1
dcausse
dedcode
dsaez
ebernhardson
ejegg
fsalutari
iflorez
jiawang
joal
ladsgroup
mayakpwiki
mgerlach
mmiller
nathante
neilpquinn
nettrom
nuria
otto
ryanmax
snowick
tjones
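
How the lists above were produced is not stated in the task. As a rough sketch only, a list like this could be generated by looking for home directories that contain at least one .ipynb file; the function name users_with_notebooks and the .ipynb criterion are assumptions.

```shell
# Hypothetical helper: print each user whose home directory holds
# at least one .ipynb file. The criterion is an assumption.
# Usage: users_with_notebooks [HOME_ROOT]   (defaults to /home)
users_with_notebooks() {
    home_root="${1:-/home}"
    for dir in "$home_root"/*/; do
        # Skip the unexpanded glob when there are no subdirectories.
        [ -d "$dir" ] || continue
        # head stops reading after the first match; we only need to know one exists.
        if [ -n "$(find "$dir" -type f -name '*.ipynb' 2>/dev/null | head -n 1)" ]; then
            basename "$dir"
        fi
    done
}

# Example: users_with_notebooks /home
```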
elukey updated the task description.May 13 2020, 7:02 AM
elukey updated the task description.
elukey updated the task description.May 13 2020, 7:12 AM
elukey added subscribers: SNowick_WMF, EYener, MGerlach and 24 others.

Pinging people explicitly to set a reminder (15 days before the deprecation):

@Addshore @AndyRussG @awight @mpopov @cchen @dcausse @dr0ptp4kt @diego @EBernhardson @EYener @fdans @Fsalutari @Gilles @Halfak @Iflorez @Isaac @JAllemandou @KartikMistry @Ladsgroup @Mayakp.wiki @mepps @MGerlach @Miriam @MMiller_WMF @MNeisler @MusikAnimal @nshahquinn-wmf @nettrom_WMF @Nuria @Ottomata @S.piccardi @SNowick_WMF @TJones

Please check the description for info about what to do :)

If for any reason the deprecation deadline is too tight let me know and we'll work something out!

@elukey I'm not using notebook100* Feel free to delete it.

@elukey I'm not using notebook100* Feel free to delete it.

Same here, thanks!

Please feel free to delete it.

I've copied and deleted my files. Thanks!

I've deleted my notebooks as well. Thank you!

emptied out /home/bearloga and shut my jupyter process down. thanks!

Nuria added a comment.May 13 2020, 3:59 PM

For everyone on this list (myself included): if you do not use the notebooks, please be so kind as to empty your home dir on both machines. Thank you.

It is fine also to leave files in there, just be mindful that at some point I'll decom the host and they'll be not accessible anymore :)

No concerns from me. Feel free to delete.

I have deleted all my files, thank you!

Isaac added a comment.May 13 2020, 5:55 PM

All files cleared -- thanks!

SNowick_WMF added a comment.EditedMay 13 2020, 5:57 PM

This is resolved, user error.

Hi-
I'm moving my published reports to stat1007 and don't have write permission to the /srv/published directory, can you check that we have that, and for the other stat servers as well? Is that a special permission based on group?

On 13/05/20 18:03, elukey wrote:

View Task https://phabricator.wikimedia.org/T249752
elukey added a comment.

It is fine also to leave files in there, just be mindful that at some
point I'll decom the host and they'll be not accessible anymore :)

I'm still receiving email about this issue. I'm not related to this, as
already cleared. Please remove my name (S.piccardi) and my email from
this issue.

Regards
Simone

Per above, removed Simone from subscribers to this task.

All cleared out.

Deleted things, stopped notebooks and terminals, stopped my Jupyter server. So long notebook1003, it was nice knowing you.

Hi @elukey I'll transfer files and shut down notebooks over the next few days. I'll check in on Tuesday with an update or questions if any.
Thank you!

TJones removed a subscriber: TJones.May 14 2020, 6:00 PM
MMiller_WMF added a comment.EditedMay 14 2020, 7:01 PM

@elukey -- I tried to do this today, and I rsynced my home directory from notebook1004 to stat1005. Now that I try to run my notebooks there, I think not all the packages I'm used to are installed there, for instance:

What should I do?

@MMiller_WMF pip install pandasql?

I should hope folks aren't trying to rsync their jupyter virtualenvs, that'll almost surely break things :)

@Ottomata -- I rsynced my whole home directory. Does that mean I rsynced my jupyter virtualenv?

Here's what I got when I tried to pip install:

Yup, I think that will break stuff! Let me reset your venv on stat1005. I just stopped your Notebook Server too. Try to log back in and we'll see if you get a clean venv.

Ottomata updated the task description.May 14 2020, 7:21 PM
cchen added a comment.EditedMay 14 2020, 7:32 PM

@Ottomata Hi Andrew, Jennifer and I rsynced the virtualenvs and ran into errors as well. Can you also reset our venvs on stat1005? Our usernames are conniecc1 and jiawang.
Thank you!

OK, done. FYI, all you have to do to reset your venv is delete your ~/venv directory (or move it out of the way). If the venv doesn't exist, SWAP will recreate it from scratch once you stop your Jupyter server, log out, and log back into JupyterHub.
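
The "move it out of the way" variant of the reset described above could look roughly like this. It is a sketch, not an official procedure: the function name reset_venv and the backup naming scheme are hypothetical.

```shell
# Hypothetical helper: rename an existing venv so that a fresh one can be
# created on the next JupyterHub login, as described in the comment above.
# Usage: reset_venv [HOME_DIR]   (defaults to $HOME)
reset_venv() {
    home_dir="${1:-$HOME}"
    venv_dir="$home_dir/venv"
    if [ -d "$venv_dir" ]; then
        # Rename instead of deleting, so the old venv can still be inspected.
        backup="$venv_dir.bak.$(date +%Y%m%d%H%M%S)"
        mv "$venv_dir" "$backup"
        echo "moved $venv_dir to $backup"
    else
        echo "no venv directory at $venv_dir, nothing to do"
    fi
}

# Example: stop your Jupyter server first, then run
# reset_venv
```

Afterwards, log out of JupyterHub and log back in. The renamed directory can be deleted once the new venv works.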

Thanks, @Ottomata. I re-opened Jupyterhub, but now am not able to start a server:

jwang added a subscriber: jwang.May 14 2020, 8:58 PM

@Ottomata, I ran into the same issue that MMiller_WMF had. Any idea?

Thanks,
Jennifer

Can you log out and log back in?

jwang added a comment.May 14 2020, 9:15 PM

@Ottomata, after I logged out and logged in again, the issue is gone. thank you!

Jennifer

@Ottomata -- I now have the notebook running after logging out and in again, but I'm having trouble running queries via the wmfdata package.

I'm using the mariadb.run command and getting this:

Maybe this is more a question for @nettrom_WMF or @nshahquinn-wmf.

@MMiller_WMF on stat1005, /etc/mysql/conf.d/research-client.cnf is available only to people in the researchers group (a group that we'll hopefully deprecate in the future). You are in analytics-privatedata-users, so you should use /etc/mysql/conf.d/analytics-research-client.cnf instead.

See commit https://github.com/neilpquinn/wmfdata/commit/4f596472533e17d08a8778d6a781785b58c3efe6

You have v1.0.1 in your pip environment, which I think doesn't contain the above commit, so possibly Neil needs to release a new package version?
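
To check which of the two client config files applies to a given account, something like the sketch below may help. The group names and file paths come from the comment above. The function name pick_mysql_cnf and the choice to prefer analytics-privatedata-users (since researchers may be deprecated) are assumptions.

```shell
# Hypothetical helper: print the MySQL client config file matching a
# space-separated list of group names.
# Usage: pick_mysql_cnf "$(id -nG)"
pick_mysql_cnf() {
    # Prefer analytics-privatedata-users (assumption: researchers is on its way out).
    for group in $1; do
        if [ "$group" = "analytics-privatedata-users" ]; then
            echo "/etc/mysql/conf.d/analytics-research-client.cnf"
            return 0
        fi
    done
    # Fall back to the researchers-only file.
    for group in $1; do
        if [ "$group" = "researchers" ]; then
            echo "/etc/mysql/conf.d/research-client.cnf"
            return 0
        fi
    done
    echo "no matching group found" >&2
    return 1
}

# Example: pick_mysql_cnf "$(id -nG)"
```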

@nshahquinn-wmf helped me fix this. Things are working now.

Moved everything off of notebook1003 and notebook1004. Shut down the Jupyter servers on both as well. Good to go as far as I'm concerned. Thanks for the work on maintaining these hosts!

cchen added a comment.May 28 2020, 6:48 PM

I've cleaned out my notebooks as well. Thank you!

Thanks a lot everybody for your work! Really appreciated :)

Reminder for everybody that the deprecation will happen next week. I will remove access to the nodes and send an email, leaving the hosts as they are for another couple of weeks (in case somebody requires access). Then I'll decom them!

I've moved everything off of both notebook servers. Thanks, @elukey!

jwang added a comment.Jun 1 2020, 6:08 PM

I have moved my stuff off the old clients. Thanks.

I deleted all files on nb3 and shut down the server.
I rsynced all files from nb4 and shut down the server.
Thank you!

Gilles removed a subscriber: Gilles.Jun 2 2020, 3:44 AM

I have moved all my files off of nb3 and nb4

elukey added a comment.Jun 3 2020, 4:18 PM

This coming Friday I'll remove ssh access to the nodes, then wait a few more days before decommissioning them.

Hi! I backed up everything I had on these hosts and shut down notebooks. Thanks so much!!!

Change 603522 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::swap: remove access to analytics users

https://gerrit.wikimedia.org/r/603522

Change 603522 merged by Elukey:
[operations/puppet@production] role::swap: remove access to analytics users

https://gerrit.wikimedia.org/r/603522

Mentioned in SAL (#wikimedia-analytics) [2020-06-08T15:42:55Z] <elukey> remove access to notebook100[3,4] - T249752

Change 603525 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::swap: skip deployment of mysql-credentials

https://gerrit.wikimedia.org/r/603525

Change 603525 merged by Elukey:
[operations/puppet@production] profile::swap: skip deployment of mysql-credentials

https://gerrit.wikimedia.org/r/603525

elukey added a comment.Mon, Jun 8, 5:59 PM

Access to the hosts has been removed for all analytics users. I'll wait a week before repurposing the hosts, to see if anybody still needs to migrate data to stat100x :)

elukey moved this task from Backlog to Q4 2019/2020 on the Analytics-Cluster board.

No late access to the hosts, the user migration step seems to have worked!

elukey set Final Story Points to 8.Mon, Jun 15, 5:17 AM