
Move spark.local.dir to /srv on stat100x
Closed, ResolvedPublic

Description

Hello folks,

while debugging a full root partition on stat1006 I noticed this:

elukey@stat1006:/$ sudo du -hsc /tmp/* | sort -h
[..]
5.3G	/tmp/blockmgr-b4ea1431-a946-4a39-b086-720df515a07d
6.9G	/tmp/blockmgr-1f25984a-a368-4815-a4ab-e96337d298ff
11G	/tmp/blockmgr-618da692-6ff1-4485-8638-3aa4cb9cdd61
37G	/tmp/blockmgr-20fe4b2b-31fb-4a85-b5b1-bebe254120f8
60G	total

elukey@stat1006:/$ ls -ld /tmp/blockmgr-20fe4b2b-31fb-4a85-b5b1-bebe254120f8
drwxr-xr-x 66 iflorez wikidev 4096 Nov  8 19:44 /tmp/blockmgr-20fe4b2b-31fb-4a85-b5b1-bebe254120f8

IIUC these directories are Spark-related, used as scratch space (spark.local.dir). Maybe it is worth moving the setting to something like /srv/spark-tmp on the stat100x hosts?
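For reference, the change itself amounts to a one-line override in the Spark defaults file; a minimal sketch (the exact config file location on the stat hosts is an assumption):

```
# spark-defaults.conf (exact path on the stat hosts is an assumption)
spark.local.dir    /srv/spark-tmp
```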

I am also wondering if these scratch dirs are automatically cleaned up:

elukey@stat1006:/tmp$ ls -ld /tmp/blockmgr-618da692-6ff1-4485-8638-3aa4cb9cdd61
drwxr-xr-x 66 ebernhardson wikidev 4096 Mar 25  2021 /tmp/blockmgr-618da692-6ff1-4485-8638-3aa4cb9cdd61

elukey@stat1006:/tmp$ ls -lht /tmp/blockmgr-618da692-6ff1-4485-8638-3aa4cb9cdd61 | head
total 400K
drwxr-xr-x 2 ebernhardson wikidev 4.0K Mar 26  2021 02
drwxr-xr-x 2 ebernhardson wikidev 4.0K Mar 26  2021 06
drwxr-xr-x 2 ebernhardson wikidev 4.0K Mar 26  2021 1c
drwxr-xr-x 2 ebernhardson wikidev 4.0K Mar 26  2021 14
drwxr-xr-x 2 ebernhardson wikidev  12K Mar 26  2021 0c
drwxr-xr-x 2 ebernhardson wikidev  12K Mar 26  2021 11
drwxr-xr-x 2 ebernhardson wikidev 4.0K Mar 26  2021 39
drwxr-xr-x 2 ebernhardson wikidev 4.0K Mar 26  2021 00
drwxr-xr-x 2 ebernhardson wikidev 4.0K Mar 26  2021 2a
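To illustrate the cleanup question, here is a rough sketch that lists blockmgr-* scratch directories untouched for longer than a threshold. This is a hypothetical helper, not an existing cleanup job; the base path and the 30-day threshold are illustrative assumptions.

```python
import time
from pathlib import Path

# Hypothetical sketch (not an existing cleanup job): list Spark scratch
# directories under `base` whose mtime is older than max_age_days, as
# candidates for manual cleanup.
def stale_spark_dirs(base="/tmp", max_age_days=30):
    cutoff = time.time() - max_age_days * 86400
    return [
        p for p in Path(base).glob("blockmgr-*")
        if p.is_dir() and p.stat().st_mtime < cutoff
    ]

if __name__ == "__main__":
    for d in stale_spark_dirs():
        print(d)
```

Note that the directory mtime only reflects the last change to the directory entry itself, so a real cleanup job would want to check the files inside as well.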

Event Timeline

BTullis triaged this task as Low priority.
BTullis moved this task from Incoming to Ops Week on the Data-Engineering board.

I'll take a look at this. I'll check whether:

  1. we can update any default configuration options, to make this happen transparently for users
  2. we can find anywhere that we should set spark.local.dir for our own scheduled or ad-hoc tasks
  3. we should update any documentation about the parameter, for the benefit of users

Change 738866 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Configure stat servers to use /srv/spark-tmp as spark.local.dir

https://gerrit.wikimedia.org/r/738866

I've added a commit that adds the spark.local.dir option on all Spark hosts, setting it to /srv/spark-tmp on the stat100x hosts.
It also adds that directory and configures it with mode 777 and the sticky bit, as per the normal /tmp directory.
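The sticky bit matters here: with mode 1777, any user can create scratch files, but only delete their own. A quick sketch of the equivalent permissions check, using a throwaway directory instead of the real /srv/spark-tmp (which needs root to create):

```python
import os
import stat
import tempfile

# Create a scratch dir with mode 1777 (world-writable plus sticky bit),
# mirroring the semantics of /tmp. A throwaway prefix stands in for the
# real /srv/spark-tmp here.
base = tempfile.mkdtemp()
scratch = os.path.join(base, "spark-tmp")
os.mkdir(scratch)
os.chmod(scratch, 0o1777)

mode = stat.S_IMODE(os.stat(scratch).st_mode)
print(oct(mode))  # sticky bit set: users may only remove their own files
```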

Change 738866 merged by Btullis:

[operations/puppet@production] Configure stat servers to use /srv/spark-tmp as spark.local.dir

https://gerrit.wikimedia.org/r/738866

Change 739307 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Remove override for spark.local.dir on stat100x servers

https://gerrit.wikimedia.org/r/739307

Change 739307 merged by Btullis:

[operations/puppet@production] Remove override for spark.local.dir on stat100x servers

https://gerrit.wikimedia.org/r/739307

This change was reverted because it caused immediate errors for users of Jupyter.

The spark2-shell invocation worked, although the following warning was displayed:

21/11/30 12:06:14 WARN SparkConf: Note that spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS in YARN).

Using wmfdata.spark.get_session() resulted in the following error from Spark.

ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/usr/lib/spark2/python/py4j/java_gateway.py", line 1159, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

---lots of other stuff---

Py4JError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext

I am investigating this now and have confirmed the behaviour by replicating the change on an-test-client1001.

The full output from wmfdata.spark.get_session() on an-test-client1001 is here: P17899

I have attempted to trigger the bug by:

  • activating my stacked conda environment
  • running a local python interpreter
  • getting a spark session with:
>>> import wmfdata
The check for a newer release of wmfdata failed to complete. Consider checking manually.
>>> wmfdata.spark.get_session()

This doesn't trigger the bug.

I checked the log from my jupyterhub-conda-singleuser service with journalctl -u jupyter-btullis-singleuser.service -f

While it was running, I saw:

Nov 30 12:09:39 an-test-client1001 jupyterhub-conda-singleuser[13833]: 21/11/30 12:09:39 ERROR DiskBlockManager: Failed to create local dir in /srv/spark-tmp. Ignoring this directory.
Nov 30 12:09:39 an-test-client1001 jupyterhub-conda-singleuser[13833]: java.io.IOException: Failed to create a temp directory (under /srv/spark-tmp) after 10 attempts!

This looks like an effect of the ReadWritePaths setting on the systemd unit that creates the jupyterhub-conda-singleuser service.

btullis@an-test-client1001:/etc/jupyterhub-conda$ systemctl show jupyter-btullis-singleuser.service|grep Paths
ReadWritePaths=/home/btullis /dev/shm /run/user /tmp
ReadOnlyPaths=/

I will see if I can update the systemd spawner to include /srv/spark-tmp in this list.
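For illustration, the effective unit setting after the fix would look something like this. This is a sketch only; in practice the value is generated by the Puppet-managed JupyterHub systemd spawner configuration, not a hand-written drop-in:

```
# Sketch of the effective per-user unit settings after the change
# (generated via the systemd spawner config, shown here for clarity).
[Service]
ReadOnlyPaths=/
ReadWritePaths=/home/btullis /dev/shm /run/user /tmp /srv/spark-tmp
```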

Change 742732 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Add /srv/spark-tmp to the list of allowed read-write paths

https://gerrit.wikimedia.org/r/742732

This patch looks like it should fix the issue with JupyterHub and /srv/spark-tmp.

Change 742732 merged by Btullis:

[operations/puppet@production] Add /srv/spark-tmp to the list of allowed read-write paths

https://gerrit.wikimedia.org/r/742732

I have verified that this works in Jupyter on an-test-client1001.

[screenshot attached: image.png (334×1 px, 36 KB)]

However, the only way that the new settings can be applied is to stop each of the ephemeral systemd units representing each user's server.

e.g. sudo systemctl stop jupyter-btullis-singleuser.service

Or to stop them all at the same time:

sudo systemctl stop jupyter-*-singleuser.service

The user will be prompted to restart their server immediately, or at the next login, but running kernels and active queries will be killed.
I'll check to see when would be a convenient time to do this work.

I have announced a maintenance window for Jupyter at 10:00 UTC tomorrow, where I will stop all notebooks unless people request that they be excluded or the maintenance deferred.

The implementation plan is to use cumin and to run:

sudo cumin A:stat "systemctl stop jupyter-*-singleuser.service"

This is the list of currently running jupyter servers:

5 hosts will be targeted:
stat[1004-1008].eqiad.wmnet
Ok to proceed on 5 hosts? Enter the number of affected hosts to confirm or "q" to quit 5
===== NODE GROUP =====
(1) stat1007.eqiad.wmnet
----- OUTPUT of 'systemctl --plai...wk "{print \$1}"' -----
jupyter-aikochou-singleuser.service
jupyter-akhatun-singleuser.service
jupyter-bearloga-singleuser.service
jupyter-cparle-singleuser.service
jupyter-dsaez-singleuser.service
jupyter-ebernhardson-singleuser.service
jupyter-htriedman-singleuser.service
jupyter-iflorez-singleuser.service
jupyter-isaacj-singleuser.service
jupyter-jiawang-singleuser.service
jupyter-kzeta-singleuser.service
jupyter-neilpquinn-wmf-singleuser.service
jupyter-nettrom-singleuser.service
jupyter-nuria-singleuser.service
jupyter-otto-singleuser.service
jupyter-paragon-singleuser.service
jupyter-snowick-singleuser.service
jupyter-tjones-singleuser.service
jupyter-zpapierski-singleuser.service
===== NODE GROUP =====
(1) stat1008.eqiad.wmnet
----- OUTPUT of 'systemctl --plai...wk "{print \$1}"' -----
jupyter-aarora-singleuser.service
jupyter-akhatun-singleuser.service
jupyter-andyrussg-singleuser.service
jupyter-bearloga-singleuser.service
jupyter-dsaez-singleuser.service
jupyter-ebernhardson-singleuser.service
jupyter-effeietsanders-singleuser.service
jupyter-isaacj-singleuser.service
jupyter-joal-singleuser.service
jupyter-mgerlach-singleuser.service
jupyter-mneisler-singleuser.service
jupyter-mnz-singleuser.service
jupyter-neilpquinn-wmf-singleuser.service
jupyter-piccardi-singleuser.service
===== NODE GROUP =====
(1) stat1005.eqiad.wmnet
----- OUTPUT of 'systemctl --plai...wk "{print \$1}"' -----
jupyter-aarora-singleuser.service
jupyter-andyrussg-singleuser.service
jupyter-conniecc1-singleuser.service
jupyter-dcausse-singleuser.service
jupyter-ejegg-singleuser.service
jupyter-goransm-singleuser.service
jupyter-jiawang-singleuser.service
jupyter-jm-singleuser.service
jupyter-kcv-wikimf-singleuser.service
jupyter-mayakpwiki-singleuser.service
jupyter-mnz-singleuser.service
jupyter-otto-singleuser.service
jupyter-piccardi-singleuser.service
jupyter-rhuang-ctr-singleuser.service
jupyter-urbanecm-singleuser.service
===== NODE GROUP =====
(1) stat1004.eqiad.wmnet
----- OUTPUT of 'systemctl --plai...wk "{print \$1}"' -----
jupyter-bearloga-singleuser.service
jupyter-btullis-singleuser.service
jupyter-daniram-singleuser.service
jupyter-isaacj-singleuser.service
jupyter-jiawang-singleuser.service
jupyter-milimetric-singleuser.service
jupyter-mneisler-singleuser.service
===== NODE GROUP =====
(1) stat1006.eqiad.wmnet
----- OUTPUT of 'systemctl --plai...wk "{print \$1}"' -----
jupyter-btullis-singleuser.service
jupyter-bumeh-ctr-singleuser.service
jupyter-dsaez-singleuser.service
jupyter-ebernhardson-singleuser.service
jupyter-iflorez-singleuser.service
jupyter-janstee-singleuser.service
jupyter-jiawang-singleuser.service
jupyter-jm-singleuser.service
jupyter-milimetric-singleuser.service
jupyter-mneisler-singleuser.service
jupyter-nettrom-singleuser.service
jupyter-otto-singleuser.service
jupyter-piccardi-singleuser.service
jupyter-snowick-singleuser.service
================

One more test item: I can verify that the temporary files are created under /srv/spark-tmp from Jupyter.
I ran a simple query on an-test-client1001 and checked for the presence of the files:

[screenshot attached: image.png (611×576 px, 54 KB)]

(2021-08-05T15.10.26_btullis) btullis@an-test-client1001:/srv/spark-tmp$ ls -l
total 8
drwxr-xr-x 7 btullis wikidev 4096 Dec  1 13:11 blockmgr-e2bef376-0972-4c60-b1a1-5b6bafdd7331
drwx------ 4 btullis wikidev 4096 Dec  1 12:56 spark-c1f1ce98-a247-4e74-9e6e-1551e84db831

Successfully shut down all Jupyter notebooks on all stat100x servers.
Now when they are restarted they pick up the new ReadWritePaths setting.

btullis@stat1004:~$ systemctl show jupyter-btullis-singleuser.service |grep ReadWritePaths
ReadWritePaths=/home/btullis /dev/shm /run/user /tmp /srv/spark-tmp

I can now proceed to enable the spark.local.dir setting on the stat100x servers and I'm confident that it won't break Spark for Jupyter users.

Change 743155 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Re-apply spark.local.dir setting for stat servers

https://gerrit.wikimedia.org/r/743155

Change 743155 merged by Btullis:

[operations/puppet@production] Re-apply spark.local.dir setting for stat servers

https://gerrit.wikimedia.org/r/743155