Description

In T174402, we had to raise the per-process limit to 8192 file handles. This is more than we should need. Investigate and fix.

Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | akosiaris | T162039 Prepare to service applications from kubernetes
Resolved | | akosiaris | T162041 Expand the infrastructure to codfw
Resolved | | RobH | T161700 CODFW: (4) hardware access request for kubernetes
Resolved | | RobH | T142578 codfw/eqiad:(9+9) hardware access request for ORES
Unknown Object (Task) | | |
Resolved | | akosiaris | T165170 rack/setup/install ores2001-2009
Resolved | | None | T176324 Scoring platform team FY18 Q2
Declined | | None | T179501 Use external dsh group to list pooled ORES nodes
Resolved | | akosiaris | T168073 Switch ORES to dedicated cluster
Resolved | | Halfak | T185901 Preliminary deployment of ORES to new cluster
Resolved | | akosiaris | T171851 Reimage ores* hosts with Debian Stretch
Resolved | | awight | T169246 Stress/capacity test new ores* cluster
Resolved | | Halfak | T174402 Review and fix file handle management in worker and celery processes
Resolved | | awight | T177036 Clean up file handle and Redis connection management in ORES worker and celery processes
Event Timeline
OK so I checked on this, and it looks like dropping the connection to Redis in the celery workers has no effect at all on the file-handle count. Each celery worker has about 40 open file handles at any given time.
From https://github.com/celery/celery/issues/887 it looks like these file handles are used to auto-restart workers. I'm not quite sure what that means for us, but we might experiment with turning that functionality off (if we have it on at all).
```
$ lsof -p 19546
COMMAND   PID   USER   FD   TYPE  DEVICE  SIZE/OFF  NODE     NAME
python3   19546 halfak cwd  DIR   252,1   4096      1976686  /home/halfak/projects/ores
python3   19546 halfak rtd  DIR   252,1   4096      2        /
python3   19546 halfak txt  REG   252,1   4439120   2626657  /home/halfak/venv/3.5/bin/python3
python3   19546 halfak mem  REG   252,1   15432     4592097  /usr/lib/python3.5/lib-dynload/resource.cpython-35m-x86_64-linux-gnu.so
python3   19546 halfak mem  REG   252,1   392528    1189091  /home/halfak/venv/3.5/lib/python3.5/site-packages/msgpack/_unpacker.cpython-35m-x86_64-linux-gnu.so
python3   19546 halfak mem  REG   252,1   374744    1193059  /home/halfak/venv/3.5/lib/python3.5/site-packages/msgpack/_packer.cpython-35m-x86_64-linux-gnu.so
python3   19546 halfak mem  REG   252,1   29512     3148610  /home/halfak/venv/3.5/lib/python3.5/site-packages/cryptography/hazmat/bindings/_constant_time.cpython-35m-x86_64-linux-gnu.so
... snip ...
python3   19546 halfak 32w  FIFO  0,10    0t0       251034   pipe
python3   19546 halfak 33r  FIFO  0,10    0t0       251035   pipe
python3   19546 halfak 34w  FIFO  0,10    0t0       251035   pipe
python3   19546 halfak 35r  FIFO  0,10    0t0       251036   pipe
python3   19546 halfak 36w  FIFO  0,10    0t0       251036   pipe
python3   19546 halfak 37r  FIFO  0,10    0t0       251037   pipe
python3   19546 halfak 38w  FIFO  0,10    0t0       251037   pipe
python3   19546 halfak 39w  FIFO  0,10    0t0       251038   pipe
```
Looks like most of the lines are *.so files. Let's skip those.
```
$ lsof -p 19546 | grep -vP "\.so($|\.[0-9])"
COMMAND   PID   USER   FD   TYPE     DEVICE  SIZE/OFF  NODE     NAME
python3   19546 halfak cwd  DIR      252,1   4096      1976686  /home/halfak/projects/ores
python3   19546 halfak rtd  DIR      252,1   4096      2        /
python3   19546 halfak txt  REG      252,1   4439120   2626657  /home/halfak/venv/3.5/bin/python3
python3   19546 halfak mem  REG      252,1   2981280   4123211  /usr/lib/locale/locale-archive
python3   19546 halfak mem  REG      252,1   26258     4592340  /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
python3   19546 halfak 0u   CHR      136,1   0t0       4        /dev/pts/1
python3   19546 halfak 1u   CHR      136,1   0t0       4        /dev/pts/1
python3   19546 halfak 2u   CHR      136,1   0t0       4        /dev/pts/1
python3   19546 halfak 3r   CHR      1,9     0t0       11       /dev/urandom
python3   19546 halfak 4u   a_inode  0,11    0         7041     [eventpoll]
python3   19546 halfak 5u   a_inode  0,11    0         7041     [eventpoll]
python3   19546 halfak 6r   CHR      1,3     0t0       6        /dev/null
python3   19546 halfak 7r   FIFO     0,10    0t0       251022   pipe
python3   19546 halfak 8w   FIFO     0,10    0t0       251022   pipe
python3   19546 halfak 9r   FIFO     0,10    0t0       251023   pipe
python3   19546 halfak 10w  FIFO     0,10    0t0       251023   pipe
python3   19546 halfak 11r  FIFO     0,10    0t0       251024   pipe
python3   19546 halfak 12w  FIFO     0,10    0t0       251024   pipe
python3   19546 halfak 13r  FIFO     0,10    0t0       251025   pipe
python3   19546 halfak 14w  FIFO     0,10    0t0       251025   pipe
python3   19546 halfak 15r  FIFO     0,10    0t0       251026   pipe
python3   19546 halfak 16w  FIFO     0,10    0t0       251026   pipe
python3   19546 halfak 17r  FIFO     0,10    0t0       251027   pipe
python3   19546 halfak 18w  FIFO     0,10    0t0       251027   pipe
python3   19546 halfak 19r  FIFO     0,10    0t0       251028   pipe
python3   19546 halfak 20w  FIFO     0,10    0t0       251028   pipe
python3   19546 halfak 21r  FIFO     0,10    0t0       251029   pipe
python3   19546 halfak 22w  FIFO     0,10    0t0       251029   pipe
python3   19546 halfak 23r  FIFO     0,10    0t0       251030   pipe
python3   19546 halfak 24w  FIFO     0,10    0t0       251030   pipe
python3   19546 halfak 25r  FIFO     0,10    0t0       251031   pipe
python3   19546 halfak 26w  FIFO     0,10    0t0       251031   pipe
python3   19546 halfak 27r  FIFO     0,10    0t0       251032   pipe
python3   19546 halfak 30w  FIFO     0,10    0t0       251033   pipe
python3   19546 halfak 31r  FIFO     0,10    0t0       251034   pipe
python3   19546 halfak 32w  FIFO     0,10    0t0       251034   pipe
python3   19546 halfak 33r  FIFO     0,10    0t0       251035   pipe
python3   19546 halfak 34w  FIFO     0,10    0t0       251035   pipe
python3   19546 halfak 35r  FIFO     0,10    0t0       251036   pipe
python3   19546 halfak 36w  FIFO     0,10    0t0       251036   pipe
python3   19546 halfak 37r  FIFO     0,10    0t0       251037   pipe
python3   19546 halfak 38w  FIFO     0,10    0t0       251037   pipe
python3   19546 halfak 39w  FIFO     0,10    0t0       251038   pipe
```
Looks like we have a lot of FIFO pipes. The other FDs seem normal. Let's focus on those FIFOs.
```
$ lsof -p 19546 | grep FIFO | wc -l
31
```
OK now to check all of the processes.
```
$ ps aux | grep celery
halfak   19536  0.6  1.0 628616 82076 pts/1 S+ 14:38 0:03 python3 ./utility applications.celery
halfak   19546  0.0  0.7 626004 60292 pts/1 S+ 14:38 0:00 python3 ./utility applications.celery
halfak   19547  0.0  0.7 626004 60292 pts/1 S+ 14:38 0:00 python3 ./utility applications.celery
halfak   19548  0.0  0.7 626004 60292 pts/1 S+ 14:38 0:00 python3 ./utility applications.celery
halfak   19549  0.0  0.7 626004 60296 pts/1 S+ 14:38 0:00 python3 ./utility applications.celery
halfak   19550  0.0  0.7 626004 60296 pts/1 S+ 14:38 0:00 python3 ./utility applications.celery
halfak   19551  0.0  0.7 626004 60296 pts/1 S+ 14:38 0:00 python3 ./utility applications.celery
halfak   19552  0.0  0.7 626004 60304 pts/1 S+ 14:38 0:00 python3 ./utility applications.celery
halfak   19553  0.0  0.7 626004 60304 pts/1 S+ 14:38 0:00 python3 ./utility applications.celery
halfak   20135  0.0  0.0  14224   972 pts/2 S+ 14:48 0:00 grep --color=auto celery
$ lsof -p 19536 | grep FIFO | wc -l
48
$ lsof -p 19546 | grep FIFO | wc -l
31
$ lsof -p 19547 | grep FIFO | wc -l
32
$ lsof -p 19548 | grep FIFO | wc -l
33
$ lsof -p 19549 | grep FIFO | wc -l
34
$ lsof -p 19550 | grep FIFO | wc -l
35
$ lsof -p 19551 | grep FIFO | wc -l
36
$ lsof -p 19552 | grep FIFO | wc -l
37
$ lsof -p 19553 | grep FIFO | wc -l
38
```
Looks like the master process has more FIFO handles, and each successive child process has one more than the previous one.
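Rather than running lsof per PID by hand, something like the following could tally pipe FDs for every celery process from /proc. This is just a quick Linux-only sketch; the "applications.celery" match string is only what my dev processes happen to be called.

```python
import os


def pipe_fd_count(pid):
    """Count open FDs that point at pipes for a given PID (Linux /proc only)."""
    fd_dir = "/proc/{0}/fd".format(pid)
    count = 0
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # the FD closed while we were looking
        if target.startswith("pipe:"):
            count += 1
    return count


def celery_pids(pattern="applications.celery"):
    """Yield PIDs whose command line contains the pattern."""
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open("/proc/{0}/cmdline".format(pid), "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode("utf-8", "replace")
        except OSError:
            continue
        if pattern in cmdline:
            yield int(pid)


if __name__ == "__main__":
    for pid in sorted(celery_pids()):
        print(pid, pipe_fd_count(pid))
```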
This was all running the default ORES server (in celery mode) with no real models. Now to start up prod ORES with all the models.
```
$ ps aux | grep celery
halfak   20390 16.9 13.6 1800616 1070968 pts/1 S+ 14:53 0:14 python ores_celery.py
halfak   20447  0.0 13.5 1798240 1068236 pts/1 S+ 14:53 0:00 python ores_celery.py
halfak   20450  0.0 13.5 1798240 1068236 pts/1 S+ 14:53 0:00 python ores_celery.py
halfak   20565  0.0  0.0   14224     964 pts/2 S+ 14:54 0:00 grep --color=auto celery
$ lsof -p 20390 | grep FIFO | wc -l
12
$ lsof -p 20447 | grep FIFO | wc -l
7
$ lsof -p 20450 | grep FIFO | wc -l
8
```
Oh... that's surprising. There are fewer FIFOs. Maybe it's because I'm running fewer workers. Let's bump that up to 8.
```
$ ps aux | grep celery
halfak   20666 42.6 13.9 1801436 1097968 pts/1 S+ 14:57 0:13 python ores_celery.py
halfak   20679  0.0 13.6 1799172 1070880 pts/1 S+ 14:57 0:00 python ores_celery.py
halfak   20680  0.0 13.6 1799172 1070880 pts/1 S+ 14:57 0:00 python ores_celery.py
halfak   20681  0.0 13.6 1799172 1070880 pts/1 S+ 14:57 0:00 python ores_celery.py
halfak   20682  0.0 13.6 1799172 1070880 pts/1 S+ 14:57 0:00 python ores_celery.py
halfak   20683  0.0 13.6 1799172 1070880 pts/1 S+ 14:57 0:00 python ores_celery.py
halfak   20684  0.0 13.6 1799172 1070880 pts/1 S+ 14:57 0:00 python ores_celery.py
halfak   20685  0.0 13.6 1799172 1070880 pts/1 S+ 14:57 0:00 python ores_celery.py
halfak   20686  0.0 13.6 1799172 1070880 pts/1 S+ 14:57 0:00 python ores_celery.py
halfak   20698  0.0  0.0   14224     932 pts/2 S+ 14:57 0:00 grep --color=auto celery
$ lsof -p 20666 | grep FIFO | wc -l
48
$ lsof -p 20679 | grep FIFO | wc -l
31
$ lsof -p 20680 | grep FIFO | wc -l
32
$ lsof -p 20681 | grep FIFO | wc -l
33
$ lsof -p 20682 | grep FIFO | wc -l
34
$ lsof -p 20683 | grep FIFO | wc -l
35
$ lsof -p 20684 | grep FIFO | wc -l
36
$ lsof -p 20685 | grep FIFO | wc -l
37
$ lsof -p 20686 | grep FIFO | wc -l
38
```
OK. That's expected then. But it is curious that the number goes up so much between 2 and 8 workers. Let's try something bigger -- like 32 workers.
```
halfak   20823  9.4 13.9 1801736 1098556 pts/1 S+ 15:00 0:22 python ores_celery.py
halfak   20839  0.0 13.6 1799360 1071072 pts/1 S+ 15:00 0:00 python ores_celery.py
... snip ...
$ lsof -p 20839 | wc -l
300
$ lsof -p 20823 | wc -l
376
```
So it looks like these FIFOs balloon as we add more workers. I just tried starting up 480 workers (the same as the ores* cluster nodes) and could not get past the OSError for too many open files.
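(Aside: for the dev box, the equivalent of bumping `ulimit -n` can also be done from inside the launcher before the workers fork, via the stdlib resource module. A minimal sketch; the target number is just illustrative, and the soft limit can't exceed the hard limit without privileges.)

```python
import resource


def raise_nofile_soft_limit(target):
    """Raise the soft RLIMIT_NOFILE as far as the hard limit allows."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    # illustrative target with headroom for a 480-worker dev run
    print(raise_nofile_soft_limit(1200000))
```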
Looks like this is a known thing: the number of pipes grows like the "handshake problem", since each process ends up with a full set of connections to all previously launched workers. That grows as the triangular numbers: the 8th triangular number is 36, the 32nd is 528, and the 480th is 115,440 -- not a perfect match, but it looks like a closely related curve.
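For reference, the arithmetic behind that comparison:

```python
def triangular(n):
    """n-th triangular number: pipes expected if each new worker connects to every earlier one."""
    return n * (n + 1) // 2


for n in (8, 32, 480):
    print(n, triangular(n))
# 8 36
# 32 528
# 480 115440
```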
That's probably because the master process's open file handles are inherited by each forked child. We should be able to close these at the beginning of each child process's execution and eliminate at least one major source of file-handle leakage!
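If we do try that, celery's worker_process_init signal fires in each pool child right after the fork, which seems like the natural hook. A rough sketch under the big assumption that we can work out which pipes the child actually needs -- KEEP_FDS below is a placeholder and the close call is left commented out, since blindly closing the child's own inqueue/outqueue pipes would break it:

```python
import os

from celery.signals import worker_process_init

KEEP_FDS = set()  # placeholder: the pipe FDs this child actually needs (still unknown)


def inherited_pipe_fds():
    """Return the FDs in this process that point at pipes (Linux /proc only)."""
    fds = []
    for name in os.listdir("/proc/self/fd"):
        try:
            if os.readlink("/proc/self/fd/" + name).startswith("pipe:"):
                fds.append(int(name))
        except OSError:
            continue  # e.g. the FD used for the listdir itself
    return fds


@worker_process_init.connect
def report_inherited_pipes(**kwargs):
    """Runs in each pool child after fork; report (and eventually close) inherited pipes."""
    pipes = inherited_pipe_fds()
    print("child %d inherited %d pipe FDs" % (os.getpid(), len(pipes)))
    for fd in pipes:
        if fd not in KEEP_FDS:
            # os.close(fd)  # left commented until we know which pipes are ours
            pass
```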
One detail not quite accounted for is that the number of file handles seems to rise during my stress tests. Following the above theory, by that point the master process holds the full set of child pipes, so each newly forked child inherits all N pipes, which works out to roughly N^2 total file handles (higher than the triangular number).
I just bumped my ulimit to be able to start up a dev instance with 480 workers. I got
```
$ lsof -p 4516 | grep FIFO | wc -l
1919
```
So we should expect that the 480th worker will have 1919 + 480 = 2399 open FIFOs. To be conservative, let's say that every worker has that many FIFOs.
2399 * 481 = 1,153,919 FIFO pipes.
By the N^2 guess, we would need 480 x 480 = 230,400 file handles. Cool, so let's take the max of our guesses and try a limit of 2M?
Based on previous estimates, it looks like we open about 173 file handles for each worker -- mostly for *.so files.
So I'd like to update my estimate to (2399 + 173) * 481 = 1,237,132
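Spelling the estimates out side by side (all numbers taken from the comments above):

```python
workers = 480
processes = workers + 1        # 480 children plus the master
master_fifos = 1919            # observed FIFOs in the 480-worker master
per_worker_fifos = master_fifos + workers   # 2399: conservative per-process ceiling
other_fds = 173                # non-FIFO handles per worker, mostly *.so files

print(per_worker_fifos * processes)                # 1153919 -- conservative FIFO estimate
print(workers * workers)                           # 230400  -- the N^2 guess
print((per_worker_fifos + other_fds) * processes)  # 1237132 -- updated total estimate
```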