
Virtualenvs slow on tool labs NFS
Closed, Resolved · Public

Description

tools.wdcat@tools-bastion-03:~$ time python -c "import flask"

real    0m0.401s
user    0m0.136s
sys     0m0.060s

tools.wdcat@tools-bastion-03:~$ time www/python/venv/bin/python -c "import flask"

real    0m15.167s
user    0m0.231s
sys     0m1.847s

From a quick look using strace, most of the time seems to be spent in stat() and open() calls a la

13:02:09.982560 stat("/data/project/wdcat/www/python/venv/lib/python2.7/encodings", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0 <0.004214>
13:02:09.986888 stat("/data/project/wdcat/www/python/venv/lib/python2.7/encodings", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0 <0.006956>
13:02:09.994088 stat("/data/project/wdcat/www/python/venv/lib/python2.7/encodings/codecs", 0x7ffc2d0434f0) = -1 ENOENT (No such file or directory) <0.002802>
13:02:09.997010 open("/data/project/wdcat/www/python/venv/lib/python2.7/encodings/codecs.x86_64-linux-gnu.so", O_RDONLY) = -1 ENOENT (No such file or directory) <0.003352>
13:02:10.000458 open("/data/project/wdcat/www/python/venv/lib/python2.7/encodings/codecs.so", O_RDONLY) = -1 ENOENT (No such file or directory) <0.003547>
13:02:10.004121 open("/data/project/wdcat/www/python/venv/lib/python2.7/encodings/codecsmodule.so", O_RDONLY) = -1 ENOENT (No such file or directory) <0.003752>
13:02:10.008077 open("/data/project/wdcat/www/python/venv/lib/python2.7/encodings/codecs.py", O_RDONLY) = -1 ENOENT (No such file or directory) <0.002851>
13:02:10.011045 open("/data/project/wdcat/www/python/venv/lib/python2.7/encodings/codecs.pyc", O_RDONLY) = -1 ENOENT (No such file or directory) <0.003704>
13:02:10.014930 stat("/data/project/wdcat/www/python/venv/lib/python2.7/codecs", 0x7ffc2d0434f0) = -1 ENOENT (No such file or directory) <0.003500>
13:02:10.018539 open("/data/project/wdcat/www/python/venv/lib/python2.7/codecs.x86_64-linux-gnu.so", O_RDONLY) = -1 ENOENT (No such file or directory) <0.003562>
13:02:10.022211 open("/data/project/wdcat/www/python/venv/lib/python2.7/codecs.so", O_RDONLY) = -1 ENOENT (No such file or directory) <0.002585>
13:02:10.024887 open("/data/project/wdcat/www/python/venv/lib/python2.7/codecsmodule.so", O_RDONLY) = -1 ENOENT (No such file or directory) <0.003341>
13:02:10.028322 open("/data/project/wdcat/www/python/venv/lib/python2.7/codecs.py", O_RDONLY) = 4 <0.004150>

where each stat()/open() call takes 3-7 ms. With

tools.wdcat@tools-bastion-03:~$ grep stracelog -e 'stat\|open' | wc -l
4294

calls, that's easily 15 seconds. Part of the issue is the large number of individual modules:

tools.wdcat@tools-bastion-03:~$ grep stracelog -e '[^f]stat\|open' | sed -e 's:.*/\([a-z\.]\):\1:' | sed -e 's:[\."].*::' | sort | uniq | wc -l
439

but the main difference is in the time stat() calls take. On the local disk, they are more than two orders of magnitude faster:

1464796893.470126 stat("/usr/lib/python2.7/dist-packages/flask/hashlib", 0x7ffe9f99f6b0) = -1 ENOENT (No such file or directory) <0.000014>
1464796893.470178 open("/usr/lib/python2.7/dist-packages/flask/hashlib.x86_64-linux-gnu.so", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000011>
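
As a rough cross-check of the "4294 calls, that's easily 15 seconds" estimate, here is a minimal sketch (assuming the trace was written with strace -tt -T -o stracelog, so each line ends in a <duration> field as above) that sums the time spent in stat()/open():

import re
import sys

# Rough check: add up the per-call durations that strace -T appends in
# angle brackets, e.g. "... = 0 <0.004214>", for stat()/open() lines only.
total = 0.0
count = 0
with open(sys.argv[1]) as log:   # e.g. python sum_strace_times.py stracelog
    for line in log:
        if 'stat(' not in line and 'open(' not in line:
            continue
        m = re.search(r'<(\d+\.\d+)>\s*$', line)
        if m:
            total += float(m.group(1))
            count += 1
print("%d stat/open calls, %.2f s spent in them" % (count, total))

On the trace above this should land in the same ballpark as the ~15 s of wall-clock time.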

As far as I remember, this used to be much faster. On toolsbeta-bastion it's slightly faster (7 s instead of 15).

Using /data/scratch, I compared the performance between tools-bastion-03 and relic:

valhallasw@tools-bastion-03:/data/scratch/valhallasw$ time test/bin/python -c "import flask"

real    0m10.785s
user    0m0.269s
sys     0m1.226s
valhallasw@relic:/data/scratch/valhallasw$ time test/bin/python -c "import flask"

real    0m1.492s
user    0m0.136s
sys     0m0.143s
tools-bastion-03:
labstore.svc.eqiad.wmnet:/scratch on /data/scratch type nfs (rw,noatime,vers=4,bg,intr,sec=sys,proto=tcp,port=0,lookupcache=none,nofsc,hard)

relic:
labstore.svc.eqiad.wmnet:/scratch on /data/scratch type nfs (rw,noatime,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,nofsc,addr=10.64.37.10,clientaddr=10.68.16.162)

It seems reasonable that the difference in the lookupcache option causes this; from the nfs(5) man page:

If this option is not specified, or if all is specified, the client assumes both types of directory cache entries are valid until their parent directory's cached attributes expire.

This setting was introduced in 24399a9416c7b09d24942b6b2c548ccca4f7ac80 because of T106170: Attribute cache issue with NFS on Trusty.
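
For anyone checking their own instance, a minimal sketch (nothing tool-labs-specific, just reading /proc/mounts) to see which NFS mounts currently carry lookupcache=none:

# List NFS mounts and whether they were mounted with lookupcache=none.
# (Per nfs(5), omitting the option is equivalent to lookupcache=all.)
with open('/proc/mounts') as mounts:
    for line in mounts:
        device, mountpoint, fstype, options = line.split()[:4]
        if not fstype.startswith('nfs'):
            continue
        cache = next((o for o in options.split(',') if o.startswith('lookupcache=')),
                     'lookupcache not set (defaults to all)')
        print('%-40s %s' % (mountpoint, cache))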

Event Timeline

Restricted Application added a project: Cloud-Services. Jun 1 2016, 4:24 PM
Restricted Application added subscribers: Zppix, Aklapper.
yuvipanda triaged this task as High priority. Jun 8 2016, 2:48 AM
yuvipanda added a subscriber: yuvipanda.

I wonder if this is caused by the throttling or the lookupcache.

Just some even crazier results:

(venv)tools.video2commons-test@tools-bastion-02:~$ time pip -V
pip 8.1.2 from /data/project/video2commons-test/www/python/venv/local/lib/python2.7/site-packages (python 2.7)

real	0m53.807s
user	0m0.993s
sys	0m7.602s

vs

03:11:57 0 ✓ zhuyifei1999@encoding03: ~$ time pip -V
pip 8.1.2 from /usr/local/lib/python2.7/dist-packages (python 2.7)

real	0m0.308s
user	0m0.236s
sys	0m0.056s

They have a similar set of pip packages installed.

I recall talking w/ yuvi last week; this was to stage a higher tc threshold for testing, as virtualenvs have been unmercifully slow as of late. [...] AFAIK upping the NFS thresholds had no effect. Normally I would have looked at graphite to see if there was any concern, but graphite was down at the time, so we took a blind swing.

-nfs_write='12000kbps'
-nfs_read='15000kbps'
+nfs_write='7000kbps'
+nfs_read='9500kbps'
(newsopel)tools.piagetbot@tools-bastion-03:~$ time pip -V
pip 1.5.4 from /data/project/piagetbot/.virtualenvs/newsopel/local/lib/python2.7/site-packages (python 2.7)

real    0m23.821s
user    0m0.287s
sys     0m3.098s

That's another set of results, but note that the version of pip on my tool is pip 1.5.4, whereas @zhuyifei1999's results were with pip 8.1.2, so this might cause a difference. Pip 1.5.4 appears to be the default version on tool labs (just tested with a new virtualenv with no packages):

(testing)tom29739@tools-bastion-03:~$ time pip -V
pip 1.5.4 from /home/tom29739/envs/testing/local/lib/python2.7/site-packages (python 2.7)

real    0m19.562s
user    0m0.285s
sys     0m2.368s
zhuyifei1999 added a comment. Edited Jun 28 2016, 12:06 AM

That's another set of results, but note that the version of pip on my tool is pip 1.5.4, whereas @zhuyifei1999's results were with pip 8.1.2, so this might cause a difference. Pip 1.5.4 appears to be the default version on tool labs (just tested with a new virtualenv with no packages)

It's more likely that I simply have too many packages required to run my tool :)

19s is still a long time when it's supposed to run for less than a second.

That's another set of results, but note that the version of pip on my tool is pip 1.5.4, whereas @zhuyifei1999's results were with pip 8.1.2, so this might cause a difference. Pip 1.5.4 appears to be the default version on tool labs (just tested with a new virtualenv with no packages)

It's more likely that I simply have too many packages required to run my tool :)
19s is still a long time when it's supposed to run for less than a second.

It's only getting the version, so it should be super fast for that.

On running strace for pip -V (which took about 20s), I get:

$ strace -o /tmp/pip-strace-counts -f -c pip -V
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 73.62    1.648355         350      4716      4158 open
 24.50    0.548619         248      2209      2018 stat
  0.45    0.010146         122        83         4 lstat
  0.29    0.006464        3232         2           wait4
  0.17    0.003729           5       728           read
  0.16    0.003644           6       580           close
  0.16    0.003563         210        17           openat 
  0.12    0.002582          70        37        21 access
  0.09    0.002057           5       381           mmap
  0.08    0.001770          89        20           write
  0.07    0.001568           2       882           fstat
  0.07    0.001514           6       275           munmap
  0.06    0.001412         282         5         2 readlink
  0.06    0.001251          15        82           rt_sigaction
  0.04    0.000898          26        34           getdents
  0.02    0.000530           9        56           mprotect
  0.01    0.000307           7        42           brk
  0.00    0.000099          25         4           execve
  0.00    0.000096          16         6         1 ioctl
  0.00    0.000052          13         4           arch_prctl
  0.00    0.000044          22         2           set_robust_list
  0.00    0.000039          13         3           dup2
  0.00    0.000034          17         2           geteuid
  0.00    0.000028          14         2           getpid
  0.00    0.000026          13         2           getppid
  0.00    0.000025           8         3           fcntl
  0.00    0.000021           5         4           uname
  0.00    0.000017           9         2           getrlimit
  0.00    0.000016           8         2         1 futex

While it seems terrible that it's doing ~4k open calls for a -V, I guess the problem is just 'NFS has latency'?

zhuyifei1999 added a comment. Edited Jun 28 2016, 7:44 AM

I remember it was much faster in 2013. I mean general NFS usage.

Indeed, or even a few months ago. The only significant thing I can think of as having changed since then is the NFS rate limiting, so I'd still like to maybe completely remove that and try testing this again to see if that's the cause.

Yes, and 'we don't cache anything'. See the task description for a comparison between tools-bastion-03 and relic. Your data suggests that an open() call only takes 350 µs (which is really good for something that goes over the network, I'd say); my data suggested something on the order of 4 ms.
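
Back-of-the-envelope, using the call counts from the strace -c output above and the per-call latencies mentioned in this thread (a rough sketch, not a measurement):

# How much the per-call latency matters for ~4716 open + ~2209 stat calls.
calls = 4716 + 2209
for label, per_call in [('local disk (~14 us/call)', 14e-6),
                        ('strace -c average (~350 us/call)', 350e-6),
                        ('NFS per-call strace -T (~4 ms/call)', 4e-3)]:
    print('%-35s -> %5.1f s total' % (label, calls * per_call))

That spread (roughly 0.1 s vs 2.4 s vs 28 s) roughly brackets the timings reported here.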

tom29739 added a comment. Edited Jun 28 2016, 7:50 AM

NFS was much faster about 5-6 months ago, when I joined. It's having an impact on the speed of other things too (e.g. the speed of webservices, especially with PHP).

Indeed, or even a few months ago. The only significant thing I can think of as having changed since then is the NFS rate limiting, so I'd still like to maybe completely remove that and try testing this again to see if that's the cause.

Why are we rate limiting NFS? It's slow enough already (don't know whether that's down to the rate limiting).

zhuyifei1999 added a comment. Edited Jun 28 2016, 8:01 AM

NFS was much faster about 5-6 months ago, when I joined. It's having an impact on the speed of other things too (e.g. the speed of webservices, especially with PHP).

In February this year, when my tool (video2commons) first went up, the time taken to start its webservice was about the same as it is now (i.e. a few minutes).

Why are we rate limiting NFS? It's slow enough already (don't know whether that's down to the rate limiting).

Because people hammer NFS a lot? And NFS is basically a single point of failure for a large part of labs.

valhallasw added a comment. Edited Jun 28 2016, 8:02 AM

Why are we rate limiting NFS? It's slow enough already (don't know whether that's down to the rate limiting).

NFS is rate-limited per host. Otherwise, a single host can overload the NFS server so badly that other hosts (and thus other users) suffer (and this has happened more than once).

Exact repro steps for me are, on any bastion host:

become lolrrit-wm
time ./testing-virtualenv/bin/pip -V

Ok, with @chasemp's help I cleared the tc settings on bastion-03:

tc qdisc del dev eth0 root
tc qdisc del dev eth0 handle ffff: ingress
tc qdisc del dev ifb0 root

Running pip -V still takes a while:

tools.lolrrit-wm@tools-bastion-03:~$ time ./testing-virtualenv/bin/pip -V
pip 1.5.4 from /data/project/lolrrit-wm/testing-virtualenv/local/lib/python2.7/site-packages (python 2.7)

real    0m16.337s
user    0m0.310s
sys     0m1.855s

So I think we can rule out the rate limiting.

Perhaps ulimits are causing it?

tom29739@tools-bastion-03:~$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 128316
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 512
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) 43200
max user processes              (-u) 30
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

The open files limit might be causing it (I'm guessing; if I recall correctly, the ulimits were set/changed not that long ago).

On running strace for pip -V (which took about 20s), I get: [...]

There are 4716 open calls there (open: open and possibly create a file or device) and 728 read calls (read: read from a file descriptor), which ties in with the open files limit:

open files (-n) 512

Perhaps these open/read calls are failing because of that ulimit and thus slowing the command down (another guess).
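
A quick way to double-check what limit the process itself sees (a minimal sketch using the standard resource module):

# Print the open-files rlimit as seen from inside Python.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('open files: soft=%s hard=%s' % (soft, hard))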

chasemp added a comment. Edited Jun 28 2016, 2:17 PM

The best way to rule that out is to edit /etc/security/limits.conf, log out, log back in, verify the new limits, and test.

No effect.

tools.lolrrit-wm@tools-bastion-03:~$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 128316
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 5120
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) 43200
max user processes              (-u) 30
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
tools.lolrrit-wm@tools-bastion-03:~$ time ./testing-virtualenv/bin/pip -V
pip 1.5.4 from /data/project/lolrrit-wm/testing-virtualenv/local/lib/python2.7/site-packages (python 2.7)
 
real    0m16.715s
user    0m0.285s
sys     0m1.993s

I raised both the hard and soft limits as well.

I finally fixed up the test bastion host enough, and @valhallasw is right as usual - turning off lookupcache=none speeds this up to 2s. Removing nofsc had no effect though.

scfc moved this task from Triage to Backlog on the Toolforge board. Dec 5 2016, 4:14 AM
chasemp added a subscriber: madhuvishy.

We want to experiment with enabling lookupcache=all everywhere. This is currently set on the k8s workers and the bastions, afaict. Passing to @madhuvishy from conversations yesterday. The historical reasoning for disabling lookupcache appears not to be well understood, and we should look at how changing this affects an active mount (remount?) and consider how to roll it out so things are consolidated. Not only are bastions different from trusty exec nodes atm, but also from k8s workers. That's the sort of inconsistency we'll spend no end of time fighting.

I tested enabling lookupcache yesterday on tools-bastion-05 (it wasn't enabled there) and an exec node (tools-exec-1402), through instance hiera. One puppet run, which was clean, and the mounts didn't need remounting. Currently on tools, the tools-worker* nodes have it enabled through prefix hiera, and bastions 02 and 03 had it on through instance hiera.

Early next week, I'll roll it out through project hiera on tools, and then maybe make it the default param for the class so it applies across labs.

Mentioned in SAL (#wikimedia-labs) [2017-03-30T17:29:10Z] <madhuvishy> Disabled puppet across tools in prep for T136712

Mentioned in SAL (#wikimedia-labs) [2017-03-30T17:30:52Z] <madhuvishy> Updating tools project hiera config to add role::labs::nfsclient::lookupcache: all via Horizon (T136712)

I've now rolled out lookupcache:all to all of tools.

madhuvishy closed this task as Resolved. Mar 31 2017, 6:44 PM

With lookupcache on across tools, this should be much faster. I've run some of the time commands listed in this thread and found at least a 2-3x speedup.

Change 346177 had a related patch set uploaded (by Madhuvishy):
[operations/puppet@production] nfsclient: Enable lookupcache by default for all nfs client instances

https://gerrit.wikimedia.org/r/346177

Change 346177 merged by Madhuvishy:
[operations/puppet@production] nfsclient: Enable lookupcache by default for all nfs client instances

https://gerrit.wikimedia.org/r/346177