
ResourceLoader slow under MediaWiki-Vagrant with VirtualBox
Closed, ResolvedPublic

Description

CSS via load.php sometimes takes as long as 10 seconds to be returned on my local instance.
If I run

redis-cli
monitor

I notice that the log slows down considerably on lines that look like this:
"GET" "wiki:resourceloader:filter:minify-css:7:ef038bf45ab521177624638cbeb85f3d"

Anyone know why this might be happening?

Event Timeline

Jdlrobson raised the priority of this task from to Needs Triage.
Jdlrobson updated the task description. (Show Details)
Jdlrobson changed Security from none to None.
Jdlrobson added subscribers: Jdlrobson, dduvall.

This is making it near impossible for me to work now :-(
I really don't want to run vagrant destroy, as setting up a new wiki instance will take up most of my day.

Are you constantly changing your CSS? From the info you've posted, it looks like RL is either minifying the CSS each time (cache misses) or contacting Redis in Vagrant is slow.

Have you tried profiling to see what exactly is slow?
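A rough first pass could look like this (a sketch only: the module list is an example, and it assumes MediaWiki-Vagrant's default profiler setup honors forceprofile=1, as used in a URL later in this task):

# Rule out slow Redis round-trips first (Ctrl-C to stop sampling):
redis-cli --latency

# Then time one request from inside the VM with the inline profile appended,
# and look at where the wall-clock time goes:
time curl -s 'http://127.0.0.1:8080/w/load.php?modules=startup&only=scripts&debug=false&forceprofile=1' | tail -n 40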

From what I remember when we were troubleshooting this yesterday, HHVM was taxing the CPU during those same moments, which makes me think it's not any sort of I/O wait on Redis that's hanging things up. I second the profiling and maybe restarting HHVM (or vagrant reload) for good measure.

Also, since it sounds like you're dependent on a fair amount of custom wiki configuration and content/templates for your development, you might want to consider packaging it as a local or published Vagrant role. (Watch @bd808's part of the recent tech talk for examples; I'm also available to pair on it.)

I'm rarely changing the CSS. I'm seeing lots of rl-minify-css-cache-misses in the Redis trace though. Is it possible my caching isn't set up correctly? Will take a look at profiling.

Reloading and stopping/starting not helping.

@dduvall yeah, updating LocalSettings.php to support development for MobileFrontend and installing certain pages seems like a good idea.

Okay, so I debugged this, and the source of the slowdown seems to be $result = $compiler->compileFile( $fileName ); in compileLessFile.

Did something change with LESS recently?

To replicate, install the MobileFrontend vagrant role and load:
http://localhost:8080/w/load.php?debug=false&lang=en&modules=mediawiki.ui.button%7Cmobile.pagelist.styles%7Cskins.minerva.alpha.styles%7Cskins.minerva.chrome.styles%7Cskins.minerva.content.styles%7Cskins.minerva.content.styles.beta%7Cskins.minerva.drawers.styles%7Cskins.minerva.icons.styles%7Cskins.minerva.tablet.styles%7Cskins.minerva.tablet.styles.beta&only=styles&skin=minerva&target=mobile&forceprofile=1

On my machine this takes over 16s to load. Can anyone else confirm a similar slowdown?

Depends on how recent you mean. The result of some compileFile calls used to be cached separately some time ago; now it isn't (only the final CSS is).

Perhaps you have some modules there that are being invalidated (incorrectly) too often?

I can't view T84960 :(
Okay, now I'm getting somewhere. If I check out commit Id0221e9f68786758e10b2f222f5e9170898e320d it only takes 1 second. Going to keep investigating till I find the core change that introduced my woes...
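One systematic way to find that change is a git bisect between the current checkout and the known-fast revision (a sketch; the path and the good revision are placeholders):

cd /vagrant/mediawiki                 # location of the core checkout inside the VM (assumed)
git bisect start
git bisect bad HEAD                   # current, slow revision
git bisect good <known-fast-commit>   # e.g. the revision carrying change Id0221e9f... above
# At each step, reload the slow load.php URL and mark the revision:
#   git bisect good   (fast)    or    git bisect bad   (slow)
git bisect reset                      # when finished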

@Legoktm any idea why this has had such a negative effect on performance for me?

It upgraded the lessphp library from 0.4.0 to 0.5.0, so I guess they had a performance regression.

It upgraded the lessphp library from 0.4.0 to 0.5.0, so I guess they had a performance regression.

Actually, https://gerrit.wikimedia.org/r/#/c/174448/ is what upgraded the library; bringing it in via composer should have been the same version, since Krinkle created a new tag from master.

Jdlrobson renamed this task from Local vagrant super slow due to minifier to Possible performance degradation in lessc - Local vagrant super slow.Dec 19 2014, 9:09 PM
bd808 triaged this task as Medium priority.Dec 30 2014, 1:52 AM
bd808 subscribed.

Still hitting this issue with a new install.
I changed less to 0.4.0, then sshed into my vagrant instance and ran composer update.
Switched back to 0.5.0, ran composer update again, and the problem seems to have gone. Strange. Will keep investigating and update this bug report if/when it comes back.

This is resurfacing. It seems to become an issue as soon as I install multiple wikis (I just had a fresh install and installed both wikidata and commons roles)

@Jdlrobson, have you experienced this same issue since bumping up your VM's memory?

yup. My vagrant is a complete mess right now. :-/ Was hoping to get some time during the dev summit to debug but didn't get the chance.

Change 194217 had a related patch set uploaded (by BryanDavis):
Enable stat cache for HHVM fcgi container

https://gerrit.wikimedia.org/r/194217

We looked at this for a bit today and may be narrowing in on some parts of the problem. ResourceLoader is doing a lot of file mtime operations. The patch I've posted is one attempt at applying a bandaid to the bleeding patient. I want to dig deeper, though, and see if there are other settings we can change to make RL happier.
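One way to confirm that hypothesis (a sketch; assumes strace is installed in the VM and that attaching to a single HHVM process is representative):

# Attach to the oldest hhvm process, serve one load.php request, then
# hit Ctrl-C to get a per-syscall summary; file metadata calls against
# the NFS-mounted source tree should dominate if mtime checks are the issue.
sudo strace -c -f -e trace=file -p "$(pgrep -o hhvm)"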

bd808 renamed this task from Possible performance degradation in lessc - Local vagrant super slow to ResourceLoader slow under MediaWiki-Vagrant with VirtualBox.Mar 3 2015, 11:01 PM
bd808 claimed this task.
bd808 removed a project: Performance Issue.

Magnitudes faster with that patch: 12 seconds -> 4 seconds. Still room for improvement though. cc @rmoen @Jhernandez who have been having similar issues.

Enabling HHVM's stat cache is a bust. The implementation in HHVM awesomely uses inotify to have the kernel notify it of changes to files on disk. Sadly inotify doesn't work on NFS mounts, so HHVM never gets clued in that files are changing.
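That behaviour is easy to demonstrate (a sketch; assumes inotify-tools is installed in the VM and that the checkout lives under /vagrant/mediawiki):

# Watch the NFS-mounted source tree, then edit a file from the host side:
# no events show up, so an inotify-based stat cache never learns that the
# file changed.
inotifywait -m -r /vagrant/mediawiki/resources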

If MF is installed, T91659 could be related.

I'm using NFS with a Debian Jessie host, with 3000 MB allocated in .settings.yaml, out of 8 GB on my host.

Performance got a lot worse for me recently. After talking about it with @bd808, I think the cause of it suddenly getting worse may have been enabling VisualEditor (its many JS files could interact badly with any MediaWiki-Vagrant and/or RL stat problems).

I'm also using the memcached role (currently with 300 MB, but that's one thing I'm experimenting with). Before, I was using Redis, but Flow is not as well tested with that. Just before switching to memcached, I got some errors that may or may not have been related to caching.

Finally, I had wgResourceLoaderStorageEnabled set to true (but may experiment with turning that off) and I have:

$wgCacheEpoch = '20141008025959';

The following tests are with HHVM and NFS:

Before:

vagrant@mediawiki-vagrant:~$ nfsstat -ncl
nfs v4 client        total:   696869 
------------- ------------- --------
nfs v4 client         read:     3114 
nfs v4 client        write:   156537 
nfs v4 client       commit:     6120 
nfs v4 client         open:   129880 
nfs v4 client    open_conf:       37 
nfs v4 client        close:   129397 
nfs v4 client      setattr:      129 
nfs v4 client       fsinfo:        4 
nfs v4 client        renew:      478 
nfs v4 client    setclntid:        4 
nfs v4 client      confirm:        2 
nfs v4 client         lock:        1 
nfs v4 client        locku:        1 
nfs v4 client       access:    59190 
nfs v4 client      getattr:   194778 
nfs v4 client       lookup:    11294 
nfs v4 client  lookup_root:        4 
nfs v4 client       remove:       35 
nfs v4 client       rename:        6 
nfs v4 client       create:        1 
nfs v4 client     pathconf:        3 
nfs v4 client       statfs:        2 
nfs v4 client      readdir:      213 
nfs v4 client  server_caps:        7 
nfs v4 client  delegreturn:     5629 
nfs v4 client  rel_lkowner:        1 
nfs v4 client      secinfo:        2

Request (made from outside the VM, generated using Firefox's Copy as cURL):

time curl 'http://127.0.0.1:8080/w/load.php?debug=false&lang=en&modules=startup&only=scripts&skin=vector&*' -H 'Host: 127.0.0.1:8080' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.5.0' -H 'Accept: */*' -H 'Accept-Language: en-us,en;q=0.9,eng-us;q=0.8,eng;q=0.7,en-gb;q=0.6,en-au;q=0.4,en-ca;q=0.3,en-us.utf-8;q=0.2,en-local;q=0.1' -H 'Accept-Encoding: gzip, deflate' -H 'DNT: 1' -H 'Referer: http://127.0.0.1:8080/wiki/Main_Page' -H 'Cookie: wikiUserName=Admin; mediaWiki.user.bucket%3Amoodbar%403-trigger=3%3Aediting; ext.moodBar%403-tooltip=1; wikiEditor-3-booklet-colors-page=colors; wikiEditor-3-booklet-emoticons-page=emoticons; wikiUserID=1; stopMobileRedirect=true; wikiEditor-0-toolbar-section=advanced; wikiEditor-0-booklet-characters-page=latin; mediaWiki.user.sessionId=cc8787424e30789f; wiki_session=svsh89804l89ns69fpgrv7rbf7' -H 'Connection: keep-alive' -H 'If-Modified-Since: Wed, 11 Mar 2015 00:11:04 GMT'

(output omitted)

0.01s user 0.00s system 0% cpu 1:10.11 total

After:

vagrant@mediawiki-vagrant:~$ nfsstat -ncl
nfs v4 client        total:   706258 
------------- ------------- --------
nfs v4 client         read:     3114 
nfs v4 client        write:   159015 
nfs v4 client       commit:     6290 
nfs v4 client         open:   131533 
nfs v4 client    open_conf:       38 
nfs v4 client        close:   131019 
nfs v4 client      setattr:      129 
nfs v4 client       fsinfo:        4 
nfs v4 client        renew:      482 
nfs v4 client    setclntid:        4 
nfs v4 client      confirm:        2 
nfs v4 client         lock:        1 
nfs v4 client        locku:        1 
nfs v4 client       access:    59751 
nfs v4 client      getattr:   197448 
nfs v4 client       lookup:    11464 
nfs v4 client  lookup_root:        4 
nfs v4 client       remove:       35 
nfs v4 client       rename:        6 
nfs v4 client       create:        1 
nfs v4 client     pathconf:        3 
nfs v4 client       statfs:        2 
nfs v4 client      readdir:      213 
nfs v4 client  server_caps:        7 
nfs v4 client  delegreturn:     5689 
nfs v4 client  rel_lkowner:        1 
nfs v4 client      secinfo:        2

Some controlled tests verify that NFS + stat is a part of this performance issue, but maybe not the whole problem.

First I created 5000 files in vagrant/tmp/mtime and dropped this into /var/www/w/mtimes.php:

<?php

// Stat every entry in the test directory so each request triggers
// one metadata lookup per file over the NFS mount.
chdir('/vagrant/tmp/mtime');
array_map('stat', scandir('.'));
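For reference, the 5000 test files can be generated with something along these lines (the path relative to the MediaWiki-Vagrant checkout is an assumption):

# Run on the host side of the share so the files live on NFS:
mkdir -p vagrant/tmp/mtime
touch vagrant/tmp/mtime/file-{1..5000}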

I experimented with NFS mount options dealing with the attribute cache and requests to mtimes.php.

Using the default options the performance is variable depending on how long you wait between requests. Quick subsequent requests can preempt the default file attribute cache timeout (which is between 3 and 30 seconds, acregmin/acregmax), while waiting longer results in uncached results. The first two requests below show that the PHP stat calls resulted in NFS getattr calls to the server, while the final request was made quickly enough that it preempted the cache expiration (no getattr calls).

vagrant@mediawiki-vagrant:~$ nfsstat -m
/vagrant from 10.11.12.1:/Users/dduvall/Projects/mediawiki/vagrant
 Flags:	rw,noatime,vers=3,rsize=16384,wsize=16384,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.11.12.1,mountvers=3,mountport=629,mountproto=udp,local_lock=none,addr=10.11.12.1

vagrant@mediawiki-vagrant:~$ nfsstat -ncl | grep getattr
nfs v3 client      getattr:   775067
vagrant@mediawiki-vagrant:~$ time curl http://127.0.0.1/w/mtimes.php

real	0m2.750s
user	0m0.010s
sys	0m0.010s
vagrant@mediawiki-vagrant:~$ nfsstat -ncl | grep getattr
nfs v3 client      getattr:   780068
vagrant@mediawiki-vagrant:~$ time curl http://127.0.0.1/w/mtimes.php

real	0m4.228s
user	0m0.009s
sys	0m0.010s
vagrant@mediawiki-vagrant:~$ nfsstat -ncl | grep getattr
nfs v3 client      getattr:   785070
vagrant@mediawiki-vagrant:~$ time curl http://127.0.0.1/w/mtimes.php

real	0m0.249s
user	0m0.010s
sys	0m0.010s
vagrant@mediawiki-vagrant:~$ nfsstat -ncl | grep getattr
nfs v3 client      getattr:   785074

Using a fixed attribute cache timeout of 30 (actimeo=30) instead of the variable min/max, I was able to get the improved performance more reliably (after an initial warmup).

vagrant@mediawiki-vagrant:~$ nfsstat -m
/vagrant from 10.11.12.1:/Users/dduvall/Projects/mediawiki/vagrant
 Flags:	rw,noatime,vers=3,rsize=16384,wsize=16384,namlen=255,acregmin=30,acregmax=30,acdirmax=30,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.11.12.1,mountvers=3,mountport=629,mountproto=udp,local_lock=none,addr=10.11.12.1

vagrant@mediawiki-vagrant:~$ nfsstat -ncl | grep getattr
nfs v3 client      getattr:   790086
vagrant@mediawiki-vagrant:~$ time curl http://127.0.0.1/w/mtimes.php

real	0m2.695s
user	0m0.010s
sys	0m0.009s
vagrant@mediawiki-vagrant:~$ nfsstat -ncl | grep getattr
nfs v3 client      getattr:   795087
vagrant@mediawiki-vagrant:~$ time curl http://127.0.0.1/w/mtimes.php

real	0m0.240s
user	0m0.008s
sys	0m0.008s
vagrant@mediawiki-vagrant:~$ nfsstat -ncl | grep getattr
nfs v3 client      getattr:   795088

Anyway... upping the NFS attribute cache improves performance drastically in this controlled experiment (~ 90%). However, running the same experiment using the curl request that @Mattflaschen posted yields a lesser (but still significant) performance gain (~ 25%), and the overall request time is still an atrocious 4+ seconds for me.

vagrant@mediawiki-vagrant:~$ nfsstat -ncl | grep getattr
nfs v3 client      getattr:   804036
vagrant@mediawiki-vagrant:~$ time curl -s 'http://127.0.0.1:8080/w/load.php?debug=false&lang=en&modules=startup&only=scripts&skin=vector&*' > /dev/null

real	0m5.758s
user	0m0.010s
sys	0m0.011s
vagrant@mediawiki-vagrant:~$ nfsstat -ncl | grep getattr
nfs v3 client      getattr:   805076
vagrant@mediawiki-vagrant:~$ time curl -s 'http://127.0.0.1:8080/w/load.php?debug=false&lang=en&modules=startup&only=scripts&skin=vector&*' > /dev/null

real	0m4.270s
user	0m0.010s
sys	0m0.011s
vagrant@mediawiki-vagrant:~$ nfsstat -ncl | grep getattr
nfs v3 client      getattr:   805098
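For anyone who wants to try the attribute-cache change on an existing VM, a live remount is the quickest experiment (a sketch; whether a remount actually picks up new NFS options depends on the kernel, and the surer route is adding the options to the share's mount_options in the Vagrantfile and running vagrant reload):

# Inside the VM: lengthen the attribute cache on the existing share,
# then verify the mount options actually changed.
sudo mount -o remount,actimeo=30 /vagrant
nfsstat -m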

Disabling tests made no difference for me, @Legoktm.
This hook is only run in test mode anyhow, so that didn't surprise me.
I think the issue here is purely lots of RL modules, and thus lots of files, and thus lots of lookups.
I'm going to be forced to move off Vagrant next week if this continues; this is really impacting my ability to get things done.

Any way forward with this? Any idea of a timeline for fixing it? Anything I, a frontend engineer, can do to help move this along?

@Jdlrobson can you provide the URL of the request that's taking so long? I fear we're all working off the same symptom here (slowness) but not necessarily the same root cause (general NFS performance vs. LESS compilation, etc.). We need to home in on a particular request that can be consistently profiled. Providing your currently enabled roles would help as well (vagrant roles list -e).

I did some debugging with @Legoktm, and LESS seemed to be the issue. We'd need someone who knows MediaWiki-ResourceLoader inside out, maybe @Catrope, to get to the bottom of this.

I've actually stopped using Vagrant now. It was becoming impractical, but the URL I was using was:

http://localhost:8080/w/load.php?debug=false&lang=en&modules=mediawiki.sectionAnchor%7Cmediawiki.ui.button%2Cicon%7Cmobile.pagelist.styles%7Cskins.minerva.beta.images%2Cstyles%7Cskins.minerva.chrome.styles%7Cskins.minerva.content.styles%7Cskins.minerva.drawers.styles%7Cskins.minerva.icons.images%2Cstyles%7Cskins.minerva.tablet.styles&only=styles&skin=minerva&target=mobile&*

I had the Flow, MobileFrontend, Apisandbox, Echo, Cite, ConfirmEdit, EventLogging, MultimediaViewer, and PageImages roles installed.

I wonder if enabling the "zend" role (switching back to the PHP engine) would have solved it.

I wonder if enabling the "zend" role (switching back to the PHP engine) would have solved it.

I would also like to see if anyone has disabled NFS and seen an ongoing performance improvement. I realize NFS was supposed to improve performance, but IIRC it was originally put in before HHVM, and both VirtualBox and our setup have changed since then.
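If someone wants to test that, MediaWiki-Vagrant's config plugin should make it a two-step change (a sketch; the nfs_shares setting name is an assumption and may differ between versions):

# Switch the synced folder back to VirtualBox shared folders (assumed
# setting name), then rebuild the mounts.
vagrant config nfs_shares no
vagrant reload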

It seems to become an issue as soon as I install multiple wikis

Do we have independent verification of this? It would be significant if true.

Change 201616 had a related patch set uploaded (by BryanDavis):
Use cachefilesd to speed up file system operations

https://gerrit.wikimedia.org/r/201616

Change 194217 abandoned by BryanDavis:
Enable stat cache for HHVM fcgi container

Reason:
Makes things worse rather than better unfortunately.

https://gerrit.wikimedia.org/r/194217

bd808 removed bd808 as the assignee of this task.Apr 9 2015, 12:27 AM

Change 201616 abandoned by Ori.livneh:
Use cachefilesd to speed up file system operations

https://gerrit.wikimedia.org/r/201616

Change 201616 restored by BryanDavis:
Use cachefilesd to speed up file system operations

https://gerrit.wikimedia.org/r/201616

Change 201616 merged by jenkins-bot:
Allow use of cachefilesd to speed up file system operations

https://gerrit.wikimedia.org/r/201616

Removing the MediaWiki-ResourceLoader tag. I don't see an easy fix for this, and it does not make sense to optimize ResourceLoader for VirtualBox Shared Folders or NFS.

Krinkle assigned this task to ori.
Krinkle removed a project: Patch-For-Review.

This hasn't been an issue for a while afaik.

Related issues that contributed to this: