The job https://integration.wikimedia.org/ci/job/beta-scap-eqiad/ takes a while, notably when generating the cache_git_info:
00:00:08.393 19:39:20 Started cache_git_info 00:12:20.924 19:51:33 Finished cache_git_info (duration: 12m 12s)
12 minutes :(
| hashar | |
| Sep 18 2018, 8:02 PM |
| F28902832: build-time-trend-beta-scap-eqiad-2019-05-02.png | |
| May 2 2019, 3:18 PM |
The job https://integration.wikimedia.org/ci/job/beta-scap-eqiad/ takes a while, notably when generating the cache_git_info:
00:00:08.393 19:39:20 Started cache_git_info 00:12:20.924 19:51:33 Finished cache_git_info (duration: 12m 12s)
12 minutes :(
| Subject | Repo | Branch | Lines +/- | |
|---|---|---|---|---|
| beta: limit scap to 512 max file descriptors | integration/config | master | +6 -0 |
Takes a while on beta because of all the extensions (plus disks are slower than in production where it takes like 20 seconds). IIRC we haven't done much to parallelize any of this, it serially walks the extensions directory. There are probably some easy wins here, but, again, it's been a beta-only problem so it hasn't been a high priority.
I gave it a quick try on deployment-deploy01 reproducing scap.tasks.cache_git_info(). Namely getting extensions and skins git information with scap.git.info and writing the json to a dummy file take roughly 40 seconds:
#!/usr/bin/python import json from scap.git import info as git_info from scap.utils import iterate_subdirectories for kind in ['extensions', 'skins']: for stuff in iterate_subdirectories('/srv/mediawiki-staging/php-master/%s' % kind): print('Stuff: %s' % stuff) info = git_info(stuff) with open('/tmp/foo.json', 'w') as f: json.dump(info, f)
$ time python scap_git.py real 0m40.602s user 0m13.600s sys 0m26.824s
I ended up manually hacking scap on deployment-deploy01 https://integration.wikimedia.org/ci/job/beta-scap-eqiad/222901/console:
00:00:06.979 09:11:21 cache_git_info( version=master ) wikidb is dewiktionary 00:00:07.705 09:11:22 Writing cache file /srv/mediawiki-staging/php-master/cache/gitinfo/info.json 00:00:08.447 09:11:23 Writing cache file /srv/mediawiki-staging/php-master/cache/gitinfo/info-extensions-WikibaseQualityExternalValidation.json 00:00:09.177 09:11:23 Writing cache file /srv/mediawiki-staging/php-master/cache/gitinfo/info-extensions-SubpageFun.json 00:00:09.934 09:11:24 Writing cache file /srv/mediawiki-staging/php-master/cache/gitinfo/info-extensions-Ads.json 00:00:10.697 09:11:25 Writing cache file /srv/mediawiki-staging/php-master/cache/gitinfo/info-extensions-HierarchyBuilder.json 00:00:11.434 09:11:26 Writing cache file /srv/mediawiki-staging/php-master/cache/gitinfo/info-extensions-FixedHeaderTable.json ... 00:12:30.765 09:23:45 Writing cache file /srv/mediawiki-staging/php-master/cache/gitinfo/info-skins-Slate.json 00:12:31.562 09:23:46 Writing cache file /srv/mediawiki-staging/php-master/cache/gitinfo/info-skins-Refreshed.json 00:12:31.565 09:23:46 Finished cache_git_info (duration: 12m 24s)
Using iwatch shows that is quite slow:
[24/Sep/2018 09:17:51] IN_MODIFY ./info-extensions-ExtensionDistributor.json [24/Sep/2018 09:17:51] IN_OPEN ./info-extensions-ExtensionDistributor.json [24/Sep/2018 09:17:51] IN_MODIFY ./info-extensions-ExtensionDistributor.json [24/Sep/2018 09:17:51] IN_CLOSE_WRITE ./info-extensions-ExtensionDistributor.json [24/Sep/2018 09:17:51] * ./info-extensions-ExtensionDistributor.json is closed [24/Sep/2018 09:17:52] IN_MODIFY ./info-extensions-WikiLabels.json [24/Sep/2018 09:17:52] IN_OPEN ./info-extensions-WikiLabels.json [24/Sep/2018 09:17:52] IN_MODIFY ./info-extensions-WikiLabels.json [24/Sep/2018 09:17:52] IN_CLOSE_WRITE ./info-extensions-WikiLabels.json [24/Sep/2018 09:17:52] * ./info-extensions-WikiLabels.json is closed [24/Sep/2018 09:17:53] IN_MODIFY ./info-extensions-SacredText.json [24/Sep/2018 09:17:53] IN_OPEN ./info-extensions-SacredText.json [24/Sep/2018 09:17:53] IN_MODIFY ./info-extensions-SacredText.json [24/Sep/2018 09:17:53] IN_CLOSE_WRITE ./info-extensions-SacredText.json [24/Sep/2018 09:17:53] * ./info-extensions-SacredText.json is closed [24/Sep/2018 09:17:53] IN_MODIFY ./info-extensions-BoilerPlate.json [24/Sep/2018 09:17:53] IN_OPEN ./info-extensions-BoilerPlate.json [24/Sep/2018 09:17:53] IN_MODIFY ./info-extensions-BoilerPlate.json [24/Sep/2018 09:17:53] IN_CLOSE_WRITE ./info-extensions-BoilerPlate.json [24/Sep/2018 09:17:53] * ./info-extensions-BoilerPlate.json is closed
Then strace shows a spam of close(x) = -1 EBADF (Bad file descriptor), that might be a reason for the slow down. That also happens with my dummy scap_git.py script though it is still WAY faster.
Maybe that is related to writes to /srv/mediawiki-staging/php-master/cache/gitinfo being slow.
Mentioned in SAL (#wikimedia-releng) [2018-09-24T09:24:08Z] <hashar> Live hacked scap code on deployment-deploy01 for T204762 and reverted hack changes
Mentioned in SAL (#wikimedia-releng) [2018-09-24T09:51:48Z] <hashar> deployment-deploy01 : backed up /srv/mediawiki-staging/php-master/cache/gitinfo and created a new. Its size of 69632 bytes might cause slow writes?? | T204762
The spam of close(x) = -1 EBADF (Bad file descriptor) comes from a call to os.closerange
1679 class OProc(object): 1691 def __init__(self, command, parent_log, cmd, stdin, stdout, stderr, 1862 self.pid = os.fork() 1863 1864 # child 1865 if self.pid == 0: # pragma: no cover 1955 # don't inherit file descriptors 1956 max_fd = resource.getrlimit(resource.RLIMIT_NOFILE)[0] 1957 os.closerange(3, max_fd)
That is https://amoffat.github.io/sh/ which looks like syntaxic sugar to run shell commands so one can sh.git.show("HEAD"). It seems to reimplement subprocess methods, which is probably not a good idea :] But that is orthogonal to the disk slowness.
Even funnier, I wanted to trace the execution of scap sync using the python trace module. The git caching took less than a minute!
$ ssh deployment-deploy01.deployment-prep.eqiad.wmflabs $ sudo su - jenkins-deploy $ cd /srv/mediawiki-staging $ scap sync 10:44:34 Started scap: (no justification provided) ... 10:44:40 Started cache_git_info 10:45:35 Finished cache_git_info (duration: 00m 54s) ... 10:47:00 Finished scap: (no justification provided) (duration: 02m 26s)
I wanted some historical build durations. I took the IRC logs from https://wm-bot.wmflabs.org/logs/%23wikimedia-releng/ and looked for the irc notification when beta-scap-eqiad is fixed.
| Date | Build # | Duration |
|---|---|---|
| 20180701 03:49:13 | #214047 | 5 min 18 sec |
| 20180704 05:37:09 | #214503 | 4 min 13 sec |
| 20180705 11:17:56 | #214593 | 4 min 8 sec |
| 20180707 03:56:23 | #214858 | 12 min |
| 20180717 23:29:25 | #216392 | 47 min |
| 20180719 21:28:19 | #216629 | 4 min 30 sec |
| 20180723 02:58:01 | #217108 | 4 min 18 sec |
| 20180824 22:43:06 | #220162 | 48 min |
| 20180827 21:44:55 | #220424 | 31 min |
| 20180828 21:13:24 | #220501 | 29 min |
| 20180829 12:42:51 | #220555 | 15 min |
| 20180830 22:49:50 | #220677 | 16 min |
| 20180902 13:33:44 | #220907 | 15 min |
| 20180903 08:15:54 | #220979 | 14 min |
| 20180904 03:39:51 | #221051 | 16 min |
| 20180904 17:43:15 | #221102 | 15 min |
| 20180905 17:10:59 | #221179 | 17 min |
| 20180906 08:21:04 | #221229 | 15 min |
| 20180907 21:19:51 | #221362 | 49 min |
| 20180913 14:38:40 | #221839 | 13 min |
| 20180914 11:51:19 | #221925 | 14 min |
| 20180914 14:38:12 | #221938 | 14 min |
| 20180914 21:36:33 | #221966 | 14 min |
| 20180916 17:55:32 | #222151 | 13 min |
| 20180916 22:26:51 | #222169 | 16 min |
| 20180920 01:47:10 | #222474 | 13 min |
| 20180920 11:17:15 | #222513 | 13 min |
| 20180921 01:39:15 | #222570 | 13 min |
| 20180922 18:21:48 | #222737 | 13 min |
| 20180923 15:36:11 | #222825 | 13 min |
| 20180924 02:47:59 | #222872 | 14 min |
| 20180924 08:39:05 | #222897 | 15 min |
| 20180924 16:37:12 | #222932 | 13 min |
| 20180925 06:38:30 | #222988 | 13 min |
| 20180925 10:57:25 | #223006 | 15 min |
| 20180925 11:16:23 | #223008 | 13 min |
There are holes in the data set, but July 23th 2018 and they are definitely slow since at least August 30th.
If that is from scap, that might be one of:
f7cb839114333a5b00ed99f44f13fbfb64157857 from August 28th already notes that cache_git_info is slow on beta.
Or (most probably) it is entirely unrelated to scap itself.
This is still the slowest step of beta-scap-eqiad. Taking 9 minutes of the total 10 minutes and 5 seconds at https://integration.wikimedia.org/ci/job/beta-scap-eqiad/246396/console, for example.
It is also the reason beta deploys don't happen at the scheduled interval because the parent job (beta-code-update-eqiad) waits, correctly, for the this child job to finish. With the parent and child job setup/teardown included its about 16-20 minutes, whereas the job is scheduled for every 10 minutes.
Still pretty quick, but those 10 extra minutes every time do add up over the day when QA is waiting for it to confirm fixes or react to on-going issues during development. It's long enough to be affecting the work flow, but short enough that you can't realistically do something useful with it. Oh well :)
So parallelising it might speed things up (though if the whole thing is just slow disk access there might not be much to win here?)
However, in beta we have *all* extensions that exist in gerrit cloned. Is there a way of making it only have the extensions used in prod? I know extension-list-labs went away almost a year ago so I'm not sure we need anything extra.
Edit: Ugh, I'm guessing we currently rely on the gerrit auto-submodule feature for each branch, whereas we'd like to track only prod's subset of extensions - just using the master branches... Maybe we can avoid initialising all of them or something.
To the best I could tell when I investigated it:
Eg scap sync with a sub minute cache_git_info:
I hacked scap to output some logging info to better show the cache_git_info progress https://integration.wikimedia.org/ci/job/beta-scap-eqiad/246645/console
I also tried again directly on the machine using:
sudo su - jenkins-deploy cd /srv/mediawiki-staging/ scap sync
And it takes 33 seconds:
17:51:54 Started cache_git_info 17:52:28 Finished cache_git_info (duration: 00m 33s)
And I don't think it is due to a potentially cold disk cache, a jenkins build occurring immediately after is still super slow.
Or in short, the issue only occurs when being harnessed/wrapped up with the Jenkins job.
@hashar Perhaps an ENV variable is interfering? Try with sudo -i instead of sudo -s and maybe also check which environment variables that Jenkins sets when establishing the SSH connection before the job/scap command starts.
Both are valids (parallelism and less repositories) though they are workarounds and I would rather track those as other tasks ;)
I went with a very basic script
#!/usr/bin/python import json import os import os.path import scap.log as log from scap.utils import iterate_subdirectories from scap.sh.contrib import git import logging logging.basicConfig(level=logging.DEBUG) with log.Timer('scap_git infos'): for kind in ['extensions', 'skins']: for stuff in iterate_subdirectories('/srv/mediawiki-staging/php-master/%s' % kind): os.chdir(stuff) with log.Timer(stuff): print('Url: %s' % git('ls-remote', '--get-url').strip())
Took more than 4 minutes on https://integration.wikimedia.org/ci/job/beta-hashar-cache-git-info/18/ . Each calls to git ls-remote --get-url takes roughly 250 ms. An example:
00:04:00.531 INFO:timer:Started /srv/mediawiki-staging/php-master/skins/DeepSea 00:04:00.531 INFO:scap.sh.command:<Command u'/usr/bin/git ls-remote --get-url'>: starting process 00:04:00.532 DEBUG:scap.sh.command.process:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 7406 ['/usr/bin/git', 'ls-remote', '--get-url']>: started process 00:04:00.532 INFO:scap.sh.command:<Command u'/usr/bin/git ls-remote --get-url', pid 7406>: process started 00:04:00.533 DEBUG:scap.sh.command.process:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 7406 ['/usr/bin/git', 'ls-remote', '--get-url']>: acquiring wait lock to wait for completion 00:04:00.533 DEBUG:scap.sh.command.process:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 7406 ['/usr/bin/git', 'ls-remote', '--get-url']>: got wait lock 00:04:00.534 DEBUG:scap.sh.command.process:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 7406 ['/usr/bin/git', 'ls-remote', '--get-url']>: exit code not set, waiting on pid 00:04:00.775 DEBUG:scap.sh.command.process:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 7406 ['/usr/bin/git', 'ls-remote', '--get-url']>: <scap.sh.StreamReader object at 0x7f3a48078ad0> ready to be read from 00:04:00.776 DEBUG:scap.sh.command.process.streamreader:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 7406 ['/usr/bin/git', 'ls-remote', '--get-url']>.stdout: got chunk size 61: 'https://gerrit.wikimedia.org/r' 00:04:00.777 DEBUG:scap.sh.stream_bufferer:acquiring buffering lock to process chunk (buffering: 1) 00:04:00.777 DEBUG:scap.sh.stream_bufferer:got buffering lock to process chunk (buffering: 1) 00:04:00.778 DEBUG:scap.sh.stream_bufferer:released buffering lock for processing chunk (buffering: 1) 00:04:00.778 DEBUG:scap.sh.command.process.streamreader:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 7406 ['/usr/bin/git', 'ls-remote', '--get-url']>.stdout: putting chunk onto pipe: 'https://gerrit.wikimedia.org/r' 00:04:00.779 DEBUG:scap.sh.command.process:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 7406 ['/usr/bin/git', 'ls-remote', '--get-url']>: <scap.sh.StreamReader object at 0x7f3a48078ad0> ready to be read from 00:04:00.780 DEBUG:scap.sh.command.process.streamreader:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 7406 ['/usr/bin/git', 'ls-remote', '--get-url']>.stdout: got no chunk, done reading 00:04:00.780 DEBUG:scap.sh.command.process:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 7406 ['/usr/bin/git', 'ls-remote', '--get-url']>: <scap.sh.StreamReader object at 0x7f3a48081410> ready to be read from 00:04:00.781 DEBUG:scap.sh.streamreader:got no chunk, done reading 00:04:00.781 DEBUG:scap.sh.stream_bufferer:acquiring buffering lock for flushing buffer 00:04:00.782 DEBUG:scap.sh.stream_bufferer:got buffering lock for flushing buffer 00:04:00.782 DEBUG:scap.sh.stream_bufferer:released buffering lock for flushing buffer 00:04:00.782 DEBUG:scap.sh.command.process.streamreader:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 7406 ['/usr/bin/git', 'ls-remote', '--get-url']>.stdout: got chunk size 0 to flush: '' 00:04:00.783 DEBUG:scap.sh.stream_bufferer:acquiring buffering lock for flushing buffer 00:04:00.783 DEBUG:scap.sh.stream_bufferer:got buffering lock for flushing buffer 00:04:00.784 DEBUG:scap.sh.stream_bufferer:released buffering lock for flushing buffer 00:04:00.784 DEBUG:scap.sh.streamreader:got chunk size 0 to flush: '' 00:04:00.784 INFO:scap.sh.command:<Command u'/usr/bin/git ls-remote --get-url', pid 7406>: process completed 00:04:00.785 INFO:scap.sh.command:<Command u'/usr/bin/git ls-remote --get-url', pid 7406>: process completed 00:04:00.785 INFO:scap.sh.command:<Command u'/usr/bin/git ls-remote --get-url', pid 7406>: process completed 00:04:00.785 Url: https://gerrit.wikimedia.org/r/p/mediawiki/skins/DeepSea.git 00:04:00.786 INFO:timer:Finished /srv/mediawiki-staging/php-master/skins/DeepSea (duration: 0.253926992416)
Notably:
00:04:00.532 DEBUG:scap.sh.command.process:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 7406 ['/usr/bin/git', 'ls-remote', '--get-url']>: started process 00:04:00.532 INFO:scap.sh.command:<Command u'/usr/bin/git ls-remote --get-url', pid 7406>: process started 00:04:00.533 DEBUG:scap.sh.command.process:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 7406 ['/usr/bin/git', 'ls-remote', '--get-url']>: acquiring wait lock to wait for completion 00:04:00.533 DEBUG:scap.sh.command.process:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 7406 ['/usr/bin/git', 'ls-remote', '--get-url']>: got wait lock 00:04:00.534 DEBUG:scap.sh.command.process:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 7406 ['/usr/bin/git', 'ls-remote', '--get-url']>: exit code not set, waiting on pid 00:04:00.775 DEBUG:scap.sh.command.process:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 7406 ['/usr/bin/git', 'ls-remote', '--get-url']>: <scap.sh.StreamReader object at 0x7f3a48078ad0> ready to be read from 00:04:00.776 DEBUG:scap.sh.command.process.streamreader:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 7406 ['/usr/bin/git', 'ls-remote', '--get-url']>.stdout: got chunk size 61: 'https://gerrit.wikimedia.org/r'
The same script being run from manually is dramatically faster to less than 8 seconds (from 4 minutes):
7.639760 INFO:timer:Started /srv/mediawiki-staging/php-master/skins/DeepSea 7.639984 INFO:scap.sh.command:<Command u'/usr/bin/git ls-remote --get-url'>: starting process 7.643368 DEBUG:scap.sh.command.process:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 29648 ['/usr/bin/git', 'ls-remote', '--get-url']>: started process 7.644402 INFO:scap.sh.command:<Command u'/usr/bin/git ls-remote --get-url', pid 29648>: process started 7.644476 DEBUG:scap.sh.command.process:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 29648 ['/usr/bin/git', 'ls-remote', '--get-url']>: acquiring wait lock to wait for completion 7.644533 DEBUG:scap.sh.command.process:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 29648 ['/usr/bin/git', 'ls-remote', '--get-url']>: got wait lock 7.644583 DEBUG:scap.sh.command.process:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 29648 ['/usr/bin/git', 'ls-remote', '--get-url']>: exit code not set, waiting on pid 7.645534 DEBUG:scap.sh.command.process:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 29648 ['/usr/bin/git', 'ls-remote', '--get-url']>: <scap.sh.StreamReader object at 0x7f3454020550> ready to be read from 7.645644 DEBUG:scap.sh.command.process.streamreader:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 29648 ['/usr/bin/git', 'ls-remote', '--get-url']>.stdout: got chunk size 61: 'https://gerrit.wikimedia.org/r' 7.645968 DEBUG:scap.sh.stream_bufferer:acquiring buffering lock to process chunk (buffering: 1) 7.646060 DEBUG:scap.sh.stream_bufferer:got buffering lock to process chunk (buffering: 1) 7.646284 DEBUG:scap.sh.stream_bufferer:released buffering lock for processing chunk (buffering: 1) 7.646343 DEBUG:scap.sh.command.process.streamreader:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 29648 ['/usr/bin/git', 'ls-remote', '--get-url']>.stdout: putting chunk onto pipe: 'https://gerrit.wikimedia.org/r' 7.646426 DEBUG:scap.sh.command.process:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 29648 ['/usr/bin/git', 'ls-remote', '--get-url']>: <scap.sh.StreamReader object at 0x7f3454020550> ready to be read from 7.646482 DEBUG:scap.sh.command.process.streamreader:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 29648 ['/usr/bin/git', 'ls-remote', '--get-url']>.stdout: got no chunk, done reading 7.646538 DEBUG:scap.sh.command.process:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 29648 ['/usr/bin/git', 'ls-remote', '--get-url']>: <scap.sh.StreamReader object at 0x7f3454088050> ready to be read from 7.646587 DEBUG:scap.sh.streamreader:got no chunk, done reading 7.646641 DEBUG:scap.sh.stream_bufferer:acquiring buffering lock for flushing buffer 7.646686 DEBUG:scap.sh.stream_bufferer:got buffering lock for flushing buffer 7.646733 DEBUG:scap.sh.stream_bufferer:released buffering lock for flushing buffer 7.646786 DEBUG:scap.sh.command.process.streamreader:<Command u'/usr/bin/git ls-remote --get-url'>.<Process 29648 ['/usr/bin/git', 'ls-remote', '--get-url']>.stdout: got chunk size 0 to flush: '' 7.646849 DEBUG:scap.sh.stream_bufferer:acquiring buffering lock for flushing buffer 7.646893 DEBUG:scap.sh.stream_bufferer:got buffering lock for flushing buffer 7.646939 DEBUG:scap.sh.stream_bufferer:released buffering lock for flushing buffer 7.646984 DEBUG:scap.sh.streamreader:got chunk size 0 to flush: '' 7.647147 INFO:scap.sh.command:<Command u'/usr/bin/git ls-remote --get-url', pid 29648>: process completed 7.647233 INFO:scap.sh.command:<Command u'/usr/bin/git ls-remote --get-url', pid 29648>: process completed 7.647291 INFO:scap.sh.command:<Command u'/usr/bin/git ls-remote --get-url', pid 29648>: process completed 7.647365 Url: https://gerrit.wikimedia.org/r/p/mediawiki/skins/DeepSea.git 7.647504 INFO:timer:Finished /srv/mediawiki-staging/php-master/skins/DeepSea (duration: 0.00769400596619)
The issue lays somewhere into scap/sh.py :/ One would notice the slowdown in log messages:
00:04:00.534 ['/usr/bin/git', 'ls-remote', '--get-url']>: exit code not set, waiting on pid 00:04:00.775 ['/usr/bin/git', 'ls-remote', '--get-url']>: <scap.sh.StreamReader object at 0x7f3a48078ad0> ready to be read from
Which would be scap.sh output_thread() function:
poller = Poller()
if stdout is not None:
poller.register_read(stdout)
while poller:
changed = no_interrupt(poller.poll, 0.1)And that matches a bunch of poll() system calls happening when I ran the script with strace :-/
When running from Jenkins, the script is slightly slower for some reason and causes scap to wait with that 100ms poll.
I don't think there is any proper fix to it short of hunting down the cause of slowdown or fixing scap.sh. Most probably we should just switch to use python subprocess and get rid of all that custom logic. **Or it is an artfact of git being slow for some other reason :-\
Back in September I have noticed a spam of close() calls ( T204762#4610309 ). Before executing the process scap/sh closes all file descriptors (beside stdin/out/err) from the parent using:
max_fd = resource.getrlimit(resource.RLIMIT_NOFILE)[0] os.closerange(3, max_fd)
Running from the command line I get 1024 maximum file desriptors which are fairly fast to close:
$ python closefds.py |ts -s %.s 0.000029 Max fd: 1024 0.000823 Closing them all 0.000849 Done closing fds
The same script when run in Jenkins:
00:00:00.031 + python -u closefds.py 00:00:00.045 Max fd: 1048576 00:00:00.045 Closing them all 00:00:00.289 Done closing fds 00:00:00.365 Finished: SUCCESS
So when run under Jenkins, there is up to 1 million file descriptors available and closing them all takes ~ 240 miliseconds.
ulimit -n does yield 1048576 under Jenkins.
Change 507773 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] beta: limit scap to 512 max file descriptors
Change 507773 merged by jenkins-bot:
[integration/config@master] beta: limit scap to 512 max file descriptors
T222372 would be able dropping scap/sh.git entirely, but I don't think it will be much of a priority :]