
Experiment with persistent connections in auto(import|export)
Closed, ResolvedPublic

Description

One thing that might speed this up is to have the ssh client keep the ssh connection established so that it can be reused. That saves a few network round trips. This is a feature of the OpenSSH client and can be configured via:

~/.ssh/config
Host gerrit.wikimedia.org
     ControlMaster=auto
     # File holding the connection
     # %C = hash of local hostname + remote hostname + remote port + remote username
     ControlPath=~/.ssh/control-%C
     # Keep the connection around for X seconds:
     ControlPersist = 60

If need be, one can explicitly disconnect it at the end of the run by sending the exit control command:

ssh -p 29418 -O exit gerrit.wikimedia.org

This is complicated by the fact that it should happen automatically, without modifying the users' config. One option is to set GIT_SSH_COMMAND. Another is to edit the PATH environment variable to inject a wrapper. Either way, the goal is to pass -F to ssh so that it reads a custom config file like the one above.
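A minimal sketch of the GIT_SSH_COMMAND option, assuming the config can live in a separate file chosen by the caller. The function name write_ssh_config, the l10n-bot- socket prefix and the 60 second timeout are illustrative assumptions, not settled choices. Note that with -F ssh reads only the given file, so settings from the users' own ~/.ssh/config no longer apply.

```shell
# Sketch only: write a standalone ssh config and point Git at it.
# write_ssh_config is a hypothetical helper name.
write_ssh_config() {
	config_file=$1
	# Same multiplexing options as in the example config above.
	printf '%s\n' \
		'Host *' \
		'     ControlMaster auto' \
		'     ControlPath ~/.ssh/l10n-bot-%C' \
		'     ControlPersist 60' \
		> "$config_file"
	# Git runs this command instead of plain "ssh" for ssh remotes.
	GIT_SSH_COMMAND="ssh -F '$config_file'"
	export GIT_SSH_COMMAND
}
```

Calling write_ssh_config with a path before the git commands run would make every ssh-based fetch or push in that shell pick up the shared connection settings.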

We should also explore whether to place this in auto(import|export), l10n-bot or a new wrapper.

Event Timeline

Restricted Application added a subscriber: Aklapper. May 15 2019, 8:38 AM

Experimentation with persistent connections.

I modified autoexport (autoexport-mediawiki is a symlink to it) to only do updates:

(
	GIT_SSH_COMMAND="ssh -F '$DIRSCRIPT/../repong/ssh'"
	export GIT_SSH_COMMAND

	for i in $PROJECTS; do
		echo "${_b}$i${b_}"
		{
			"$DIRSCRIPT/l10n-bot" "$DIRSCRIPT/repoupdate" "$i"
		} || :
	done
)

The config file is:

Host *
     ControlMaster=auto
     # File holding the connection
     # %C = hash of local hostname + remote hostname + remote port + remote username
     ControlPath=~/.ssh/l10n-bot-%C
     # Keep the connection around for X seconds:
     ControlPersist = 5

Measurements:

# non-persistent, second run after warm-up
twn:/resources/nike$ time ./translatewiki/bin/autoexport-mediawiki 
mediawiki
mediawiki-extensions
mediawiki-skins

real    5m11.695s
user    0m16.552s
sys     0m14.128s

# persistent, first run
twn:/resources/nike$ time ./translatewiki/bin/autoexport-mediawiki 
mediawiki
mediawiki-extensions
mediawiki-skins

real    1m50.120s
user    0m8.660s
sys     0m9.156s

# persistent, second run
twn:/resources/nike$ time ./translatewiki/bin/autoexport-mediawiki 
mediawiki
mediawiki-extensions
mediawiki-skins

real    1m46.806s
user    0m9.244s
sys     0m9.572s

# non-persistent, second run
time ./translatewiki/bin/autoexport-mediawiki 
mediawiki
mediawiki-extensions
mediawiki-skins

real    4m33.096s
user    0m13.364s
sys     0m12.704s

For comparison, here are update times over https:

twn:/resources/projects$ time php /home/betawiki/config/repong/repong.php list | grep ^mediawiki | b xargs -n1 repoupdate

real    2m46.367s
user    1m1.524s
sys     0m25.108s
twn:/resources/projects$ time php /home/betawiki/config/repong/repong.php list | grep ^mediawiki | b xargs -n1 repoupdate

real    2m16.707s
user    1m2.256s
sys     0m25.304s

So ssh with persistent connections can be even faster than https, which suggests that T223368: Consider fetching updates over https may not be worth it.

One thing to note is that this is the best-case scenario: hundreds of repositories all using the same host. In practice the export script iterates over update/export/commit for each project. Unless the timeout is made very long [1], the master connection will be closed between the update and commit phases. In addition, non-mediawiki projects use more varied hosts: github, gerrit and others. I also haven't tested whether submitting patches over git-review uses the shared connection.

[1] If it is made long, we need explicit closing. That's not super trivial to do though, as I don't want to hardcode the hosts. Iterating over the sockets and closing them directly with ssh or through fuser pid could work.
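A rough sketch of the "iterate over the sockets" idea, assuming all control sockets share a known path prefix such as ~/.ssh/l10n-bot-. The names close_control_sockets and SSH_BIN are hypothetical and this is not what the patch does. It relies on ssh -S selecting the master connection by socket path, so no host names need to be hardcoded; ssh still expects a host argument, so a placeholder is passed.

```shell
# Sketch only: close every master connection whose control socket starts
# with the given path prefix. close_control_sockets and SSH_BIN are
# hypothetical names.
close_control_sockets() {
	prefix=$1
	for sock in "$prefix"*; do
		# Skip an unmatched glob and anything that is not a socket.
		[ -S "$sock" ] || continue
		# -S picks the connection by socket path; "placeholder" only
		# fills the required host argument. Failures are ignored.
		"${SSH_BIN:-ssh}" -S "$sock" -O exit placeholder >/dev/null 2>&1 || :
	done
}

# Example call at the end of a run:
# close_control_sockets "$HOME/.ssh/l10n-bot-"
```

Errors are deliberately ignored so that a stale socket left behind by an already closed connection does not abort the export run.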

Change 510973 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[translatewiki@master] Re-use ssh connections in autoexport(-mediawiki)

https://gerrit.wikimedia.org/r/510973

Change 510973 merged by jenkins-bot:
[translatewiki@master] Re-use ssh connections in autoexport(-mediawiki)

https://gerrit.wikimedia.org/r/510973

Nikerabbit closed this task as Resolved. May 20 2019, 10:13 AM
Nikerabbit claimed this task.
Nikerabbit removed a project: Patch-For-Review.