
Running translatewiki export for MediaWiki extensions: Too many concurrent connections (4) - max. allowed: 4
Closed, Resolved · Public

Description

Running the export script from translatewiki for MediaWiki extensions has failed since Friday, May 3rd 2019 with the error below.
This makes a complete export of MediaWiki extensions impossible.

[Symfony\Component\Process\Exception\ProcessFailedException]
The command "/home/betawiki/config/bin/clupdate-gerrit-repo 'ssh://l10n-bot@gerrit.wikimedia.org:29418/mediawiki/extensions/AddThis' '/resources/raymond/mediawiki-extensions/AddThis' 'master' '973dc43d4d1ccd0717f34b0ad350dfee54a995fa'" failed.
Exit Code: 1(General error)
Working directory: /resources/raymond
Output:
================
Error Output:
================
Received disconnect from 2620:0:861:3:208:80:154:85 port 29418:12: Too many concurrent connections (4) - max. allowed: 4
Connection to gerrit.wikimedia.org closed by remote host.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.

Event Timeline

Raymond created this task. May 5 2019, 8:20 PM
Reedy added a comment. May 5 2019, 9:04 PM

> This makes a complete export of MediaWiki extensions impossible.

Well, maybe as currently written, but it could be adjusted to follow the limits...

How many concurrent connections does it try to do?

Nikerabbit triaged this task as Unbreak Now! priority. May 6 2019, 6:00 AM

RepoNg defaults to the number of cores (8 on the translatewiki.net server). The same limit is used for anonymous pulls as well as for commits over ssh.
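
For illustration (this is not RepoNg's actual code), capping a network-bound phase at Gerrit's limit could be as simple as a bounded xargs run, assuming a hypothetical repos.txt listing one checkout directory per line:

# Run at most 4 concurrent fetches, matching maxConnectionsPerUser = 4.
# repos.txt is a hypothetical file with one repository directory per line.
xargs -P 4 -I {} git -C {} fetch origin < repos.txt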


Change 508169 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[translatewiki@master] Limit RepoNg parallelism to 4

https://gerrit.wikimedia.org/r/508169

Change 508169 merged by jenkins-bot:
[translatewiki@master] Limit RepoNg parallelism to 4

https://gerrit.wikimedia.org/r/508169

The above change is deployed. I would be curious to hear from @Raymond and @abi_ 1) whether it works and 2) how much it affects the duration of exports.

> The above change is deployed. I would be curious to hear from @Raymond and @abi_ 1) whether it works and 2) how much it affects the duration of exports.

It works again, thanks Niklas. But it is much slower: for extensions, the export script ran this morning for ~45 minutes, previously ~15-20 minutes. I would really like to see whether there could be an exemption for l10n-bot to restore the previous value.

@hashar ^^?

There is one more thing I could do, which is to separate the limit for the update, export and commit phases. Export is CPU-bound and doesn't make connections, unlike the other two. Still, the other commands likely have some overhead too, so I don't know whether that is enough to restore the near-original duration.
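
As a sketch of that idea (the phase commands are hypothetical, not RepoNg's actual interface), each phase would get its own cap:

# Network-bound phases stay within the Gerrit connection limit;
# the CPU-bound export phase can use all 8 cores.
xargs -P 4 -I {} git -C {} pull --ff-only < repos.txt      # update
xargs -P 8 -I {} ./export-messages {} < repos.txt          # export (hypothetical command)
xargs -P 4 -I {} git -C {} push origin master < repos.txt  # commit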

>> The above change is deployed. I would be curious to hear from @Raymond and @abi_ 1) whether it works and 2) how much it affects the duration of exports.
> It works again, thanks Niklas. But it is much slower: for extensions, the export script ran this morning for ~45 minutes, previously ~15-20 minutes. I would really like to see whether there could be an exemption for l10n-bot to restore the previous value.

I don't think this is possible, at least with how the Gerrit config is currently done; it would probably require changes upstream to support it.

However, there's potentially room for compromise: maxConnectionsPerUser was originally 32 and was dropped to 4 based on the numbers zuul used, as it was deemed to be probably the biggest concurrent user. Bumping it up to 8 doesn't seem unreasonable, but that would be up to Release-Engineering-Team and Operations to decide.
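
For reference, that limit lives in the sshd section of gerrit.config, so the compromise would be a one-line change along these lines:

[sshd]
    # was 32, lowered to 4 based on zuul's concurrency
    maxConnectionsPerUser = 8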

Reedy lowered the priority of this task from Unbreak Now! to High. May 6 2019, 3:28 PM
abi_ added a comment. May 6 2019, 3:32 PM

I ran the exports today and they ran without a hitch but they did take a lot longer. They generally take 20-25 minutes, but ended up taking 50 minutes today.

> @hashar ^^?
> There is one more thing I could do, which is to separate the limit for the update, export and commit phases. Export is CPU-bound and doesn't make connections, unlike the other two. Still, the other commands likely have some overhead too, so I don't know whether that is enough to restore the near-original duration.

This was done in https://gerrit.wikimedia.org/r/c/translatewiki/+/509760 – again I am interested in reports about the duration.

@Nikerabbit Again getting "Received disconnect from 2620:0:861:3:208:80:154:85 port 29418:12: Too many concurrent connections (4) - max. allowed: 4"

That shouldn't happen unless there is some other process running at the same time. I was able to do a successful run with the patch when I tested it :/

@Nikerabbit Tested it this morning again. Happens again :-(

Change 510308 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[translatewiki@master] Fix broken parallelism limiting

https://gerrit.wikimedia.org/r/510308

Change 510308 merged by jenkins-bot:
[translatewiki@master] Fix broken parallelism limiting

https://gerrit.wikimedia.org/r/510308

> This was done in https://gerrit.wikimedia.org/r/c/translatewiki/+/509760 – again I am interested in reports about the duration.

The script run today for MediaWiki extensions took ~30 minutes.

I have missed this task, sorry. The change is indeed intentional.

Something that might speed it up is to have the ssh client keep the connection established so it can be reused. That saves a few network round trips. It is a feature of the OpenSSH client and can be configured via:

~/.ssh/config:

Host gerrit.wikimedia.org
    ControlMaster=auto
    # File holding the connection
    # %C = hash of local hostname, remote hostname, remote port and remote username
    ControlPath=~/.ssh/control-%C
    # Keep the connection around for 60 seconds:
    ControlPersist=60

If need be, one can explicitly disconnect it at the end of the run by sending the exit control command:

ssh -p 29418 -O exit gerrit.wikimedia.org
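
One can also check whether a master connection is currently alive:

ssh -p 29418 -O check gerrit.wikimedia.org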

You can also fetch over https instead of via ssh.

git remote set-url origin https://gerrit.wikimedia.org/mediawiki/extensions/Foo.git
git remote set-url --push origin ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/Foo.git
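
After that, git remote -v should show the https URL for fetches and the ssh URL for pushes:

git remote -v
# origin  https://gerrit.wikimedia.org/mediawiki/extensions/Foo.git (fetch)
# origin  ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/Foo.git (push)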

And the gerrit.query calls can probably be converted to use the Gerrit REST API, albeit that is a little more work.

This way, only patchset uploads and submits are sent over ssh, with the connection being reused; all reads are done over https.
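
As a minimal sketch of such a conversion (the query is illustrative), a gerrit query over ssh becomes a GET against the /changes/ endpoint; note that Gerrit prefixes JSON responses with the line )]}' which has to be stripped:

# Query open changes for a project over https instead of ssh.
curl -s 'https://gerrit.wikimedia.org/r/changes/?q=project:mediawiki/extensions/Foo+status:open' | tail -n +2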

Thanks, I'll file follow-up tasks for these suggestions. I have a faint memory of seeing some corruption when trying to use persistent connections, but it is worth trying again.

Arrbee closed this task as Resolved. May 20 2019, 8:16 AM