
Make sure phab can talk to gearman and nodepool instances can talk to phabricator
Closed, Resolved (Public)

Description

background

The plan for CI in a brave new world (with Gerrit-Migration to Differential) currently looks like this:

Changeset uploaded to Differential -> run Harbormaster build plan -> call gearman api -> nodepool (on Gallium) -> CI job runner (labs instance) -> clone the repo from phabricator -> run tests -> report test results (via conduit) -> differential test status updated

Details

So we need to be sure that there are no firewall rules (or other network issues) blocking the following communication paths:

Iridium -> Gallium:4730 (gearman): tested, works now
Nodepool (labs instances) -> Iridium:443 (conduit https): works from tested labs instances
Nodepool (labs instances) -> git-ssh.wikimedia.org:22 (git over ssh): ssh: connect to host git-ssh.wikimedia.org port 22: Network is unreachable
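The three paths above can be probed with a small helper. This is only a sketch: it assumes a netcat that supports `-z` and `-w`, the function names `probe_tcp` and `check_path` are made up for illustration, and using `iridium.eqiad.wmnet` as the conduit hostname is an assumption (the name labs instances actually use may differ).

```shell
# Probe a single TCP port. Assumes a netcat that supports -z (connect
# without sending data) and -w (timeout in seconds); swap in another
# probe if yours differs.
probe_tcp() {
    nc -z -w 5 "$1" "$2" >/dev/null 2>&1
}

# Print "ok host:port" or "FAIL host:port"; return non-zero on failure.
check_path() {
    if probe_tcp "$1" "$2"; then
        echo "ok $1:$2"
    else
        echo "FAIL $1:$2"
        return 1
    fi
}

# Run from iridium:
#   check_path gallium.wikimedia.org 4730
# Run from a nodepool (labs) instance:
#   check_path iridium.eqiad.wmnet 443     # hostname is an assumption
#   check_path git-ssh.wikimedia.org 22
```

Each path has to be checked from its own source host, since the firewall rules depend on where the connection originates.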

Event Timeline

@hashar: I don't know how to test nodepool instances. Is there a way for me to get a shell inside of an instance?

@chasemp: connecting to gallium's gearman port (4730) does not work from iridium. I'm not sure what rule is blocking it though.

So I'm guessing that iridium -> gallium:4730 is probably fixable with a ferm::rule in puppet?
I'm not so sure what is up with git-ssh from labs; however, we can probably just use https and sidestep that one.

@chasemp: connecting to gallium's gearman port (4730) does not work from iridium. I'm not sure what rule is blocking it though.

It's not so much that a rule is blocking it; it's the absence of a rule allowing it. The default is to drop traffic unless it's explicitly allowed.

Connections to port 4730 are allowed from these hosts:

ACCEPT tcp -- labnodepool1001.eqiad.wmnet anywhere tcp dpt:4730
ACCEPT tcp -- gallium.wikimedia.org anywhere tcp dpt:4730
ACCEPT tcp -- scandium.eqiad.wmnet anywhere tcp dpt:4730
ACCEPT tcp -- localhost anywhere tcp dpt:4730
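A listing like the one above can be pulled out of the full `iptables -L` output with a short filter. This is a sketch, not what was actually run here: the function name `allowed_sources` is made up, and it assumes the port is shown numerically (`dpt:4730`) and that the columns are laid out as in the lines above.

```shell
# Print the source of every ACCEPT rule that matches the given TCP
# destination port. Reads `iptables -L`-style text on stdin.
# Column 4 is the source in the layout: target prot opt source destination ...
allowed_sources() {
    awk -v want="dpt:$1" '
        $1 == "ACCEPT" {
            for (i = 5; i <= NF; i++)
                if ($i == want) { print $4; break }
        }'
}

# On the gearman host (needs root):
#   iptables -L | allowed_sources 4730
```

If the requesting host is missing from that output, the default drop policy applies to it.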

Change 280706 had a related patch set uploaded (by Dzahn):
contint:firewall: let phabricator talk to gearman

https://gerrit.wikimedia.org/r/280706

Change 280706 merged by Dzahn:
contint:firewall: let phabricator talk to gearman

https://gerrit.wikimedia.org/r/280706

on gallium:

Notice: /Stage[main]/Contint::Firewall/Ferm::Service[gearman_from_phabricator]/File[/etc/ferm/conf.d/10_gearman_from_phabricator]/ensure: created

iptables -L now includes:

ACCEPT tcp -- iridium.eqiad.wmnet anywhere tcp dpt:4730

should work now

@hashar: I don't know how to test nodepool instances. Is there a way for me to get a shell inside of an instance?

There is no straightforward way unfortunately. The instances have a jenkins user with the private key being on labnodepool (and in Jenkins credential store).

So the lame way is to connect from the Nodepool server, mark an instance to not be deleted (hold status), ssh to it and then delete the instance.

$ ssh labnodepool1001.eqiad.wmnet
$ become-nodepool
$ nodepool list
$ nodepool hold <some instance id>
$ ssh jenkins@10.x.y.z

That will get you access as jenkins; no sudo is available.

Then: nodepool delete <some instance id>

Thanks @Dzahn for setting this up so quickly. I tested that and I was able to connect to gearman.

Afaik the 'talk to phabricator' portion here is relevant for git-ssh.wikimedia.org which is notable as it's a separate IP (and port).

A rule exists to block all ssh to public IPv4 addresses from labs things, and it will need an exclusion inserted above it for this LVS IP only. Something like:

set firewall family inet filter labs-in4 term git_ssh from destination-address 208.80.154.250/32
set firewall family inet filter labs-in4 term git_ssh from destination-port ssh

New problem: Apparently jenkins can't access phabricator over ssh.

Jenkins executes the jobs on labs instances, so it is not surprising they can't reach Phabricator over ssh (firewalled). Can't we have the Jenkins job clone/fetch over https?

I'm sure we could hack the Jenkins job to use https, but the staging url is currently passed as a parameter.
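For illustration only, such a hack could rewrite the passed-in url before cloning. Everything here is hypothetical: the function name `to_https_url`, the `STAGING_URL` variable, and the https hostname, which has to be supplied by the caller because the thread does not say which host serves the repositories over https.

```shell
# Rewrite ssh://[user@]host[:port]/path to https://<https_host>/path.
# Anything that is not an ssh:// URL is printed unchanged.
# $1 = original url, $2 = https host to substitute (caller's assumption).
to_https_url() {
    printf '%s\n' "$1" | sed -E "s#^ssh://[^/]+/#https://$2/#"
}

# Hypothetical use inside the job:
#   git clone "$(to_https_url "$STAGING_URL" "$HTTPS_HOST")"
```

This keeps the job parameter unchanged and confines the ssh-to-https switch to the runner side.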

Why is labs intentionally blocked from connecting to ssh? Is that to avoid people using labs instances as ssh proxies?

Why is labs intentionally blocked from connecting to ssh?

Can you be more specific about this? In the case of phabricator, I suspect that ssh is blocked /by the phab server/ basically everywhere except from production bastions. It should be, at least.

EDIT: Sorry, that was an unresearched drive-by comment. Chase thinks that port 22 is explicitly blocked someplace in our network setup.

Port 22 to 208.80.154.250/32 only (the service address for git-ssh) should be allowed now. Otherwise, the general block on ssh from labs remains unchanged.

@mmodell this is to avoid, as much as possible, the ability to administer or authenticate to hosts in prod from labs, afaik. We have a similar rule prohibiting inbound port 22 from the public, where git-ssh is also the only exception.

mmodell reassigned this task from mmodell to chasemp.

Thanks @chasemp! It works now!

Credit belongs to both @chasemp and @Dzahn but I can't assign it to both ;)