Phabricator needs to expose ssh
Closed, Resolved (Public)

Description

Phabricator is currently hosted on iridium.eqiad.wmnet, and is behind a double-layered proxy. The current setup is blocking two features that we want to enable in phabricator:

  1. git repo hosting (ssh port)
  2. real-time notifications (see: T112765 for that one)

Status as of October 20th 2015: git-ssh is now enabled for ipv4, all that remains is ipv6 support.

Event Timeline

Can anyone comment on how websockets fit into the current plan? Does that need to get broken out into a separate task?

When we ask whether something should be broken out into a separate task, the answer is almost always yes, so I just created T112765 as a subtask. Then I renamed this ticket so it's only for the SSH part now.

Dzahn renamed this task from "Phabricator needs to expose ssh and notification daemon (websocket)" to "Phabricator needs to expose ssh". Sep 16 2015, 4:03 PM

Basically, yeah. I ran down a similar plan with @chasemp and I think he's working on some patches for it.

@BBlack / @chasemp: Any clue on an ETA for this? Are there patches (unlinked here) that we should help with? Phabricator exposing ssh is a blocker for a Q1 goal for RelEng (T128: [keyresult] Allow cloning of Phabricator hosted git repositories), the end of which, as we all know, is an arbitrary point in time coming up in 1.5 weeks :).

:) 1 week away

(updated description to just talk about exposing ssh, since we split off websockets, which we still want, plzkthxbai)

Change 242014 had a related patch set uploaded (by BBlack):
define conftool-data for cluster=phabricator,service=git-ssh

https://gerrit.wikimedia.org/r/242014

Change 242014 merged by BBlack:
define conftool-data for cluster=phabricator,service=git-ssh

https://gerrit.wikimedia.org/r/242014

Change 242020 had a related patch set uploaded (by BBlack):
Fix 'service' value in LVS hieradata for git-ssh

https://gerrit.wikimedia.org/r/242020

Change 242020 merged by BBlack:
Fix 'service' value in LVS hieradata for git-ssh

https://gerrit.wikimedia.org/r/242020

greg changed the task status from Stalled to Open. Sep 30 2015, 9:14 PM

We are working through this slowly. Brandon yesterday outlined our existing options:

bblack: the two best options to me, IMHO, are either (1) live with a non-port-22 git-ssh service from a public POV, in which case all problems disappear with the config we have today after a port change, or (2) we do the router firewall port 22 exception, switch LVS to configure primary sshd to listen on ipaddress_eth0 only, switch iridium the same, define a git-ssh internal service IP as an extra eth0 IP on iridium and plug that into the LVS config where iridium's main IP is now, and configure phab-sshd to listen on that new internal service IP as well.
chasemp: understood, pursuing 2 now, assuming it's all sorted here shortly we'll be happier for longer than we are annoyed
chasemp: I'm hopeful
bblack: hmmm wait, that's not right either, is it?
chasemp: what is it missing?
bblack: phab-sshd would be configured to listen on the new internal service IP *and* the lvs::realserver IP that LVS is routing into it. And the internal service IP is functionally really only there to support the monitoring checking that phab-sshd is listening.
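
Option (2) above relies on two daemons sharing port 22 on one box, each bound to a specific address rather than the wildcard. The following is only a rough Python sketch of that idea, not the actual sshd or puppet configuration. The loopback addresses are stand-ins for the host's primary IP and the git-ssh service IP, the helper names are hypothetical, and it assumes a Linux host where all of 127.0.0.0/8 is routed to loopback.

```python
import socket


def listen_on(address, port=0):
    """Open a TCP listener bound to one specific local address (not 0.0.0.0)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((address, port))  # port 0 asks the kernel for a free port
    srv.listen(1)
    return srv


def demo():
    """Run two listeners on the same port and show each address reaches its own."""
    primary = listen_on("127.0.0.1")        # stand-in for the admin sshd
    port = primary.getsockname()[1]
    service = listen_on("127.0.0.2", port)  # stand-in for phab-sshd, same port
    replies = {}
    try:
        for name, srv, addr in (("primary", primary, "127.0.0.1"),
                                ("service", service, "127.0.0.2")):
            with socket.create_connection((addr, port), timeout=2) as client:
                conn, _peer = srv.accept()
                with conn:
                    conn.sendall(name.encode())  # each listener identifies itself
                replies[addr] = client.recv(64).decode()
    finally:
        primary.close()
        service.close()
    return replies


if __name__ == "__main__":
    print(demo())
```

Because neither listener binds the wildcard address, the second bind on the same port does not conflict with the first, which is the property the plan depends on.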

Currently need to coordinate on https://gerrit.wikimedia.org/r/#/c/243982/

This will probably sit through the offsite so I wanted to clarify status

@demon explained some of the historical difficulties in having SSH be on a non-standard port which relate to universities/corporate environments/controlled environments and difficulty getting people access to participate. There is a clear path forward (even if it is a bit of a pain) for pursuing an external port 22 ssh/git service.

See also T37611: Remove port 29418 from cloning process.

Our ferm module doesn't seem to allow specification of the dst address at this moment. In this case we will have multiple SSH services running on multiple IP's on one box. I need to sync with @mmoritz about whether he is fine with limiting SSH IP binding as a protection and opening up 22.

I notice the lack of IPv6 everywhere on this effort (no service IP assigned, no ACLs for it etc.). We've been building everything user-facing as IPv6-enabled from day 0 for a long time now, let's please do so here as well.

@demon explained some of the historical difficulties in having SSH be on a non-standard port which relate to universities/corporate environments/controlled environments and difficulty getting people access to participate. There is a clear path forward (even if it is a bit of a pain) for pursuing an external port 22 ssh/git service.

See also T37611: Remove port 29418 from cloning process.

Thanks for the link

I notice the lack of IPv6 everywhere on this effort (no service IP assigned, no ACLs for it etc.). We've been building everything user-facing as IPv6-enabled from day 0 for a long time now, let's please do so here as well.

That's the plan but I wanted to see IPv4 work end to end first

Status is:

git clone ssh://vcs@git-ssh.wikimedia.org/diffusion/TEST/testrepo.git

works

@chasemp: Is there anything remaining for this to be completed? Feel free to claim and close this task. :)

ipv6 :)

Change 255164 had a related patch set uploaded (by Rush):
phab: add IPv6 VCS address

https://gerrit.wikimedia.org/r/255164

Change 255164 merged by Rush:
phab: add IPv6 VCS real server IP

https://gerrit.wikimedia.org/r/255164

Change 255168 had a related patch set uploaded (by Rush):
phab: VCS ssh listen on IPV6

https://gerrit.wikimedia.org/r/255168

Change 255168 merged by Rush:
phab: VCS ssh listen on IPV6

https://gerrit.wikimedia.org/r/255168

git-ssh.wikimedia.org has an ipv6 address in DNS, however, it's not yet active due to lack of time to work on this. We need to kill the dns entry for ipv6 until we can get ssh working on ipv6 because this is causing extreme delays for anyone with proper ipv6 support on their client.

Change 265424 had a related patch set uploaded (by Dzahn):
comment IPv6 records for git-ssh

https://gerrit.wikimedia.org/r/265424

Change 265424 merged by Dzahn:
comment IPv6 records for git-ssh

https://gerrit.wikimedia.org/r/265424

Done. I commented them out and added a FIXME for later: https://gerrit.wikimedia.org/r/#/c/265424/2/templates/wikimedia.org

@mmodell can you please fix IPv6 instead or explain why it is difficult to do so? FWIW, IPv6 penetration is > 10% globally and > 25% in the US alone. Wikis have been dual-stacked with first-class IPv6 support since World IPv6 Launch day (June 6, 2012); it's been 3½ years since and surely Phabricator can't be all that more difficult than this :)

There was talk last night about it being a firewall/ACL type issue, which, I think, partially at least makes it a netops issue?

Please feel free to fix that side of things and revert the IPv6 commenting out :P

@faidon: I don't have any idea how to fix ipv6. I have zero experience with the systems involved and I don't even have ipv6 working on my home network.

I'm definitely not the right person to work on this.

As far as I am aware, this is not a problem with phabricator. The difficulty is the complex network infrastructure in between phabricator and the internet. I have very limited understanding of how all the pieces fit together.

Happy to help you get a sense on how to diagnose/fix this if you're interested, or happy to help find the right person if you're not.

In any case, please approach "X is broken and I don't know how to fix it" with "can someone fix X" (or "help me fix X"), not "let's disable X" :)

@faidon: I was only summarizing the discussion we (myself, @Reedy, @Dzahn and @chasemp) had in IRC. Please don't shoot the messenger ;)

In any case, please approach "X is broken and I don't know how to fix it" with "can someone fix X" (or "help me fix X"), not "let's disable X" :)

Well, ofc it's not exactly disabling something if it's not working ;). But I get your point.

It was the simplest fix at the time to stop clones from phab spending 3 minutes waiting for the IPv6 timeout. When it was reported (not by me, but I had replicated it earlier in the day), @Mutante said he didn't have the access to fix it, and given the time of day, that was the simplest temporary fix.

The DNS IPv6 entry was dropped yesterday because there is no ssh service listening there to serve the git repositories. That caused anyone with v6 to time out while connecting to git-ssh.wikimedia.org [2620:0:861:ed1a::3:16] port 22. It wasn't much of an issue until we recently started to point more folks to Diffusion hosted repositories.
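
To see which address family is responsible for a delay like this, one option is to try every address the name resolves to separately, with a short timeout. This is a minimal Python sketch using only the standard `socket` module. The function name and its default timeout are made up for illustration, and it is not a tool anyone in this thread used.

```python
import socket


def probe_each_address(host, port, timeout=3.0, resolve=socket.getaddrinfo):
    """Try a TCP connect to every resolved address.

    Returns a list of (family label, ip, outcome) tuples, so a published
    address with nothing listening behind it shows up as a failure instead
    of a long silent wait.
    """
    labels = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    results = []
    for family, socktype, proto, _canon, sockaddr in resolve(
            host, port, type=socket.SOCK_STREAM):
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            outcome = "ok"
        except OSError as exc:  # includes timeouts and refused connections
            outcome = "failed: " + exc.__class__.__name__
        results.append((labels.get(family, str(family)), sockaddr[0], outcome))
    return results


if __name__ == "__main__":
    for row in probe_each_address("git-ssh.wikimedia.org", 22):
        print(row)
```

A client that prefers IPv6 tries the AAAA address first and only falls back to IPv4 after that attempt times out, which is consistent with the multi-minute clone delays described above.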

From a quick discussion I had with @mmodell , iridium is on a fully isolated network and access from outside is fairly restricted.

Seems the routing is handled by a frontend LVS that has/would have a v6 service IP for git-ssh which would be an extra IP on iridium. Ssh needs to be listening on that service IP. I believe that configuration part and the service IP is 2620:0:861:ed1a::3:16.

Apparently there are some firewall / ferm rules on the path that restrict port 22. And from @chasemp's comment:

Our ferm module doesn't seem to allow specification of the dst address at this moment. In this case we will have multiple SSH services running on multiple IP's on one box. I need to sync with @MoritzMuehlenhoff about whether he is fine with limiting SSH IP binding as a protection and opening up 22.

So imho it boils down to:

  • verify ssh on iridium listens on the v6 service IP
  • validate with @MoritzMuehlenhoff the security model
  • add a ferm rule on the LVS frontend to allow port 22 but solely to the git-ssh service IP

Seems to me it is going to be netops driven. I don't mind pairing as needed since I have at least basic knowledge of LVS and some network / v6 knowledge.

I'm putting together 3x commits for review that I think will resolve this, they should show up below...

Change 265492 had a related patch set uploaded (by BBlack):
Add IPv6 for iridium-vcs.eqiad.wmnet

https://gerrit.wikimedia.org/r/265492

Change 265493 had a related patch set uploaded (by BBlack):
Add iridium-vcs.eqiad.wmnet ipv6 to phab puppetization

https://gerrit.wikimedia.org/r/265493

Change 265494 had a related patch set uploaded (by BBlack):
Add public IPv6 to git-ssh LVS IPs

https://gerrit.wikimedia.org/r/265494

I think those 3, and then uncommenting the public DNS records after it's deployed and tested, should do the trick. Needs review!

Change 265493 merged by BBlack:
Add iridium-vcs.eqiad.wmnet ipv6 to phab puppetization

https://gerrit.wikimedia.org/r/265493

Change 265492 merged by BBlack:
Add IPv6 for iridium-vcs.eqiad.wmnet

https://gerrit.wikimedia.org/r/265492

Change 265555 had a related patch set uploaded (by BBlack):
Fix ipv6 ferm rule for iridium-vcs.eqiad.wmnet

https://gerrit.wikimedia.org/r/265555

Change 265555 merged by BBlack:
Fix ipv6 ferm rule for iridium-vcs.eqiad.wmnet

https://gerrit.wikimedia.org/r/265555

Change 265494 merged by BBlack:
Add public IPv6 to git-ssh LVS IPs

https://gerrit.wikimedia.org/r/265494

IPv6 is fixed:

[blblack@mysteron gdnsd]$ telnet git-ssh.wikimedia.org 22
Trying 2620:0:861:ed1a::3:16...
Connected to git-ssh.wikimedia.org.
Escape character is '^]'.
SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.4

+1 WFM! Thanks @BBlack! :D

time GIT_SSH_COMMAND="ssh -v" git clone ssh://vcs@git-ssh.wikimedia.org/diffusion/MSCA/scap.git

real 0m4.419s

+1 WFM too! Awesome, thanks all people involved!

time GIT_SSH_COMMAND="ssh -v" git clone ssh://vcs@git-ssh.wikimedia.org/diffusion/WMUI/wikimedia-ui.git
real 0m2.846s

Anything else needed here? Or is this complete now?

git clone works for me over v6 :-)

There is still one comment that I don't think is formally addressed:

Our ferm module doesn't seem to allow specification of the dst address at this moment. In this case we will have multiple SSH services running on multiple IP's on one box. I need to sync with @mmoritz about whether he is fine with limiting SSH IP binding as a protection and opening up 22.

The puppet ferm rule is:

ferm::rule { 'ssh_public':
    rule => 'saddr (0.0.0.0/0 ::/0) daddr (10.64.32.186/32 208.80.154.250/32 2620:0:861:103:10:64:32:186/128 2620:0:861:ed1a::3:16/128) proto tcp dport (22) ACCEPT;',
}

I am not sure about all the history here, but it seems to me it should only accept public ssh to the service IP. Not the others which would be whitelisted elsewhere and restricted solely to bastions.

I don't really understand that quoted comment, but the ferm rules do have destination addresses that work at this time, and the results in iptables look correctly-restrictive.
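
For anyone who wants to check which destinations the quoted rule admits, here is a small Python sketch that pulls the `daddr (...)` list out of the rule string and tests an address against it. It only models the destination match, not full ferm or iptables semantics, and the helper names are hypothetical.

```python
import ipaddress
import re

# Rule text copied from the ferm::rule quoted earlier in this thread.
RULE = ("saddr (0.0.0.0/0 ::/0) daddr (10.64.32.186/32 208.80.154.250/32 "
        "2620:0:861:103:10:64:32:186/128 2620:0:861:ed1a::3:16/128) "
        "proto tcp dport (22) ACCEPT;")


def allowed_destinations(rule):
    """Return the networks listed in the rule's daddr (...) group."""
    match = re.search(r"daddr \(([^)]*)\)", rule)
    if match is None:
        return []
    return [ipaddress.ip_network(item) for item in match.group(1).split()]


def is_allowed(rule, destination):
    """True if the destination address falls inside one of the daddr networks."""
    addr = ipaddress.ip_address(destination)
    return any(addr.version == net.version and addr in net
               for net in allowed_destinations(rule))


if __name__ == "__main__":
    for dest in ("2620:0:861:ed1a::3:16", "10.64.32.186", "10.64.32.187"):
        print(dest, is_allowed(RULE, dest))
```

Every entry in the list is a /32 or /128, so the rule matches exactly four destination addresses and nothing else on the box.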

that comment is outdated