Unable to mirror repository from git.legoktm.com into diffusion
Open, Needs Triage · Public

Description

I host the source code for a Tool Labs tool at https://git.legoktm.com/legoktm/contentcontributor. I wish to mirror it into Diffusion at https://phabricator.wikimedia.org/diffusion/1946/.

I configured the proper URI, but the mirror fails with a timeout error.

modules/phabricator/templates/system.gitconfig.erb
[http "https://github.com"]
    proxy = <%= @proxy %>

[http "https://gerrit.wikimedia.org"]
    proxy =

I think that's why: the proxy is only configured for specific hosts. What's the process for adding other git hosts for mirroring? What other options can we try?

Event Timeline

Legoktm created this task. · Aug 26 2016, 4:50 AM
Restricted Application added subscribers: TerraCodes, Aklapper. · Aug 26 2016, 4:50 AM

Change 306900 had a related patch set (by Paladox) published:
Add git.legoktm.com to system.gitconfig.erb for phabricator

https://gerrit.wikimedia.org/r/306900

bd808 moved this task from Backlog to Needs Discussion on the Striker board. · Aug 26 2016, 11:04 PM

The meta question to discuss is which of the following resolutions is best:

  • Set http.proxy globally for Phabricator's git daemon to allow any external host to be mirrored.
  • Establish a clear and lightweight policy for requesting and approving new external mirrors.
  • Determine that mirroring external repos into Diffusion via automatic pull is undesirable and instead only allow git push mirroring via trusted user accounts (which we have no way of stopping).

This probably pushes the answer to this question into the realms of both the Operations and Security-Team teams for input.

One more possibility that sidesteps the issue:

  • push-mirroring the repo to github and then pull-mirroring from there to phabricator

We don't want to force our users to use GitHub, as several GitHub terms and conditions are problematic for some of our users, for reasons exposed most recently at T119908.

They aren't forced to use GitHub; they can host their repository directly on Phabricator and push from elsewhere, as mentioned in @bd808's 3rd bullet point above. The ability to pull from arbitrary hosts isn't required in order to use our git hosting.

That said, I'm not against allowing other hosts, in fact I'm all for it, as long as Security-Team doesn't have a problem with it.

bd808 added a comment. · Aug 28 2016, 3:30 AM

One more possibility that sidesteps the issue:

  • push-mirroring the repo to github and then pull-mirroring from there to phabricator

If we decide on only allowing push mirroring from non-GitHub origins (or a limited set of white-listed hosts), it will be pretty trivial to create a service on Labs/Tool Labs that manages a poll+pull+push workflow to mirror from anywhere.
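A rough sketch of what one cycle of such a poll+pull+push service could look like, assuming it simply shells out to git. The helper names, the working directory and the target URL are all hypothetical; nothing like this exists yet, and authentication for the push is left out entirely.

```python
import subprocess
from pathlib import Path
from typing import List


def mirror_commands(source_url: str, target_url: str, workdir: Path) -> List[List[str]]:
    """Return the git commands for one poll+pull+push cycle (hypothetical helper)."""
    commands = []
    if not workdir.exists():
        # First run: create a bare mirror clone of the external repository.
        commands.append(["git", "clone", "--mirror", source_url, str(workdir)])
    else:
        # Later runs: fetch new refs and prune refs deleted upstream.
        commands.append(["git", "-C", str(workdir), "remote", "update", "--prune"])
    # Push every ref to the hosted repository, deleting refs that vanished upstream.
    commands.append(["git", "-C", str(workdir), "push", "--mirror", target_url])
    return commands


def run_cycle(source_url: str, target_url: str, workdir: Path) -> None:
    """Run one cycle; raises CalledProcessError if any git command fails."""
    for command in mirror_commands(source_url, target_url, workdir):
        subprocess.run(command, check=True)
```

A cron job or timer would call run_cycle() with the external URI (for this task, https://git.legoktm.com/legoktm/contentcontributor) and the push URI of the hosted repository. The fetch then happens on a Labs instance rather than on the production Phabricator host.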

Would that have any security advantage compared to just allowing repositories to pull directly from anywhere? Poll+pull+push seems rather convoluted and unnecessary to me.

It would isolate the production Phabricator instance from some currently unknown zero-day exploit in git fetch actions. Other than that, I can't think of any advantage. I would certainly rather just have the simple puppet change that is the equivalent of git config --global http.proxy <%= @proxy %> and not have to deal with maintaining a poll+pull+push service.

@dpatrick and @faidon: do you have strong feelings about allowing Phabricator to directly replicate arbitrary 3rd party git repos for Tool maintainers?

bd808 added a comment. · Sep 7 2016, 3:57 PM

Ping @dpatrick and @faidon for Security-Team and Operations input on opening up the git configuration for Phabricator to allow mirroring arbitrary external git repos.

Would this fall under security-reviews?

I don't have any strong feelings towards either direction, no. (let's see if Moritz or Darian feel otherwise)

As you've correctly identified, @bd808, the risk here is that git has a vulnerability in its client that can be exploited via a crafted response from a git server. For such an exploit to be a new risk beyond the current status quo, the response would still have to be valid HTTP (and thus pass through the Squid proxy), and the bug must not be in e.g. the pack parsing code, as that could be exploited just as easily with a git push as with a pull. With the caveat that I don't know git's code at all, I'd estimate that the chances of that are slim, but not zero. I think it's a risk we could take, though.

Under which account do those git fetches run, and what other privileges does that account have? Would it be possible to have a dedicated account to run those fetches and/or wrap it under a jail?

It's the phd account and it has access to all the phabricator git repositories but not much else. I suppose we could jail it a bit further by running phd repository operations in a chroot or container? Seems a bit complex to set up.

greg moved this task from To Triage to Misc on the Phabricator board. · Sep 26 2016, 10:19 PM
dpatrick added a comment. · Edited · Oct 14 2016, 9:49 PM

In addition to @faidon's concerns above about client exploitation by a malicious server, I'm wondering whether we should be concerned with, and how we might mitigate:

  • DoS via storage exhaustion (large repo being mirrored, not necessarily maliciously);
  • DoS by tying up processes with slow writes from the server being mirrored;
  • Using the git client to make unauthorized requests to a third-party server.

(I realize that the latter requires that the target server respond at one of http://example.com/some/path/.git, http://example.com/some/path.git/.git, http://example.com/some/path.git, http://example.com/some/path.bundle, etc.)

I'm asking because I don't have a complete understanding of how this service is configured.

And, to be clear, I think it'd be cool to just open this up and allow mirroring from any host. I just have some questions.

  • DoS via storage exhaustion (large repo being mirrored, not necessarily maliciously);

Seems easy enough to handle manually, but maybe I'm missing something. If disk utilization spikes on the git server we look to see what is eating the space and disable/delete as needed.
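For the "see what is eating the space" step, a small sketch that ranks repository directories by size. It assumes all repositories live as subdirectories of one root; the function name is hypothetical, and the root path you pass in depends on how the host is laid out.

```python
import os
from pathlib import Path
from typing import List, Tuple


def repo_sizes(root: Path) -> List[Tuple[str, int]]:
    """Return (directory name, total bytes) pairs under root, largest first."""
    sizes = []
    for repo in root.iterdir():
        if not repo.is_dir():
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(repo):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Skip symlinks so nothing is followed or counted twice.
                if not os.path.islink(path):
                    total += os.path.getsize(path)
        sizes.append((repo.name, total))
    # Apparent file sizes, not blocks allocated on disk.
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

The first few entries of the result are the candidates to disable or delete if a mirror turns out to be unreasonably large.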

  • DoS by tying up processes with slow writes from the server being mirrored;

The impact of this likely depends on how the phd process actually handles the mirror fetches. Again, however, it seems like something that could be noticed and managed with a bit of manual intervention. If the replication queue is lagging we can check job runtimes (I hope) and determine the culprit.

  • Using the git client to make unauthorized requests to a third-party server.

(I realize that the latter requires that the target server respond at one of http://example.com/some/path/.git, http://example.com/some/path.git/.git, http://example.com/some/path.git, http://example.com/some/path.bundle, etc.)

Naively I don't think this is a huge threat, but to some extent that depends on how phd handles replication errors. I think the worst thing that could happen is spamming the remote with GET /info/refs?service=git-upload-pack HTTP/1.1 requests.
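To make that request concrete, here is a sketch that builds the ref-discovery request line git's smart HTTP transport sends for a given remote URL. The helper name is hypothetical, and it ignores the proxy hop and the fallbacks git may try with other URL forms.

```python
from urllib.parse import urlsplit


def discovery_request_line(remote_url: str) -> str:
    """Build the request line of git's smart-HTTP ref discovery for a remote."""
    # git appends /info/refs?service=git-upload-pack to the repository URL.
    path = urlsplit(remote_url).path.rstrip("/")
    return f"GET {path}/info/refs?service=git-upload-pack HTTP/1.1"
```

For the repository in this task, discovery_request_line("https://git.legoktm.com/legoktm/contentcontributor") gives "GET /legoktm/contentcontributor/info/refs?service=git-upload-pack HTTP/1.1". A misconfigured or hostile mirror URI can only vary the host and the path prefix of that request.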

And, to be clear, I think it'd be cool to just open this up and allow mirroring from any host. I just have some questions.

Awesome. So it sounds like we are generally in agreement that allowing open mirroring can happen. We just need a better idea of whether we can enforce process separation and have some monitoring for resource exhaustion issues.

  • Using the git client to make unauthorized requests to a third-party server.

(I realize that the latter requires that the target server respond at one of http://example.com/some/path/.git, http://example.com/some/path.git/.git, http://example.com/some/path.git, http://example.com/some/path.bundle, etc.)

Naively I don't think this is a huge threat, but to some extent that depends on how phd handles replication errors. I think the worst thing that could happen is spamming the remote with GET /info/refs?service=git-upload-pack HTTP/1.1 requests.

I was thinking that the worst that can happen depends on whether the target server's GET handling is non-idempotent, but this problem is not unique to this particular application (Citoid comes to mind) so I'm willing to accept this request.

And, to be clear, I think it'd be cool to just open this up and allow mirroring from any host. I just have some questions.

Awesome. So it sounds like we are generally in agreement that allowing open mirroring can happen. We just need a better idea of whether we can enforce process separation and have some monitoring for resource exhaustion issues.

Yep, I concur.

Does phd ever generate an email to users on import? (i.e. could a repository import be used as a DoS against their mailbox?)

I don't think that phabricator generates email related to git activity.

mmodell added a comment. · Edited · Dec 4 2016, 2:59 AM

T146055 improves the situation slightly by isolating the db credentials which are readable by the phd user from the credentials used by other parts of phabricator. We can now limit which databases/tables are readable by the phd process, at least in theory. I say in theory only because I haven't determined which tables are actually needed and which can be locked down. Once that's done I'd just need help from a dba to get the privileges set in mysql appropriately.

Naively I don't think this is a huge threat, but to some extent that depends on how phd handles replication errors. I think the worst thing that could happen is spamming the remote with GET /info/refs?service=git-upload-pack HTTP/1.1 requests.

Phabricator does retry on a fast schedule - it seems to retry replication errors every few seconds indefinitely.

We could probably improve that with an exponential backoff, and perhaps Striker could be configured to test the endpoint before adding it? That way we can at least avoid the initial configuration being obviously broken. We should also have a mechanism in place to detect failing remotes and clean them up if they continue to fail for a long time, though exponential backoff might be enough on its own.
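A minimal sketch of the backoff idea, not how Phabricator schedules retries today. The function name, the 15 second base and the 6 hour cap are assumptions picked for illustration.

```python
def retry_delay(failures: int, base: float = 15.0, cap: float = 6 * 3600.0) -> float:
    """Seconds to wait before the next fetch after `failures` consecutive errors."""
    if failures <= 0:
        # No recent failure: no extra delay.
        return 0.0
    # Double the delay per consecutive failure; clamp the exponent so the
    # intermediate value cannot overflow for remotes that fail for a long time.
    exponent = min(failures - 1, 32)
    return min(cap, base * 2 ** exponent)
```

With these defaults the first three retries wait 15, 30 and 60 seconds, and a remote that keeps failing is eventually polled only once every 6 hours instead of every few seconds. The failure count could also feed the cleanup mechanism: a remote that sits at the cap for long enough is a candidate for disabling.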

Change 306900 abandoned by Paladox:
phabricator: allow mirroring from git.legoktm.com into Diffusion

https://gerrit.wikimedia.org/r/306900