Figure out a git hosting solution for tools/kubernetes
Closed, Resolved, Public

Description

Vague ticket that needs lots more editing and details! Please hold the flames!

We need a nice git hosting solution for tools. Right now a lot of people use github/bitbucket, and a small minority use gerrit. Some of the reasons:

  1. Gerrit's UI
  2. Repo creation is not self serve, and causes a block!
  3. Most people don't want pre-merge review since it's just themselves writing the code, and gerrit's 'just push' mode isn't very well advertised
  4. There's just no advantage to using Gerrit for tools at all (unlike for MW extensions)

This is totally ok - I'm personally fine with people using github / bitbucket / whatever (as long as they are using some form of source control Yuvi is happy). But if we're going to build docker containers off git repos (more details on that forthcoming but let's assume we want to do that) we need a local git repo anyway, and it'll be nice to make that be a hosted setup that people can use too.

Things that a geared-towards-tools git setup would need:

  1. Be totally self serve. People should be able to create repos without having to go through a request process
  2. Possible to be mirrored in either direction (from / to Github / BitBucket) so people can still use those if they want while our stuff can just pretend they're all using this setup
  3. Have a web UI for people to browse code / search
  4. Direct git access (no going through an external learning curve/tool)
  5. Allows other people to contribute to other people's code in some way or other (aka a method for people to send easy patches, no enforcing code review)
  6. No enforced code review
  7. Does not have its own User system (should be just LDAP at worst)
  8. Supports *blocking* post-receive hooks (eg. Make a HTTP call / run a command on post receive, and send output of that to the pusher. Same behavior as a literal post receive hook in git)
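
For requirement 8, a minimal sketch of what a blocking post-receive hook could look like in Python. git feeds the hook one `<old-sha> <new-sha> <refname>` line per updated ref on stdin, keeps the push connection open until the hook exits, and relays whatever the hook prints back to the pusher. The build endpoint URL and the form field names are placeholders invented for illustration; no such service is named in this task.

```python
#!/usr/bin/env python3
"""Sketch of a blocking post-receive hook: notify a build service, relay its reply."""
import sys
import urllib.parse
import urllib.request

# Hypothetical endpoint, made up for this example.
BUILD_URL = "https://builder.example.org/build"


def parse_updates(lines):
    """Turn git's '<old-sha> <new-sha> <refname>' stdin lines into dicts."""
    updates = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            updates.append({"old": parts[0], "new": parts[1], "ref": parts[2]})
    return updates


def notify(update, url=BUILD_URL, timeout=300):
    """POST one ref update and return the service's response body as text."""
    body = urllib.parse.urlencode(update).encode("utf-8")
    with urllib.request.urlopen(url, data=body, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def main(stdin=sys.stdin, stdout=sys.stdout, send=notify):
    """Process every pushed ref in turn; the push blocks until this returns."""
    for update in parse_updates(stdin):
        # Anything written here shows up on the pusher's terminal,
        # prefixed with "remote:" by git.
        stdout.write(send(update) + "\n")
        stdout.flush()

# Installed as hooks/post-receive, the file would end with a call to main().
```

Because the hook runs synchronously, a slow build service makes every push slow, which is the "at scale" concern for this requirement.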

Related Objects

Status    Assigned
Resolved  LucasWerkmeister
Resolved  matmarex
Resolved  Legoktm
Resolved  Legoktm
Open      dcaro
Resolved  yuvipanda
Resolved  dcaro
Resolved  bd808
Resolved  bd808
Resolved  mmodell
Resolved  mmodell
Resolved  bd808
Resolved  dpatrick
Resolved  bd808
Resolved  mmodell
Resolved  jcrespo
Resolved  bd808
Resolved  bd808
Resolved  bd808
Resolved  bd808

Event Timeline

yuvipanda raised the priority of this task from to Needs Triage.
yuvipanda updated the task description. (Show Details)
yuvipanda subscribed.

Additionally, it would be nice if said service did not require a new account system and could be hooked up to LDAP. Bonus points if you can grant repo access to all members of a tool, and when new members of a tool are added, it updates in the git service too. (I might have filed a FR to do this in Gerrit at some point)

The Wikimedia git hosting solution is Gerrit.

The aim is to replace Gerrit with Differential, a solution which is embedded in Phabricator and tracked with Gerrit-Migration. It covers all the needs of this task.

Can we please NOT add yet another infrastructure we will have to maintain? Meanwhile either use Gerrit or a third party git hosting solution.

yuvipanda renamed this task from Figure out a git hosting solution for tools to Figure out a git hosting solution for tools/kubernetes. Oct 30 2015, 11:11 AM

I think you're missing a lot of context here @hashar. This is a ticket that is gathering requirements for what sort of git setup we'd need for our kubernetes setup, and there have been a lot of IRC discussions that haven't been summarised here yet (and that you are probably not aware of). More info will be available in the next few days, I think. In the mean time, please do re-read the description of that task as I think it'll provide at least some more clarity.

Thank you.

Ok, summarizing discussions between me, @mmodell and @demon:

  1. We can use diffusion for this!
  2. There's an unstable API that @demon is playing with that'll allow us to do self-serve repo creation. I need to go explore this API to see what all it can do, especially around ACLs to see how we can use it.
  3. Differential is totally optional, so people don't have to use it if they don't need to

Still need to investigate:

  1. Mirroring from and to other repositories (This might not be that important)
  2. Post-receive hooks that are synchronous, and how it handles these at scale. (This definitely is important!)
chasemp subscribed.

@demon and @mmodell: This is a project that I would very much like to make progress on as part of the Community-Tech-Tool-Labs work I am doing this quarter. I want to promote publishing source code for Toolforge tools as a best practice. Today the story I have to tell is "we have git hosting resources available, but you probably want to just use Bitbucket or Github because getting things set up is harder here than there."

The ideal solution from my point of view would be to offer "1-click" creation of new git repos that are readable by anyone and writable by members of a given tool account. I'd like each tool account to be able to create multiple git repos, but if that is too hard for some reason then a single repo per-tool would suffice.

Where can I start reading about the self-serve repo creation API mentioned earlier in this discussion?

@bd808: I've added you to repository-admins group in case you need the permissions to test:

https://phabricator.wikimedia.org/conduit/method/repository.create/
https://phabricator.wikimedia.org/conduit/method/repository.query/

There isn't much documentation, really, but it is almost self-explanatory.
Also, it's not difficult to create / modify conduit methods. So if it's missing something that is needed, then we could probably fork the code for repository.create and add anything that is missing.
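
To give a feel for how those methods are called: Conduit methods are reachable over HTTP under `/api/<method>` and take form-encoded parameters plus an `api.token`. A rough sketch in Python, assuming `repository.create` accepts `name`, `callsign` and `vcs` (from memory; the console pages linked above are the authority) and using a placeholder token:

```python
import json
import urllib.parse
import urllib.request

PHAB_API = "https://phabricator.wikimedia.org/api/"


def build_conduit_request(method, token, params):
    """Return (url, form-encoded body) for a Conduit call, without sending it."""
    fields = {"api.token": token}
    fields.update(params)
    return PHAB_API + method, urllib.parse.urlencode(fields).encode("utf-8")


def call_conduit(method, token, params):
    """Send the request and return the 'result' member of Conduit's JSON reply."""
    url, body = build_conduit_request(method, token, params)
    with urllib.request.urlopen(url, data=body) as resp:
        reply = json.load(resp)
    if reply.get("error_code"):
        raise RuntimeError(f"{reply['error_code']}: {reply.get('error_info')}")
    return reply["result"]
```

Creating a repository would then look something like `call_conduit("repository.create", token, {"name": "tool-versions", "callsign": "TOOL52937", "vcs": "git"})`, run with a token belonging to a member of repository-admins.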

One thing to consider is maybe running a separate phabricator instance just for this purpose. I can help you set up a labs instance for testing...

I'd really want to not run another phab instance :D I don't think the labs team will have the bandwidth to maintain it...

I suppose we can create a bot account and use that to autocreate repositories when tools are created?

If repositories get autocreated and tools do not use them then Diffusion will be filled with empty repos. If a tool needs a repo, then there should be an easy way to create one on Phabricator.

> One thing to consider is maybe running a separate phabricator instance just for this purpose. I can help you set up a labs instance for testing...

The potential problem I see with running a separate instance is that we then need to find someone to own and maintain that instance. Would the reason for separation be the load/usage on the primary instance? We already have a repo that contains most of the code that was rescued from the Toolserver. I'd be happy to be surprised, but I doubt we would end up with more than a few hundred relatively small repos (on the order of the size of a typical MediaWiki extension's repo).

> If repositories get autocreated and tools do not use them then Diffusion will be filled with empty repos. If a tool needs a repo, then there should be an easy way to create one on Phabricator.

I'm not sure that auto-creation is a good or bad idea in general yet. I can think of pros and cons certainly. I am sure however that worrying about Diffusion being filled with empty repos is a non-issue. An empty repo would take up a few rows in database tables and a tiny amount of disk. We already host 1500+ git repos so it's not like we have a collection that you can keep track of in your head today.

> Supports *blocking* post-receive hooks (eg. Make a HTTP call / run a command on post receive, and send output of that to the pusher. Same behavior as a literal post receive hook in git)

why post-receive? I don't think phabricator uses post-receive (as far as I can tell it only uses pre-receive) so I don't see any reason you couldn't set this up directly in git. It would just require a little automation to manage the repository hooks.

> why post-receive? I don't think phabricator uses post-receive (as far as I can tell it only uses pre-receive) so I don't see any reason you couldn't set this up directly in git. It would just require a little automation to manage the repository hooks.

I think the intent of a post-receive hook would be to build a new Docker image and submit it to the k8s grid. This would mimic the deployment style used for Heroku and several other PaaS providers.

> The potential problem I see with running a separate instance is that we then need to find someone to own and maintain that instance. Would the reason for separation be the load/usage on the primary instance? We already have a repo that contains most of the code that was rescued from the Toolserver. I'd be happy to be surprised, but I doubt we would end up with more than a few hundred relatively small repos (on the order of the size of a typical MediaWiki extension's repo).

Understood. We can certainly handle the load, I believe, and the storage won't be an issue either. After reading your response and thinking it through a bit more thoroughly, I haven't thought of any good reason for a separate instance.

> I think the intent of a post-receive hook would be to build a new Docker image and submit it to the k8s grid. This would mimic the deployment style used for Heroku and several other PaaS providers.

Indeed! Although I am now convinced that we shouldn't do the docker building etc ourselves, but use something like deis or openshift (or the many others that work on top of k8s). They all require a post-receive hook though, I think.

@yuvipanda: there is no reason we can't have a post-receive hook in phabricator, we would just need to install the hook ourselves since phabricator doesn't use it.

We obviously don't want to grant everyone with Tool Labs access direct membership in the repository-admins group. We should be able to build a tool that uses the various conduit APIs to:

  • Create a repository
  • Manage the "Editable By" and "Can Push" policies for a repository that has been created through the tool

To keep things simple, all repos managed by the tool will have some uniform characteristics:

  • Repo name is "tool-<service group name>" (e.g. "tool-versions" for tools.versions)
  • Repo callsign is "TOOL<gid of service group>" (e.g. "TOOL52937" for tools.versions)
  • "Dangerous Changes" flag (force push and branch delete) enabled
  • All members of the tool having a phab account at the time the repo is created are granted edit and push rights.
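
The naming convention above is mechanical enough to express in a few lines. A sketch in Python, where the function name and the returned dict shape are made up for illustration and the dangerous-changes flag is just carried along as a setting, not tied to any particular API field:

```python
def repo_identity(service_group, gid):
    """Derive repo name and callsign for a tool account.

    service_group: LDAP service group name, e.g. "tools.versions"
    gid: numeric gid of that service group, e.g. 52937
    """
    prefix = "tools."
    if not service_group.startswith(prefix):
        raise ValueError(f"not a tool service group: {service_group!r}")
    tool = service_group[len(prefix):]
    return {
        "name": f"tool-{tool}",
        "callsign": f"TOOL{gid}",
        # Force push and branch delete are allowed for every managed repo.
        "dangerous_changes": True,
    }
```

Keying the callsign on the gid instead of the tool name keeps it unique even if two tool names would collapse to the same uppercase string.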

Some additional simplifying assumptions for the initial tool:

  • All repos are associated with a single shared tool account (service group)
  • Management of the repo via the tool is granted to all members of the tool account (maintainers)
  • Only one repo per tool is allowed
    • We can add support for multiple repos later if needed
  • The tool will not manage mirrors, staging, automation, or any other advanced diffusion settings. These features can be handled manually via phab tickets and members of the repository-admins group.

This set of functions provides the bare minimum needed to publish the code for a tool, namely a web accessible version control system. By adding an .arcconfig file to the repository this bare minimum can be expanded to allow code submission as well via arcanist/differential.

+1.

I think this would also be a wonderful opportunity to allow people to select a LICENSE.

This all sounds reasonable and achievable, though not completely trivial.

It will be a little tricky getting all the details right. Fortunately, I have a bunch of experience messing with phabricator policies and transactions thanks to working on the security extension.

The repository.create conduit method doesn't currently support all of the features needed and from the looks of things it's pretty old / unmaintained. I think I could fork repository.create and add the features needed, however, it will definitely take a bit of work.

If someone wants to pair on it with me then we could probably make some pretty good progress in a day or two though.

After talking through some related things with @yuvipanda and @Krenair yesterday on irc, I think we have a few blockers to clear before getting started on this in earnest:

  • We need an authn/authz mechanism for the repo management interface that gives knowledge about service account membership. SUL OAuth is available to tools today, but SUL accounts have no correlation with wikitech accounts and/or LDAP data.
  • We need a reliable way to correlate LDAP accounts and Phabricator accounts. I'm not certain if the email search provided by user.query gives us this or if we will need some other means to map an LDAP account to the proper Phab account.
  • As @mmodell points out we may need to enhance the repository.create conduit method or alternately dig around in conduit and find a way to manage the various policies that control repository access.
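
One way the LDAP-to-Phabricator correlation could be probed, sketched in Python. This assumes user.query accepts an `emails` list and returns user records carrying a `phid`, which is exactly the part flagged as uncertain above; the query function is passed in so the matching logic can be tried without touching a live instance. All names here are hypothetical.

```python
def map_ldap_to_phab(ldap_accounts, user_query):
    """Match LDAP accounts to Phabricator users by email address.

    ldap_accounts: iterable of (uid, mail) pairs taken from LDAP
    user_query: callable taking a dict of user.query parameters and
                returning a list of user dicts (assumed to include "phid")
    Returns (mapping of uid -> phid, list of uids with no unique match).
    """
    mapping = {}
    unmatched = []
    for uid, mail in ldap_accounts:
        users = user_query({"emails": [mail]})
        if len(users) == 1:
            mapping[uid] = users[0]["phid"]
        else:
            # Zero hits or several hits: leave it for a human to sort out.
            unmatched.append(uid)
    return mapping, unmatched
```

Anyone who registered in Phabricator with a different address than the one in LDAP lands in the unmatched list, so this would only be a first pass.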

My discussion with @yuvipanda and @Krenair made clear that we need to think about a larger roadmap for Tool Labs and how the various components will fit together to provide us a platform for developing new features and functionality. My next steps will be to draft a "vision" of where we would like to take the Tool Labs platform along with a strawman plan to get there and solicit feedback RfC style before heading too quickly in the direction of any particular solution.

@thcpriani reminded me of https://try.gogs.io/ ... might be worth looking into also?

> @thcpriani reminded me of https://try.gogs.io/ ... might be worth looking into also?

Does release engineering want to maintain that as an additional git hosting service? I think Hashar's point in T117071#1768348 was well made that we should have a single system rather than a fragmented landscape of VCS solutions. That being said if in the end we can't find any reasonable way to automate and secure repo creation via Diffusion I would be willing to look for a way to support a new platform that does fit these needs.

@bd808: I agree that we should stick with one integrated system as much as reasonably makes sense, it wasn't too serious of a suggestion really. GOGS does look really lightweight and self-contained though, which is interesting.

With Striker+Diffusion, is this resolved now?

> With Striker+Diffusion, is this resolved now?

There is at least one missing requirement from the initial list:

> Allows other people to contribute to other people's code in some way or other (aka a method for people to send easy patches, no enforcing code review)

This may just be a matter of documentation on how to use differential to send a patch to a repo? It's certainly not the easiest patch submission system, but I'm confident that upstream will keep working on making things easier.

I'm also not sure that we have this one covered:

> Supports *blocking* post-receive hooks (eg. Make a HTTP call / run a command on post receive, and send output of that to the pusher. Same behavior as a literal post receive hook in git)

I think the intent of this requirement was to have something that supported a Heroku-style "push to deploy" system. This is something that would be a feature of a PaaS, so we can probably defer solving it until T136264: Evaluate Kubernetes based workflow replacement options for SGE gives us a better understanding of what is needed.

> With Striker+Diffusion, is this resolved now?

> There is at least one missing requirement from the initial list:

> Allows other people to contribute to other people's code in some way or other (aka a method for people to send easy patches, no enforcing code review)

> This may just be a matter of documentation on how to use differential to send a patch to a repo? It's certainly not the easiest patch submission system, but I'm confident that upstream will keep working on making things easier.

Skipping code review? git push has you covered. Audit has you covered for post-commit review if need be.

> I'm also not sure that we have this one covered:

> Supports *blocking* post-receive hooks (eg. Make a HTTP call / run a command on post receive, and send output of that to the pusher. Same behavior as a literal post receive hook in git)

> I think the intent of this requirement was to have something that supported a Heroku-style "push to deploy" system. This is something that would be a feature of a PaaS, so we can probably defer solving it until T136264: Evaluate Kubernetes based workflow replacement options for SGE gives us a better understanding of what is needed.

This can probably be triggered by a herald rule (or rules). Post-commit we can have it hit an HTTP endpoint of your choosing to trigger an auto-deploy.

> There is at least one missing requirement from the initial list:

> Allows other people to contribute to other people's code in some way or other (aka a method for people to send easy patches, no enforcing code review)

> This may just be a matter of documentation on how to use differential to send a patch to a repo? It's certainly not the easiest patch submission system, but I'm confident that upstream will keep working on making things easier.

> Skipping code review? git push has you covered. Audit has you covered for post-commit review if need be.

Direct push does work well for the small tool projects I've been hosting in Diffusion. The part that I think may be missing is "other people" (not the repo owners) submitting patches. Again this may just be some work to teach folks how to use arc to submit patches or waiting for upstream to finish their work on a pure git patch submission method.

@bd808: We can allow patches via direct push and then review them via audit. We'd just need to have security set up to reject pushes to existing branches I guess?

> @bd808: We can allow patches via direct push and then review them via audit. We'd just need to have security set up to reject pushes to existing branches I guess?

So that would be some form of recreating Gerrit in Diffusion?

> So that would be some form of recreating Gerrit in Diffusion?

Well, it means using the audit application to do reviews instead of differential (not a bad thing, IMO) and enforcing some rules on what branches accept pushes so that only maintainers can merge to master / deployment branches. It wouldn't be a very close approximation of how gerrit works but it seems like it'd work fine for tools. But I'm not that familiar with the needs of tools maintainers so I may be way off base here.
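
The branch rules being discussed boil down to a small decision function. A sketch in Python of the logic only; in Phabricator this would more likely be expressed through Herald or repository policies than a hand-written hook, and the function name, the protected-branch set and the pusher names are all assumptions for illustration:

```python
ZERO_SHA = "0" * 40  # git reports this as the old value when a ref is created

# Hypothetical set of branches that only maintainers may update.
PROTECTED_REFS = frozenset({"refs/heads/master"})


def push_allowed(pusher, old_sha, ref, maintainers, protected=PROTECTED_REFS):
    """Decide whether one ref update from `pusher` should be accepted."""
    if pusher in maintainers:
        # Maintainers land patches by pushing directly.
        return True
    if ref in protected:
        return False
    # Everyone else may open a new branch for review via audit,
    # but may not move a branch that already exists.
    return old_sha == ZERO_SHA
```

With `maintainers = {"alice"}`, a push by bob that creates `refs/heads/bob-fix` passes, while bob updating `refs/heads/master` or any existing branch is rejected.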

I'm a little confused, AIUI the requirements are that a group of tool owners can direct push or go through some kind of patch approval process (their choice). But we always want to allow other people to submit patches that can be reviewed/accepted by the maintainers.

Is there no way to allow direct push for some people through diffusion but still allow anyone to submit patches through differential?

> I'm a little confused, AIUI the requirements are that a group of tool owners can direct push or go through some kind of patch approval process (their choice). But we always want to allow other people to submit patches that can be reviewed/accepted by the maintainers.

> Is there no way to allow direct push for some people through diffusion but still allow anyone to submit patches through differential?

Of course there is. The permissions would enable such a workflow by default. I was mainly pointing out that the "maintainer" of the repo always has direct pushing abilities, that's how they land patches. So if they don't want to use Differential they don't have to (and Audit enables post-commit review, it's actually enabled for all repos everywhere all the time). I don't think we need any changes at all to the standard permission model we use.

If a repo wanted to force Differential, even for maintainers, we'd use Herald.

OK, gotcha. So probably the real task left here is documenting somewhere (maybe it already exists) how to find the diffusion repo for a tool and then submit a patch to it. And making sure that repos have the right arcconfig file.

> OK, gotcha. So probably the real task left here is documenting somewhere (maybe it already exists) how to find the diffusion repo for a tool and then submit a patch to it. And making sure that repos have the right arcconfig file.

If we properly tag Tools repos with a shared project, searching for them should be pretty easy.

And actually, .arcconfig files are optional now (as needed). arc is smart enough these days to not need the basic boilerplate stuff anymore.

> OK, gotcha. So probably the real task left here is documenting somewhere (maybe it already exists) how to find the diffusion repo for a tool and then submit a patch to it. And making sure that repos have the right arcconfig file.

Right. And my comments about audit were addressing the concern about new users being forced to use arcanist. If we don't want to require arcanist we can allow non-maintainers to git push to a branch and then maintainers would use audit to do code review and git merge approved patches manually.

@Legoktm: would it be helpful if we indexed repositories->tools mapping in the phabricator search index? I want to improve the repository search experience in phabricator and this seems like a place where there is room for improvement.

dcaro claimed this task.
dcaro subscribed.

This task is no longer relevant; we will support creating containers through buildpacks using any publicly accessible git repository (github/bitbucket/gerrit/gitlab...).

Please reopen and update if it's still needed!