
[Task] move git repositories that are dependencies of wikidata to gerrit
Closed, Declined, Public

Description

Move git repositories that are dependencies of wikidata to gerrit.

TODOs for each repository:

  • Acquire the necessary rights for the GitHub orgs, Travis, Scrutinizer and possibly Gerrit, to be able to do the move.
  • Move the repo to the wikimedia org.
  • Create the repository on Gerrit.
  • Push from the old to the new repository.
  • Make sure mirroring works.
  • Ensure the Travis, Scrutinizer and Packagist hooks from the GitHub mirror work.
  • Update http://wikiba.se/components/ and composer.json.
  • Set up IRC notifications from Gerrit.
  • Set up Wikimedia CI for the new repos.
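
The "create repository" and "push from old to new repository" steps above could look roughly like the following. This is only a sketch under assumptions: the function name, the working directory and both URLs are hypothetical placeholders, not real Wikidata repositories, and the new (empty) repository is assumed to exist already with push rights granted. The function only builds the git commands; it does not run them.

```python
import shlex
from typing import List


def migration_commands(old_url: str, new_url: str, workdir: str) -> List[List[str]]:
    """Build (but do not run) the git commands that copy a repository.

    All three arguments are supplied by the caller; nothing here is
    specific to any real repository.
    """
    return [
        # A bare clone fetches every branch and tag without a working tree.
        ["git", "clone", "--bare", old_url, workdir],
        # Push all branches, then all tags, to the newly created repository.
        ["git", "-C", workdir, "push", new_url, "--all"],
        ["git", "-C", workdir, "push", new_url, "--tags"],
    ]


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    old = "https://github.com/example-org/ExampleLib.git"
    new = "ssh://gerrit.example.org:29418/ExampleLib"
    for cmd in migration_commands(old, new, "ExampleLib.git"):
        print(shlex.join(cmd))
```

Printing the commands first, rather than executing them, makes it possible to review them per repository before anything is pushed.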

Details

Reference
bz72907

Related Objects

Status | Subtype | Assigned | Task
Resolved | | Addshore |
Resolved | | Addshore |
Invalid | | None |
Declined | | None |
Declined | | None |
Resolved | | JanZerebecki |
Resolved | | JeroenDeDauw |
Resolved | | aude |
Resolved | | JanZerebecki |
Duplicate | | None |
Declined | | None |
Resolved | | Ladsgroup |
Resolved | | JanZerebecki |
Declined | | None |
Resolved | | aude |
Resolved | | JeroenDeDauw |
Resolved | | WMDE-leszek |
Invalid | | None |
Invalid | | None |
Resolved | | None |
Resolved | | Victorbarbu |
Resolved | | Lucas_Werkmeister_WMDE |
Resolved | | Ladsgroup |
Resolved | | Lucas_Werkmeister_WMDE |
Declined | | Tarrow |
Resolved | | Lucas_Werkmeister_WMDE |
Resolved | | Lucas_Werkmeister_WMDE |
Resolved | | Ladsgroup |
Resolved | | None |
Resolved | | Ladsgroup |
Resolved | | Lucas_Werkmeister_WMDE |
Resolved | | Tarrow |
Resolved | | Lucas_Werkmeister_WMDE |
Resolved | | Lucas_Werkmeister_WMDE |
Resolved | | Lucas_Werkmeister_WMDE |
Resolved | | Lucas_Werkmeister_WMDE |
Resolved | | Lucas_Werkmeister_WMDE |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Lucas_Werkmeister_WMDE |
Resolved | | Ladsgroup |
Resolved | | ItamarWMDE |
Resolved | | Lucas_Werkmeister_WMDE |
Resolved | | Ladsgroup |
Resolved | | ItamarWMDE |
Resolved | | Lucas_Werkmeister_WMDE |
Resolved | | Lucas_Werkmeister_WMDE |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Lucas_Werkmeister_WMDE |
Resolved | | Lucas_Werkmeister_WMDE |
Resolved | | Lucas_Werkmeister_WMDE |
Resolved | | Ladsgroup |
Resolved | | None |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Tarrow |
Resolved | | ItamarWMDE |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | ItamarWMDE |
Resolved | | ItamarWMDE |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Lucas_Werkmeister_WMDE |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | MtDu |
Resolved | | Lydia_Pintscher |
Declined | | Lydia_Pintscher |
Invalid | | None |
Invalid | | None |
Resolved | | Reedy |
Resolved | | Legoktm |
Resolved | | Ladsgroup |
Resolved | | None |
Resolved | | JanZerebecki |
Resolved | | None |
Resolved | | JanZerebecki |
Resolved | | Krinkle |
Invalid | | None |
Resolved | | JanZerebecki |
Resolved | | Addshore |

Event Timeline


First of all, why? Secondly, we cannot do this without forking certain repos.

Also, we already had this discussion, and essentially decided to have components that depend on MediaWiki be on Gerrit and those that don't on GitHub. Which is why I wonder why this bug appears all of a sudden.

I'd like to see a good justification that takes into account the disadvantages and costs of a move for each individual component before it's moved.

(In reply to Jeroen De Dauw from comment #1)

First of all, why?

Why were they not put on gerrit to begin with?

  1. For consistency; it just corrects a mistake. Contributions and review can then be done in one place.
  2. Because a build of stuff deployed on the WMF cluster should not need external services.
  3. Github is not free software.

Secondly, we cannot do this without forking certain repos.

Which ones are you thinking of?

For external libraries maintained by non-MediaWiki communities that are pulled in during a Wikidata build, a mirror on Gerrit is sufficient, but let's only take a stab at that after the rest is done.

Also, we already had this discussion, and essentially decided to have
components that depend on MediaWiki be on Gerrit and those that don't on
GitHub. Which is why I wonder why this bug appears all of a sudden.

Can you link to this discussion?

I'd like to see a good justification that takes into account the
disadvantages and costs of a move for each individual component before it's
moved.

What are the disadvantages? I'm not aware of any.
The cost is: Create gerrit repo; create phabricator project; merge any open pull requests; git push into gerrit; move over any old pull requests that can't be merged yet; move over any open bug reports; update repo URL where it is used; put old repo in deprecated state. Not that big IMHO.

+1 to moving stuff to gerrit.

Calling this issue unconfirmed is a bit insulting.

The issue was given UNCONFIRMED status per comment 1. This is coherent with the following view of a bug: NEW requires an action, UNCO requires an analysis before an action.

In this case NEW requests the start of the move, while UNCO leaves room to discuss the case, for example through an RFC on mediawiki.org.

Nothing insulting in that.

NEW does not prevent analysis.

https://www.mediawiki.org/wiki/Bug_management/Bug_report_life_cycle says about UNCONFIRMED:
"The status is changed to NEW when it has been verified that is indeed a MediaWiki or Wikimedia bug and when it can be reproduced."
So NEW is the correct status.

Besides that, do you think I didn't address the concerns in comment 1? If so, which one?

Hijacking the bug process for policy making, and using a life cycle designed for technical issues, will of course create this divergence in reading.

Please also refrain from presenting opinions as objective FACT. You consider NEW the correct status; that doesn't make it the objectively correct status. A sound argument for UNCO is "it hasn't been verified as a Wikimedia bug, as we don't have any consensus to perform such a move".

But this discussion isn't about NEW or UNCO status. It's about moving Git repositories.

Experience has taught us that a bug tracker is a poor place to discuss policies. Please open an RFC on mediawiki.org or discuss the issue on the relevant mailing lists.

A community consensus is required to overturn the current consensus. Comment 1 says "Also, we already had this discussion, and essentially decided to have components that depend on MediaWiki be on Gerrit and those that don't on GitHub".

AFAIK the general consensus in Wikimedia is to have stuff on gerrit.wikimedia.org (and mirror that to GitHub). Do you agree?

Maybe there was some special decision for Wikidata stuff. Can you point to that previous discussion? But AFAIK there was no _public_ discussion for Wikidata to deviate from the general case. I'm really interested in having access to such a discussion, as that would allow us to see what were the problems and what was tried to address them.

Re: UNCONF vs NEW: Topic will be fully obsolete in two weeks in Phab anyway.

(In reply to Jan Zerebecki from comment #8)

Maybe there was some special decision for Wikidata stuff.

Maybe in the WMDE office.
Related: https://bugzilla.wikimedia.org/show_bug.cgi?id=62115

The problem you're trying to solve is a divergence of opinion about the more convenient and best platform to do outreach and be open to the world.

GitHub lets us get feedback more easily and makes external contributions possible. Wikimedia-hosted applications let us centralize and make convenient connections across the whole family of products, but create a barrier to entry.

And please don't build an argument on the fact that the previous consensus isn't visible, public or well documented. You are the one who wants to change the consensus, so it's your responsibility to get a new one. The fact that the previous consensus wasn't publicly and correctly documented is another issue.

Now you're using a bug report to gather facts about previous discussions and to ask for opinions. That should be a red alert for you.

I suspect your main reluctance to open such a discussion comes from the hope that at some point in this bug we will agree with you that it's no big deal and say: okay, let's do that. This isn't going to happen.

Please note, for example, that you have been totally ignored by Wikidata developers since Jeroen's initial comment.

Please invest the time to launch a proper discussion on mediawiki.org and notify the relevant mailing lists.

(That should be a red alert for you about the need to open a discussion.)

Thank you, Andre, for the related bug.

(In reply to Dereckson from comment #10)

Please note, for example, that you have been totally ignored by Wikidata
developers since Jeroen's initial comment.

I don't think I'm being ignored :). I have merge permissions on all the repositories relevant to this bug.

Lydia_Pintscher removed a subscriber: Unknown Object (MLST).
Lydia_Pintscher removed a subscriber: Unknown Object (MLST).
Lydia_Pintscher raised the priority of this task from Lowest to Low. Dec 27 2014, 2:52 PM
Lydia_Pintscher removed a project: Wikidata.org.
Lydia_Pintscher set Security to None.
Lydia_Pintscher raised the priority of this task from Low to High. Feb 19 2015, 10:58 AM
Lydia_Pintscher added a subscriber: Tobi_WMDE_SW.

Thank you, Dereckson, for writing down your concerns and observations; I share many of them. I do not expect constructive discussion to happen here, just one-sided repeats of arguments already covered in the past. If we want to improve things, then IMO we should look at concrete pain points and how we can address them without dismissing the opinions of other involved people.

I'm commenting here now since I saw this linked from the current Wikidata sprint, and want to reiterate that I object to this change.

Could you list your concrete pain points?

As the reasons for making the decision and the decision itself to split the repositories were not documented properly (in the sense of not at all), this matter is doomed to reappear again and again. I remember parts of the discussion, which happened quite a while ago (about two years), just as I remember @daniel and me having objections against moving components to GitHub. While it seems like we need to rewind all arguments, an outcome should be a clear and documented statement by Product Management.

I think a particular pain point is the wide-spread and branched-out modularization. I smell the hope that, by moving all components to one place, the software and its internal relations would be more understandable. Just by moving the components that problem will not be solved though.

The current situation is a modularization mess resulting in de facto closed source, as no one is able to grasp the relations between all components without tackling a ridiculously steep learning curve. I have created a diagram trying to capture the PHP data model component on T75604 -- imagine that a dozen times over for all the components. While documentation could, at least, ease the problem, the reality is that missing and outdated documentation just adds to it. The way up the hill is paved with disappointment, as quite a few modules are just not as generically useful as one would expect and/or lack stability. Additionally, even the most basic component, the data model, is still under dispute. Another example is the Time value, which is embarrassingly broken. Why would some external developer want to jump into that wavy sea at all?

Well, maybe the software is indeed doing complex things, and it seems like the amount of complexity is barely manageable by the core developers alone. As far as I am concerned, the move to GitHub was actually driven by the idea of opening up to developers from outside the MediaWiki context, as the number of potential developers who could easily reuse these components is higher on GitHub than on Gerrit. While the idea is totally legitimate, until now it has failed on all counts. I am sure the intended effect of having components on GitHub can be achieved but, in my opinion, not with the current component design. (e.g.: Why is there no independent generic coordinate or time parsing library that I could just drop into my project?)

At the bottom line, as I see it, either we just give up on the idea of GitHub being the portal to more contributors and attractive reuse of individual components or we considerably rethink the component design which -- in my opinion -- is the actual problem anyway.

I just wrote this as an email to wikimedia-de-tech. As a reference:

GitHub does have two major problems:

  1. It's not possible to add people for review. There is no review board. You can ping people via email, which usually gets lost. You can assign one person, which never turned out to be useful whenever I tried that. I don't even know what that actually does. Just sending another email?
  2. It's not possible to chain patches. It's not possible to continue working on code until it is merged. This creates the situation where you have to:
    • stop working on certain code until the base patch is merged,
    • force team members to merge base patches fast,
    • create chains of multiple commits in one pull request, which usually makes such pull requests non-reviewable (as happened in the "remove Claim" task, for example),
    • create duplicates that touch the same lines of code, which causes merge conflicts and means reviewers must review the same code multiple times.

All of these possibilities waste somebody's time.

GitHub is a nice tool, but it just doesn't fit our workflows.

Slightly related, we are discussing the use of Phabricator's Differential code review tool for repositories not requiring Gerrit + Jenkins' continuous integration. An ambitious but still realistic goal is to deploy Differential and the first code projects at the Wikimedia-Hackathon-2015 -- see T560#1095147

We still don't know when we will work on the migration from Gerrit, but the plan is that eventually all Wikimedia code review will happen in Phabricator (T18).

@Qgil, how will the "chain" aspect mentioned above work then? How do I submit a patch to Phabricator that depends on another patch that's still in review?

Clearly, there is no chain unless everything is either in Gerrit or in Phabricator.

Erm, not what I meant. Will Phabricator support chains of patches and if it does, how?

How about we start small and begin with the Wikidata.org and WikimediaBadges extensions that are currently packaged in the "Wikidata" build:

Both would benefit from being on Gerrit and being deployed through the normal deployment process so they can be localized at TWN, and it would help with converting them to use extension registration. Neither has been updated recently, so it shouldn't interfere majorly with anyone's workflow.

That is a good idea, as those two are not registered at packagist.org, so the blocker task does not apply to them.

WikimediaBadges and Wikidata.org would be good to start with, and I can't imagine objections to moving those. I would also like PropertySuggester in Gerrit.

We haven't yet converted PropertySuggester's API module to use i18n, but should do that soon. And then how do we get translations when it is on GitHub? In Gerrit, there is no problem.

Let me sum up a bit:

Our current situation is that we have code that we maintain for our product Wikidata in five different locations:

  1. Gerrit
  2. https://github.com/wmde
  3. https://github.com/datavalues
  4. https://github.com/wikidata-lib
  5. https://github.com/wikidata

That includes two different code hosting systems, which means two different development processes for contributors. Especially inside the Wikidata team, which is the main contributor to the components, this leads to frustration and a significant overhead when contributing and reviewing code and maintaining the repositories. So, having the components we maintain in one place and having one development process would be a large improvement compared to the current situation.

IMHO there are several scenarios:

  1. Have all components on GitHub under one organization
  2. Have all components on Gerrit
  3. Have all components on Differential

3) is not yet possible, but according to the migration timeline on https://www.mediawiki.org/wiki/Phabricator#Migration_timeline migrating from Gerrit to Phabricator is still planned for this year. A demo is planned for the hackathon in May: https://phabricator.wikimedia.org/T560#1095147

While I would not totally exclude 1), it seems the overall opinion is leaning towards 2). See this ticket (T74907), the thread at https://lists.wikimedia.org/pipermail/wikidata-l/2015-February/005421.html and the mood in the Wikidata team itself.

In any case, we need to decide which road to take before we take concrete actions like T92520. While some components might be easy to migrate, others are not and require several steps to be tackled first, like code coverage reports on Jenkins (T88434, probably T88435), code quality inspection on Jenkins/Gerrit (e.g. Scrutinizer), integration with Packagist (T87768), running unit tests on different PHP versions and DB systems, etc.
Also, in case we decide to go for 2), there is the question of whether it is better to wait for 3) to become possible, so that we do not have to do the migration twice. On the other hand, the Gerrit-Differential migration might be mostly done for us by WMF (citation needed), while the GitHub-Differential migration could get trickier (citation needed).

Anyway, I want us to put some thought into these questions before we actually start moving things around.

Well something that might be interesting (from the phab site)

Phabricator can host Git, Mercurial and Subversion repositories. It also works well with existing repositories (like GitHub, Bitbucket, or other repositories you already have elsewhere) without needing to host them itself.

For example Facebook have:

When Wikimedia moves to Diffusion and away from Gerrit, the process of sending code for review is going to be more like GitHub and less like Gerrit.
There will be no git-review AFAIK, but instead http://phabricator.org/applications/arcanist/

IMO there is no point in doing anything at all until Gerrit is gone, otherwise you are just going to end up doing more work migrating things around.

Erm, not what I meant. Will Phabricator support chains of patches and if it does, how?

^^^^ @mmodell, @chasemp, please check the question above.

Last time I checked, arcanist/differential do not have any explicit support for chains of dependent patches. You can submit multiple commits to a single differential changeset (it takes care of squashing when you submit, but still tracks individual local commits for posterity) ...

But the lack of dependencies between multiple differential changes is a fairly significant limitation. I suspect it will be added some day; it may even be worth our time to implement a rudimentary dependency feature, if we could get it accepted upstream.

We had a discussion about this:

Decision:

  • We will gradually move our repositories to gerrit.
  • We will start with WikimediaBadges and Wikidata.org
  • We are OK with losing the feature of coverage before merge, but want to implement it in Wikimedia CI afterwards.

TODOs:

  • Acquire the necessary rights for the GitHub orgs, Travis, Scrutinizer and possibly Gerrit, to be able to do the move.
  • Move the repo to the wikimedia org.
  • Update http://wikiba.se/components/
  • Create the repository on Gerrit.
  • Push from the old to the new repository.
  • Make sure mirroring works.
  • Ensure the Travis, Scrutinizer and Packagist hooks from the GitHub mirror work.
  • Set up IRC notifications from Gerrit.
  • Set up Wikimedia CI for the new repos.

Last time I checked, arcanist/differential do not have any explicit support for chains of dependent patches.

I was wrong, it does have dependencies now. One differential revision can depend on another revision.

I strongly dislike moving all repositories used by Wikidata to Gerrit. The same goes for any such general policy, such as forcing people to use a particular OS or IDE. If the people that contribute most to a repository want to have it at a given location, then let them. I'm much happier with the GitHub workflow and flexibility than with Gerrit. However, I am not arguing that Wikibase Repository should be moved to GitHub. Neither am I arguing against moving WikimediaBadges to Gerrit, if that is what the people working on it desire.

A lot of the components used by the Wikidata project get significant contributions from people not explicitly inside the Wikidata team. The opinions of these people should also be taken into account. People such as Bene, Lazowik, Addshore, Tpt and indeed myself. The two main authors of Wikibase DataModel Serialization are Tpt and me. The main author of Diff is me.

An additional objection comes from my being highly dubious about the ideological undertone of this proposal: "Github is not free software". WMDE uses other non-free software. Should we spend our time on addressing that rather than our department's actual goals? And if we should, should it not be done a bit more holistically? While the non-free services we use might not be "ideologically pure", they have provided good value for little effort in the past, unlike several solutions we set up in-house. We have spent far less time on configuring TravisCI than on Jenkins, and less time on configuring ScrutinizerCI than on our own in-house code coverage collector, yet the former provided more features and flexibility in both instances.

I'm also not happy with how this discussion was held. Very different concerns are put forward and then arguments are provided for why this solution addresses them. This comes across to me as trying to find reasons for an approach after deciding it is a good one. (I am not saying that is anyone’s intention, just that it comes across that way.) It also dilutes the discussion, making it hard to find agreement on concrete points. For instance, the location of issues/tasks has been brought up. Moving the repositories to Gerrit gets rid of the GitHub issue trackers. One can simply close the GitHub issue trackers if their being open is the issue. This also makes us gloss over the real mismatch of concerns about the issues/tickets: we want to be able to track all sprint items in one place. Everyone agrees with that (as far as I know). The contention is about what kind of tasks should be tracked by sprints, which is a question of scrum process, and very far removed from where one puts one's git repos.

Pulling code from GitHub has been mentioned as a concern by the OP. If read-only mirroring on Gerrit is an acceptable approach for third party libraries, then why not do the same for the components whose principal authors prefer GitHub? That is an acceptable compromise for me.

I am not myself against moving all these libs to Gerrit or Phabricator, as I already have to use these tools. But I believe that if we move some of these libs off GitHub, we may miss contributions from external users who use these libs for use cases other than Wikibase itself and who may not already be users of Gerrit/Phabricator. For example, my primary goal when I helped Jeroen create the WikibaseDataModelSerialization component was not to improve Wikibase itself but to be able to easily use the WikibaseDataModel component in my bot and some other tools, which I now do.

So, I think that we should keep libs with external users (DataValues, WikibaseDataModel, Diff, Serialization...) on GitHub in a unified location.

In T74907#1483021, @Tpt wrote:

But I believe that if we move some of these libs off GitHub, we may miss contributions from external users who use these libs for use cases other than Wikibase itself and who may not already be users of Gerrit/Phabricator.

They will continue to be mirrored to github, like all repos hosted on gerrit.w.o. They move to the wikimedia github organisation and possibly get a different name. github seems to leave behind redirects for each move/rename.

So, I think that we should keep libs with external users (DataValues, WikibaseDataModel, Diff, Serialization...) on GitHub in a unified location.

They are currently split over a few github organisations. Should we unify the location, for those repos that don't get moved to gerrit?

Jonas renamed this task from move git repositories that are dependencies of wikidata to gerrit to [Task] move git repositories that are dependencies of wikidata to gerrit. Aug 14 2015, 4:50 PM

Should we unify the location, for those repos that don't get moved to gerrit?

Yes! (At least in terms of Wikibase-specific stuff, such as PropertySuggester.)

I agree, it makes sense to have all repos WMDE is owner of under the WMDE org. That's not everything Wikidata related though. For instance WDTK and DataValues (the base component) are not owned by WMDE.

Declining, since we don't really need this any more.