
For mirrored GitHub repositories, actually give the canonical source Gerrit URL in the repo description
Open, Lowest, Public

Description

GitHub mirrors contain a description under the main header such as

Github mirror of "performance/WebPageTest" - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access for contributing)

This is very unhelpful, because the URL of the original repo has to be constructed manually. Typically, at this point I give up and just content myself with whatever I can see and do in the GitHub mirror.

The description should be something like

GitHub mirror of https://gerrit.wikimedia.org/r/#/admin/projects/performance/WebPageTest – please see https://www.mediawiki.org/wiki/Developer_access for contributing. Thanks!

or the equivalent Diffusion URL if T110607 allows.
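A minimal sketch of how a script could build the proposed description from the Gerrit project name, so nobody has to construct the URL by hand. It assumes the "/r/#/admin/projects/" URL form used in the example above, with the project path left unencoded; the function and constant names are hypothetical.

```python
# Hypothetical constants; the admin-page URL form is copied from the example in this task.
GERRIT_ADMIN = "https://gerrit.wikimedia.org/r/#/admin/projects/"
DEV_ACCESS = "https://www.mediawiki.org/wiki/Developer_access"


def gerrit_project_url(project: str) -> str:
    """Gerrit project page in the URL form used in this task (project path not encoded)."""
    return GERRIT_ADMIN + project


def mirror_description(project: str) -> str:
    """Description text in the format proposed in this task."""
    return (f"GitHub mirror of {gerrit_project_url(project)} – please see "
            f"{DEV_ACCESS} for contributing. Thanks!")
```

Calling mirror_description("performance/WebPageTest") reproduces the proposed text for the WebPageTest mirror. If a Diffusion URL is used instead, only gerrit_project_url needs to change.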

Event Timeline

Nemo_bis raised the priority of this task from to Needs Triage.
Nemo_bis updated the task description.
Nemo_bis added a project: Gerrit.
Nemo_bis subscribed.
hashar set Security to None.
greg raised the priority of this task from Low to Medium. Sep 11 2015, 3:27 PM

I don't think we should make new links to git.wikimedia.org, and I'm not sure a link to those Gerrit project admin pages would be very useful.

demon lowered the priority of this task from Medium to Lowest. Sep 1 2016, 10:19 PM
demon subscribed.

Most of these were set up automatically, and that was the boilerplate text for all of them. Nowadays, most don't even get a description.

If someone wants to whip up a bot/script that can fix these all (skipping repos with "real" descriptions), I'm more than happy to run it under my account or one of the role accounts, but I'm not going to go around and update ~1k repositories manually :)
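A rough sketch of such a script, assuming the boilerplate always starts with the exact text quoted in the task description and that GitHub's REST "update a repository" endpoint (PATCH /repos/{owner}/{repo}) is used. It only builds the request; sending it (e.g. with urllib.request.urlopen), paging through the org's repos and token handling are left out. Helper names and the example repo name are hypothetical.

```python
import json
import re
import urllib.request
from typing import Optional

# Matches the auto-generated boilerplate quoted in the task description.
# Anything that does not match counts as a "real" description and is skipped.
BOILERPLATE = re.compile(
    r'^Github mirror of "(?P<project>[^"]+)" - our actual code is hosted with Gerrit'
)
DEV_ACCESS = "https://www.mediawiki.org/wiki/Developer_access"


def proposed_description(current: Optional[str]) -> Optional[str]:
    """Return a replacement description, or None to leave the repo alone."""
    if not current:
        return None  # assumption: empty descriptions are handled separately
    m = BOILERPLATE.match(current)
    if m is None:
        return None  # a hand-written ("real") description: skip it
    project = m.group("project")
    return (f"GitHub mirror of https://gerrit.wikimedia.org/r/#/admin/projects/{project}"
            f" - please see {DEV_ACCESS} for contributing. Thanks!")


def build_update_request(owner: str, repo: str, description: str,
                         token: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH request to GitHub's update-repository endpoint."""
    body = json.dumps({"description": description}).encode("utf-8")
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )
```

Keeping the "skip or rewrite" decision in a pure function makes it easy to do a dry run over all ~1k repos and review the output before any request is sent.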

I'll add that linking back to our repositories would also help make the canonical location visible in search results; currently, the GitHub copy can cause the canonical location to be hidden.

https://support.google.com/webmasters/answer/66359 said (bold mine):

Syndicate carefully: If you syndicate your content on other sites, Google will always show the version we think is most appropriate for users in each given search, which may or may not be the version you'd prefer. However, it is helpful to ensure that each site on which your content is syndicated includes a link back to your original article.

Yes please. The current descriptions are halfway between useless and actively hostile.

What format would you need? A script for generating the text or something that actually interacts with GitHub?


Both, but mostly the latter; like I said, I'm not going to run around editing them by hand :)

I just assumed something existed already (how did these descriptions get created in the first place?)

I used to have a plugin that created those repos automatically (with a description), but it stopped working and I lost the source code. Now repos are created manually on GitHub.


Thanks for the context. Some pointers for myself:

git.wikimedia.org has perished, and the gerrit admin page and the Phabricator repo page both utterly suck as a landing page. I think it would be better to use the description as it's normally intended, use the URL to link to the project page on mediawiki.org (or leave it empty if there isn't one), and put everything else in CONTRIBUTORS (see T136863#3714110).

Aklapper renamed this task from Actually give the source URL of the repo from GitHub to For mirrored GitHub repositories, actually give the canonical source Gerrit URL in the repo description. Feb 21 2018, 1:25 PM

Other aspect: Having no way to differentiate repositories mirrored from Gerrit vs. repositories whose canonical home is on GitHub makes it impossible to get proper developer statistics for Wikimedia projects canonically hosted on GitHub.

I actually think we shouldn't bother replicating everything... there's a ton of crap.

What I want to do is rewrite my old GitHub plugin (I don't like the upstream one). It'd handle repo creation (as appropriate), along with keeping descriptions in sync.

We could use GitHub repo topics to identify the repos which are active on GitHub.
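With a topic tag in place, separating mirrors from GitHub-canonical repos (for example, for the developer statistics mentioned above) becomes a simple filter. This sketch assumes a "mirror" topic marks mirrored repos, that tagging is complete, and that each repo is a dict with "name" and "topics" keys, like the repository objects GitHub's REST API returns. Tagging the active repos instead would just invert the test.

```python
from typing import Any, Iterable, List, Mapping

MIRROR_TOPIC = "mirror"  # assumed tag for repos mirrored from Gerrit/Diffusion


def github_canonical_repos(repos: Iterable[Mapping[str, Any]]) -> List[str]:
    """Sorted names of repos NOT tagged as mirrors, i.e. those whose canonical home is GitHub."""
    return sorted(r["name"] for r in repos if MIRROR_TOPIC not in r.get("topics", []))
```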

[...] the gerrit admin page and the Phabricator repo page both utterly suck as a landing page.

This seems like a blocker, none of the alternatives are good:

I think it would be better to use the description as it's normally intended, use the URL to link to the project page on mediawiki.org (or leave it empty if there isn't one), and put everything else in CONTRIBUTORS (see T136863#3714110).

I like this suggestion: once the user navigates to a proper landing page, it's reasonable to include bare Gerrit repo URLs, for example.

We could use GitHub repo topics to identify the repos which are active on GitHub.

I think this is a good idea. A few weeks ago when I was cleaning up things in the https://github.com/toolforge/ org account I went through and added a "mirror" topic tag manually to the repos there which are mirrors of Diffusion or Gerrit upstreams.

Some bot could be written to walk the repos under the wikimedia org account and add "mirror" based on the presence of a .gitreview file in the repo (criteria used by @majavah in T249703). @majavah even volunteered at T249703#6157963 to write such a bot to take care of this task if folks could just settle on the text to be used.
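A minimal sketch of the per-repo decision such a bot would make, assuming it has already fetched the repo's current topics and the contents of .gitreview (None when the file is absent). GitHub's "replace all repository topics" endpoint (PUT /repos/{owner}/{repo}/topics) overwrites the whole list, so existing topics are kept and "mirror" is appended. The .gitreview parser assumes the usual git-review layout (a [gerrit] section with a project= key); it also yields the Gerrit project name, which a description bot could reuse. Function names are hypothetical.

```python
import configparser
from typing import List, Optional, Sequence

MIRROR_TOPIC = "mirror"


def gerrit_project_from_gitreview(text: Optional[str]) -> Optional[str]:
    """Return the Gerrit project named in a .gitreview file, or None."""
    if not text:
        return None
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read_string(text)
    except configparser.Error:
        return None  # malformed file: let a human look at it
    project = parser.get("gerrit", "project", fallback=None)
    if project and project.endswith(".git"):
        project = project[: -len(".git")]
    return project or None


def updated_topics(existing: Sequence[str], gitreview_text: Optional[str]) -> List[str]:
    """Topic list to PUT back: existing topics, plus "mirror" if a .gitreview file exists."""
    topics = list(existing)
    if gitreview_text is not None and MIRROR_TOPIC not in topics:
        topics.append(MIRROR_TOPIC)
    return topics
```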

git.wikimedia.org has perished, and the gerrit admin page and the Phabricator repo page both utterly suck as a landing page. I think it would be better to use the description as it's normally intended, use the URL to link to the project page on mediawiki.org (or leave it empty if there isn't one), and put everything else in CONTRIBUTORS (see T136863#3714110).

  • The mediawiki/core repo uses this hand-edited description: "🌻 The collaborative editing software that runs Wikipedia. This is a mirror from gerrit.wikimedia.org. See https://www.mediawiki.org/wiki/Developer_access for contributing."
  • The ops/puppet repo uses this hand-edited description: "Wikimedia Foundation operates some of the largest collaborative projects in the world. This is our Puppet repo. This repository is a mirror; see https://www.mediawiki.org/wiki/Developer_access for contributing."

Both avoid the problem of figuring out which Gerrit landing page to use in the description, which is in line with @Tgr's suggestion at T109939#3724744. The default for a repo could be the Gerrit project's description if it has one (see https://gerrit.wikimedia.org/r/#/admin/projects/?filter=mediawiki%252F for examples), followed by either "This repository is a mirror; see https://www.mediawiki.org/wiki/Developer_access for contributing." or "This is a mirror from gerrit.wikimedia.org. See https://www.mediawiki.org/wiki/Developer_access for contributing."
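A sketch of that default: take the Gerrit project's description, if any, and append the mirror sentence. It assumes Gerrit's "get project" REST endpoint (GET /r/projects/{name}, with the name URL-encoded) on gerrit.wikimedia.org. Gerrit prefixes its JSON responses with )]}' to prevent XSSI, so that has to be stripped before parsing. Fetching is left out, and the helper names are hypothetical.

```python
import json
from typing import Any, Dict
from urllib.parse import quote

GERRIT_REST = "https://gerrit.wikimedia.org/r"
XSSI_PREFIX = ")]}'"
MIRROR_NOTE = ("This repository is a mirror; see "
               "https://www.mediawiki.org/wiki/Developer_access for contributing.")


def project_info_url(project: str) -> str:
    """URL of Gerrit's "get project" endpoint; the name must be URL-encoded ('/' -> %2F)."""
    return f"{GERRIT_REST}/projects/{quote(project, safe='')}"


def parse_gerrit_json(body: str) -> Dict[str, Any]:
    """Strip Gerrit's anti-XSSI prefix, then parse the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def default_description(project_info: Dict[str, Any]) -> str:
    """Gerrit description (if any, terminated with a period) followed by the mirror sentence."""
    desc = (project_info.get("description") or "").strip()
    if not desc:
        return MIRROR_NOTE
    if desc[-1] not in ".!?":
        desc += "."
    return f"{desc} {MIRROR_NOTE}"
```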

Figuring out what project page to link to might be a bit tricky, but there are certainly heuristics that could be used to guess for many of them. If the bot can't figure it out, it could just leave the url blank.
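One possible heuristic, sketched under the assumption that extension and skin repos map onto mediawiki.org's Extension: and Skin: namespaces. The mapping table is hypothetical, and a real bot should confirm the page exists before using it. Anything it cannot map returns None, meaning the GitHub URL field stays blank.

```python
from typing import Optional

# Hypothetical mapping from Gerrit path prefix to mediawiki.org namespace.
PREFIX_TO_NAMESPACE = {
    "mediawiki/extensions/": "Extension:",
    "mediawiki/skins/": "Skin:",
}


def guess_project_page(project: str) -> Optional[str]:
    """Guess a mediawiki.org project page for a Gerrit project, or None if unsure."""
    for prefix, namespace in PREFIX_TO_NAMESPACE.items():
        if project.startswith(prefix):
            name = project[len(prefix):]
            # Only map top-level repos such as mediawiki/extensions/Foo, not sub-repos.
            if name and "/" not in name:
                return f"https://www.mediawiki.org/wiki/{namespace}{name}"
    return None  # leave the URL blank rather than guess badly
```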

The CONTRIBUTORS aspect is reasonable as well, but I think it is trickier to deal with, as it involves changing the repo contents rather than just the metadata on the GitHub side. I saw enough angry pushback against the CODE_OF_CONDUCT.md additions to stay away from forcing content into existing repos myself, but it seems reasonable that we could at least have a wall-of-shame tool that shows repos missing a CONTRIBUTORS file, along with a suggested default.
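A sketch of the listing side of such a wall-of-shame tool, assuming the file names at the root of each repo have already been fetched into a mapping. Treating any root file whose name is CONTRIBUTORS, with or without an extension and in any case, as present is an assumption, as is the input shape; the function names are hypothetical.

```python
from typing import Iterable, List, Mapping


def has_contributors_file(filenames: Iterable[str]) -> bool:
    """True if any root file is CONTRIBUTORS, optionally with an extension (case-insensitive)."""
    return any(name.split(".", 1)[0].upper() == "CONTRIBUTORS" for name in filenames)


def wall_of_shame(repo_files: Mapping[str, Iterable[str]]) -> List[str]:
    """Sorted names of repos that lack a CONTRIBUTORS file at their root."""
    return sorted(repo for repo, files in repo_files.items()
                  if not has_contributors_file(files))
```

Because the tool only reports, it never touches repo contents, which sidesteps the pushback problem described above.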