
Track Git activity in canonical Wikimedia repositories located on Github in
Closed, Resolved, Public


Currently on we only track Gerrit and Wikimedia Git activity.

Tracking Git activity (not pull requests or tasks) in GitHub repositories related to Wikimedia would give us more complete developer activity statistics.
This is complicated by T109939 as AFAIK there is no way to identify repos on GitHub whose canonical home is on GitHub and which are not mirrored from Gerrit/Git to GitHub.

(For the record, I once tried a bunch of things in T163576 to investigate this.)

Event Timeline

This also affects (as brought up by jayvdb), plus potentially people's personal GitHub repositories.

I think I mentioned this somewhere before and then forgot, but something [?] seems to be enabled already. At least I can see some GitHub repositories for this query on Git: (specifically: and .ime).

Need to understand why exactly (and probably ask Bitergia about that, until we have Bestiary in place).

Correcting the actual scope of this task: Bitergia needs to deploy Bestiary and Wikimedia needs to fix T109939 before we can do anything here.

Aklapper renamed this task from "Consider enabling GitHub backend in to cover canonical Wikimedia repositories not in Gerrit" to "Track canonical Wikimedia repositories on Github in". 18 2018, 9:02 AM

Asked a follow-up question in (eliminating duplicates automatically for mirrored repositories?); I also want to sort out first before trying this.

Aklapper changed the task status from Stalled to Open. Jan 20 2019, 8:30 PM

Removing "stalled" status as I might do this manually. Note to myself: lists more Github orgs to potentially index.

Many repositories under are too blurry to tell whether they are mirrors, and I have no time to investigate them manually, but should already improve things vastly and also cover the orgs listed under

Aklapper renamed this task from "Track canonical Wikimedia repositories on Github in" to "Track Git activity in canonical Wikimedia repositories located on Github in". 16 2019, 1:07 PM
Aklapper updated the task description.
Aklapper raised the priority of this task from Lowest to Medium. Mar 18 2019, 12:01 AM (edited)

I finished going through all repositories on GitHub under that neither have the word "mirror" in their project description nor have an empty description. There were previously 209 such repositories; that number is now 189.
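The triage criterion above (no "mirror" in the description, description not empty) can be expressed as a simple filter. This is a minimal sketch, assuming the repository metadata has already been fetched as GitHub REST API repository objects (for example from the "List organization repositories" endpoint, `GET /orgs/{org}/repos`), where `description` may be `null`. The function names are hypothetical and the case-insensitive match is an assumption; this is not necessarily how the list above was produced.

```python
from typing import Any, Dict, List

# Each repo is a dict shaped like a GitHub REST API repository object;
# only "name" and "description" (which may be None) are used here.
Repo = Dict[str, Any]


def _description(repo: Repo) -> str:
    # Treat a missing or None description the same as an empty one.
    return (repo.get("description") or "").strip()


def needs_manual_review(repos: List[Repo]) -> List[str]:
    """Names of repos with a non-empty description that does not mention 'mirror'."""
    return [
        r["name"]
        for r in repos
        if _description(r) and "mirror" not in _description(r).lower()
    ]


def without_description(repos: List[Repo]) -> List[str]:
    """Names of repos whose description is empty or missing."""
    return [r["name"] for r in repos if not _description(r)]


if __name__ == "__main__":
    sample = [
        {"name": "tool-a", "description": "Mirror of a Gerrit repository"},
        {"name": "tool-b", "description": None},
        {"name": "tool-c", "description": "A standalone bot"},
        {"name": "tool-d", "description": "   "},
    ]
    print(needs_manual_review(sample))  # ['tool-c']
    print(without_description(sample))  # ['tool-b', 'tool-d']
```

The two lists correspond to the two groups checked by hand: candidates to verify, and repositories that first need a description.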

I also checked the 134 repositories that previously had no description.

The result is in which should allow us to index nearly all Git repositories whose canonical home is currently GitHub. Once that merge request is accepted, deployed, and verified, I will close this task.

The exceptions to "nearly all" are some repositories that confused me too much and that I did not sort out:

I deliberately did not add 56 Wikimedia Git repositories on GitHub that were forked from another GitHub upstream. We could index them, but we would get more noise from upstream activity that is not Wikimedia activity. I have not made up my mind on this yet.
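If forks were to be handled automatically, GitHub REST API repository objects carry a boolean `fork` field marking repositories forked from another upstream. A minimal sketch, with a hypothetical `include_forks` switch standing in for the open question above; it only looks at the `fork` flag and does not inspect which upstream a fork came from.

```python
from typing import Any, Dict, List


def repos_to_index(repos: List[Dict[str, Any]], include_forks: bool = False) -> List[str]:
    """Return repo names to index, skipping forks unless include_forks is set.

    `repos` are dicts shaped like GitHub REST API repository objects;
    only "name" and the boolean "fork" field are used.
    """
    return [r["name"] for r in repos if include_forks or not r.get("fork", False)]


if __name__ == "__main__":
    sample = [
        {"name": "own-tool", "fork": False},
        {"name": "forked-lib", "fork": True},
    ]
    print(repos_to_index(sample))                      # ['own-tool']
    print(repos_to_index(sample, include_forks=True))  # ['own-tool', 'forked-lib']
```

Keeping forks behind an explicit switch makes it cheap to change the decision later without re-triaging the repositories.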

As a side effect, I added a "this is a mirror from Gerrit" note to the descriptions of a good number of repositories, after checking that code review had actually happened on Gerrit (we also mirror some repositories the other way round, from GitHub to Gerrit; and for completeness, one repository on GitHub is mirrored from Diffusion, but we don't index Diffusion).
This also reduced the number of GitHub repositories with an empty description from 134 to 95.
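The comment does not say how the Gerrit review was verified. One plausible heuristic, offered here only as an assumption, is to look for the `Change-Id:` footer that Gerrit's `commit-msg` hook adds to commit messages. The sketch computes the share of commits carrying such a footer; the function names and the 0.8 threshold are hypothetical. The messages could come from a local clone, e.g. via `git log --format=%B%x00`, split on the NUL byte.

```python
import re
from typing import List

# Gerrit's commit-msg hook appends a footer like "Change-Id: I<40 hex chars>".
CHANGE_ID = re.compile(r"^Change-Id: I[0-9a-f]{40}\s*$", re.MULTILINE)


def gerrit_review_ratio(messages: List[str]) -> float:
    """Fraction of commit messages that carry a Gerrit Change-Id footer."""
    if not messages:
        return 0.0
    hits = sum(1 for m in messages if CHANGE_ID.search(m))
    return hits / len(messages)


def looks_gerrit_reviewed(messages: List[str], threshold: float = 0.8) -> bool:
    """Heuristic: treat the repo as Gerrit-reviewed if most commits have a Change-Id."""
    return gerrit_review_ratio(messages) >= threshold


if __name__ == "__main__":
    msgs = [
        "Fix crash on empty input\n\nChange-Id: I" + "0123456789" * 4,
        "Update README",
    ]
    print(gerrit_review_ratio(msgs))  # 0.5
```

A Change-Id footer only shows that the commit passed through a Gerrit workflow at some point, so borderline repositories would still need a manual look.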

Followup issues:

Asked a follow-up question in (eliminating duplicates automatically for mirrored repositories?)

Indexing is based on unique hashes, hence there is usually no duplication.
Documented in
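To illustrate why hash-based indexing usually avoids double counting: a commit mirrored verbatim from Gerrit to GitHub keeps the same commit hash, so keying commits by hash collapses the copies. A minimal sketch with hypothetical repository URLs and hashes; it is not Bitergia's actual implementation. A mirror that rewrites history (and therefore changes hashes) would not be deduplicated this way.

```python
from typing import Dict, List


def dedupe_commits(commits_by_repo: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each unique commit hash to the first repository it was seen in.

    `commits_by_repo` maps a repository URL to the commit hashes found there.
    Repositories are processed in insertion order, so listing the canonical
    repository first attributes shared commits to it.
    """
    first_seen: Dict[str, str] = {}
    for repo_url, hashes in commits_by_repo.items():
        for sha in hashes:
            first_seen.setdefault(sha, repo_url)
    return first_seen


if __name__ == "__main__":
    commits = {
        "https://gerrit.example.org/r/tool": ["a1b2c3", "d4e5f6"],
        "https://github.com/example/tool": ["a1b2c3", "d4e5f6"],  # verbatim mirror
        "https://github.com/example/other": ["0f9e8d"],
    }
    unique = dedupe_commits(commits)
    print(len(unique))  # 3
```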

This is deployed; 141 additional repositories on GitHub are now indexed on .