(In the context of T160430)

Wikimedia has some "GitHub-only" code repositories.
Find out how to differentiate (exclude) those repositories that are mirrors only (maybe there is no trivial way).
Also, open question: What about stuff that Wikimedia forked? Exclude or not? (Similar problem with measuring activity in pulled upstream repos in Gerrit)

Aklapper created this task. Apr 21 2017, 7:08 PM
Aklapper claimed this task. Apr 21 2017, 7:35 PM
Aklapper updated the task description.

how to differentiate (exclude) those repositories that are mirrors-only

Link header response for WM repos on GitHub says &page=1964>; rel="last" so there are 1964 repos.
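A rough sketch of how that number can be read out of the Link header, assuming the request was made with per_page=1 (so the last page number equals the number of repos). The function name is made up for illustration; it only parses a header value passed on stdin and does not call the API itself.

```shell
#!/bin/sh
# Hypothetical helper: read a Link header value on stdin and print the
# page number of the entry marked rel="last".
last_page_from_link_header() {
    # [^>]* cannot cross a ">" so only the URL that is directly followed
    # by rel="last" can match.
    sed -n 's/.*[?&]page=\([0-9][0-9]*\)[^>]*>; rel="last".*/\1/p'
}
```

With a different per_page value the result is the number of pages, not the number of repos.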
For a comparison of numbers:

$:acko\> ssh aklapper@gerrit.wikimedia.org -p 29418 gerrit ls-projects | wc -l
1736

Trying to somehow get a complete list of repos that are only mirrors, there seems to be no one approach. Different spellings; only some mirrored repos have set the homepage key:

$:acko\> wget -q "https://api.github.com/orgs/wikimedia/repos?page=XY&per_page=100" -O fooXY.json
$:acko\> grep -r "Github mirror" . | wc -l
1594
$:acko\> grep -r "GitHub mirror" . | wc -l
2
$:acko\> grep -r "actual code is hosted" . | wc -l
1595
$:acko\> grep -r "\"homepage\": \"https://gerrit.wikimedia.org" . | wc -l
575
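Because the spelling differs ("Github mirror" vs. "GitHub mirror"), a case-insensitive match would count both variants in one go. This is only a sketch: it assumes the fooXY.json pages were saved pretty-printed (one key per line, as the greps above suggest), and the function name is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count description lines that mention a GitHub mirror,
# regardless of how "GitHub" is capitalised.
# Assumption: one JSON key per line in the saved API pages.
count_mirror_descriptions() {
    dir="${1:-.}"
    # -r: recurse, -i: ignore case, -h: omit file names from the output
    grep -rih '"description":.*github mirror' "$dir" | wc -l | tr -d ' '
}
```

Calling count_mirror_descriptions in the directory holding the JSON files prints a single number. It still misses mirrors whose description uses different wording or is empty.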

Also some repos have no description at all:

$:acko\> grep -r "\"description\": null" . | wc -l
68
$:acko\> cat fooXY.json | jq '.[] | select(.description == null) | .name'       ## list GitHub repos with empty description
Aklapper updated the task description.

Chad was kind enough to point me to:

So now there does not seem to be an "easy" way.

Qgil added a subscriber: Qgil. May 4 2017, 10:53 AM

In practical terms, to me this task should be blocked by another one, "Create a list of Featured Projects for new developers", and we should see whether we have any GitHub-only projects in that list.

If we have any GitHub-only projects, then we can see whether having an export/mirror in Gerrit makes sense. If not, then we can check the metrics problem again.

In other words, I think putting time on this task before having Featured Projects is not a good use of time.

• Created list of GitHub repositories (output from cat foo01.json | jq -r '.[] | .full_name' > github.list after concatenation)
• Created list of Gerrit repositories (output from ssh aklapper@gerrit.wikimedia.org -p 29418 gerrit ls-projects > gerrit.list)
• Stripped wikimedia/ prefix in github.list
• Replaced / by - in gerrit.list
• Sorted entries in both files alphabetically
• Ran diff and showed only the changed lines (via grep and nothing else)
• - means in Gerrit only, + means in GitHub only:
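The steps above could be scripted roughly as follows. This is a sketch under assumptions, not the exact commands that were run: it uses comm instead of diff plus grep, de-duplicates with sort -u, and the function name and output file names are invented for illustration. The input files are github.list and gerrit.list as created above.

```shell
#!/bin/sh
# Hypothetical helper: compare the GitHub and Gerrit repository lists.
compare_repo_lists() {
    github_list="$1"   # one "wikimedia/<name>" entry per line
    gerrit_list="$2"   # one Gerrit project path per line
    out_dir="${3:-.}"  # where the result files are written

    # Strip the "wikimedia/" prefix from the GitHub names.
    sed 's|^wikimedia/||' "$github_list" | LC_ALL=C sort -u > "$out_dir/github.sorted"
    # Gerrit project paths use "/" where the GitHub names use "-".
    tr '/' '-' < "$gerrit_list" | LC_ALL=C sort -u > "$out_dir/gerrit.sorted"

    # comm needs both inputs sorted in the same collation.
    # -23 keeps lines only in the first file, -13 lines only in the second.
    LC_ALL=C comm -23 "$out_dir/gerrit.sorted" "$out_dir/github.sorted" > "$out_dir/gerrit-only.list"
    LC_ALL=C comm -13 "$out_dir/gerrit.sorted" "$out_dir/github.sorted" > "$out_dir/github-only.list"
}
```

Running compare_repo_lists github.list gerrit.list leaves the two result lists in the current directory. The comparison is case-sensitive, so names that differ only in capitalisation would show up in both lists.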

True that. My curiosity was too strong though. :)

Qgil added a comment. May 4 2017, 11:26 AM

"Not a good use of time" is not an appropriate expression for this task. My apologies! :)

I just wanted to clarify how much we need this answer and under which circumstances. Regardless, I am reading with curiosity too. It is an interesting problem.

Aklapper closed this task as Declined. Jun 30 2017, 5:18 PM

For my records, Mukunda pointed out that mirrored Gerrit projects are listed on https://phabricator.wikimedia.org/r/

I'm going to decline this task for the time being ("decline" because of the "how" in the summary; it could also be "resolved" because of the "whether", to which the answer is "no").
While I found out a few interesting things, this task won't move forward due to the current setup. See the dependency tasks in T163576#3232491, which would need to get fixed first.
This task can always be reopened once it's less of a mess and requires less complex manual work to get a basic grip.

Tgr added a subscriber: Tgr. Feb 21 2018, 9:35 PM

Would T109939 really be the easy way here? It seems very fragile. Surely reading the master configuration from Phabricator or gerrit or whatever does the mirroring is superior?

@Tgr: "whatever does the mirroring": If I understand correctly, replication from Gerrit to GitHub is done by a Gerrit "replication" plugin. wikitech:Gerrit implies that the plugin is com.googlesource.gerrit.plugins.replication (upstream code location), which brought me to T109939...