Remove deprecated repositories from korma.wmflabs.org code review metrics
Closed, Resolved (Public)

Description

"stud" appears in http://korma.wmflabs.org/browser/gerrit_review_queue.html but I cannot find it in the Gerrit web UI or anywhere else. Looking at

http://korma.wmflabs.org/browser/repository.html?repository=gerrit.wikimedia.org_operations_debs_stud&ds=scr

it seems that it is quite irrelevant from a code review point of view. If the repo doesn't exist, then it should not appear in korma.

There are probably more obsolete repos. The list so far:

  • mediawiki/extensions/ExternalArticles (stuck at the top of the list because of a bug in... our Gerrit instance?). The repo is inactive and it is better to remove it than to keep it here.
  • operations/debs/stud
  • operations/puppet/varnish
  • operations/debs/python-statsd

See also:
T103984: Exclude certain repositories (upstream / inactive) from Gerrit metrics by blacklisting
T104845: Automated generation of (Gerrit) repositories for Korma

Event Timeline

Qgil raised the priority of this task to Medium.
Qgil updated the task description.
Qgil added a project: wikimedia.biterg.io.
Qgil added subscribers: Qgil, Dicortazar, Aklapper.
Qgil renamed this task from "Remove operations/debs/stud repository from korma.wmflabs.org code review metrics" to "Remove deprecated repositories from korma.wmflabs.org code review metrics". (Jun 8 2015, 10:47 PM)
Qgil updated the task description.
Qgil set Security to None.

After the next refresh of http://korma.wmflabs.org/browser/gerrit_review_queue.html, I expect the four projects listed above to take the top four positions in the ranking, distorting the actual positions of all the other projects.

After some more cleaning, UploadWizard (currently #7) is the real "leader" of the classification.

Not only clearly deprecated repositories (as in they cannot be found in Gerrit anymore). I think unmaintained repositories should be also removed from our scope. Should we organize this in this task or in a new one?

The definition of unmaintained repository is being discussed at T102920.

Not only clearly deprecated repositories (as in they cannot be found in Gerrit anymore).

"deprecated" = "non-existing". So if I took some tool to access that repo, an error message should be triggered, and there's nothing to judge or discuss.

I think unmaintained repositories should be also removed from our scope. Should we organize this in this task or in a new one?

"unmaintained" = "no changes for a certain while". That would require code to check the number of commits in a given timeframe.
Implementation-wise, that sounds like a separate request to me.
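
As a rough illustration of what such a check might look like, here is a minimal Python sketch. The function name, the one-year window and the minimum commit count are all hypothetical; the actual definition of "unmaintained" is still under discussion in T102920, and the commit dates would have to come from whatever data source is used.

```python
from datetime import datetime, timedelta


def is_unmaintained(commit_dates, now, max_idle_days=365, min_commits=1):
    """Return True if fewer than `min_commits` commits fall within the
    last `max_idle_days` days before `now`.

    Both thresholds are hypothetical defaults, not agreed values.
    """
    cutoff = now - timedelta(days=max_idle_days)
    recent = sum(1 for date in commit_dates if date >= cutoff)
    return recent < min_commits


# Made-up commit dates, for illustration only.
reference_day = datetime(2015, 6, 8)
print(is_unmaintained([datetime(2013, 1, 1)], reference_day))
print(is_unmaintained([datetime(2015, 5, 1)], reference_day))
```

Unlike the "deprecated" case, this check needs a judgment call on the thresholds, which is why it looks like a separate request.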

When the list of repositories is updated by Octopus (the tool in charge of retrieving the projects from Gerrit) as part of T104845: Automated generation of (Gerrit) repositories for Korma, the repositories that no longer exist should not appear there.

However, should we remove the information about them? If a project has some metrics, that means that people have been working there. If we remove that project, its metrics will be lost forever. From my point of view the old projects should remain, as they still provide information (at least historical information).

Comments?

Storing the data and keeping it available is fine and good. My aim in creating this task was simply to remove them from the ranking of repos with open reviews.

I see the point. However, metrics are provided for the full list of repositories. So, if there are open tickets from those repos and the repos no longer exist, either we remove them from the list of repos or we ignore them when calculating the metrics.

And here I'm going to contradict myself :(. After checking which process is simplest, if we do not want to show info from deprecated repos, we should remove them from the database.

If you do not mind, I'll proceed in that way. Extra comments are welcome in any case :).

I was not able to access those tickets to do anything about them, so yes, let's just remove the repos. If someone had wanted to keep the history, they would have kept them in Git/Gerrit.

if we do not want to show info from deprecated repos, we should remove them from the database.

If you do not mind, I'll proceed in that way.

Yes, please proceed that way. :)

@Dicortazar: What's the status when it comes to Octopus? Is this already in place, or will this happen in the next two or three weeks?

Copying @Dicortazar's comment from T104845#1552564:

Some updates: there's a new tool in Metrics Grimoire named 'rremoval' [1] (Repository Removal Tool). This will help to remove those repositories that are not of interest for the analysis.
We still have to integrate the several tools into Automator to automate the whole process.
[1] https://github.com/MetricsGrimoire/rremoval

Pull request https://github.com/Bitergia/mediawiki-repositories/pull/1 submitted in order to remove ExternalArticles. The other repos listed above were already removed from that file, but still appear at http://korma.wmflabs.org/browser/gerrit_review_queue.html

Thanks for the change!

We're updating the process a bit. I'm creating some "blacklists" that will list the projects that appear in the gerrit_projects.conf file but that you do not want to be part of Korma. That list will be updated in https://github.com/Bitergia/mediawiki-repositories/blob/master/gerrit_projects_blacklist.conf.

I've accepted your pull request and added a new commit adding that project to the blacklist.

The idea is that you only need to update the blacklist files (so far only for gerrit), while the file with all of the projects contains information automatically retrieved from the original data source.
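
To make the idea concrete, here is a minimal Python sketch of how the two lists could be combined. It assumes both files hold one project name per line, with blank lines and lines starting with "#" ignored; the real format of gerrit_projects.conf and gerrit_projects_blacklist.conf may differ. The function names and the sample contents are hypothetical.

```python
def parse_project_list(text):
    """Parse a project list, assuming one project name per line.

    Blank lines and lines starting with '#' are skipped (assumed format).
    """
    projects = []
    for line in text.splitlines():
        name = line.strip()
        if name and not name.startswith("#"):
            projects.append(name)
    return projects


def apply_blacklist(all_projects, blacklist):
    """Return the projects not named in the blacklist, keeping their order."""
    blocked = set(blacklist)
    return [project for project in all_projects if project not in blocked]


# Sample contents, for illustration only.
all_text = """
mediawiki/core
mediawiki/extensions/ExternalArticles
mediawiki/extensions/UploadWizard
"""
blacklist_text = """
# kept in Gerrit, but not wanted in Korma
mediawiki/extensions/ExternalArticles
"""

analysed = apply_blacklist(parse_project_list(all_text),
                           parse_project_list(blacklist_text))
print(analysed)
# ['mediawiki/core', 'mediawiki/extensions/UploadWizard']
```

Keeping the blacklist separate means the automatically retrieved file can be overwritten on every refresh without losing the manual exclusions.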

Hi again,

The process keeps going. With the new automatic way to remove repositories, I've detected the following ones as being deprecated (but still found in the database):

analytics
analytics/fundraising
analytics/fundraising/dashboard
analytics/wp-zero/data
apps
apps/android
apps/firefox
apps/glass
apps/ios
apps/mobile
apps/win8
integration
integration/jenkins-job-builder-config
integration/kss
integration/zuul-config
labs/tools
mediawiki/extensions/Hanp
mediawiki/php
mediawiki/ruby
mediawiki/services
mediawiki/skins/LivingStyleGuide
mediawiki/tools
operations/debs/dropwizard-metrics
operations/debs/python-statsd
operations/debs/stud
operations/debs/txstatsd
operations/puppet/varnish
operations/software/mwprof
operations/software/mwprof/reporter
pywikibot
pywikibot/bots
qa
wikimedia
wikimedia/bots
wikimedia/fundraising

I guess we want all of them to be removed from the whole analysis in Korma.

Ok, assuming that deprecated repositories are those that are in the database, but not in the list of projects provided by Gerrit, all of the Gerrit projects listed above were successfully removed.
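
The assumption above amounts to a set difference. As a minimal Python sketch (the function name and the sample lists are hypothetical; the real lists would come from the Korma database and from the projects Gerrit reports):

```python
def find_deprecated(db_repos, gerrit_projects):
    """Return repositories present in the database but absent from the
    list of projects provided by Gerrit, sorted by name."""
    return sorted(set(db_repos) - set(gerrit_projects))


# Sample data, for illustration only.
in_database = [
    "operations/debs/stud",
    "mediawiki/extensions/UploadWizard",
    "qa",
]
in_gerrit = [
    "mediawiki/extensions/UploadWizard",
    "mediawiki/extensions/ExternalArticles",
]

print(find_deprecated(in_database, in_gerrit))
# ['operations/debs/stud', 'qa']
```

Under this definition, a repository that Gerrit still lists is never flagged, however inactive it is.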

Once data are updated in Korma, I'll close this task.

Ok, assuming that deprecated repositories are those that are in the database, but not in the list of projects provided by Gerrit, all of the Gerrit projects listed above were successfully removed.

I think that assumption is fair, yes. I've checked a few listed ones via the browser (gitblit web interface) like http://git.wikimedia.org/summary/pywikibot/bots.git or http://git.wikimedia.org/summary/qa.git and those repos are either empty or ancient and not correctly moved.

Once data are updated in Korma, I'll close this task.

Looking forward to that!

ok, data are now updated for all of the gerrit repositories.

If repositories such as ExternalArticles are still in the list, this is because they are not deprecated (they are indeed in the list of Gerrit repositories returned by Gerrit); they should be excluded for other reasons, such as being upstream repositories, etc.

So, I'm closing this task and leaving the remaining repositories that need to be removed to T103984: Exclude certain repositories (upstream / inactive) from Gerrit metrics by blacklisting. Those should be added to the blacklist of Gerrit repositories. I'll add some extra info to that ticket.

Dicortazar moved this task from Need Discussion to Doing on the ECT-August-2015 board.