Statistics for SCM project 'core' mix pywikibot/core, mediawiki/core and oojs/core
Closed, Resolved, Public

Description

All SCM stats related to 'core' are currently very misleading.

http://korma.wmflabs.org/browser/scm-repos.html and the SCM repository filter list on http://korma.wmflabs.org/browser/scm-contributors.html do not include "Pywikibot" or "MediaWiki". There are three projects called 'core' and they all point to:

http://korma.wmflabs.org/browser/scm-contributors.html?repository=core

[Screenshot: T123808_scm_core_mess]

Given that the page lists Fabian Neundorf and me (two Pywikibot-only developers) as well as Aaron Schulz and Brion Vibber (not Pywikibot developers), I assume 'core' contains pywikibot-core, mediawiki-core and something-else-core.

'compat' does appear to be the deprecated pywikibot-compat repo: http://korma.wmflabs.org/browser/scm-contributors.html?repository=compat

Event Timeline

jayvdb raised the priority of this task to High.
jayvdb updated the task description.
jayvdb added a project: wikimedia.biterg.io.
jayvdb subscribed.
jayvdb set Security to None.
jayvdb raised the priority of this task from High to Unbreak Now! (Jan 19 2016, 8:34 PM)
jayvdb updated the task description.
jayvdb added a subscriber: Qgil.

@jayvdb, Thanks for finding this and raising this! And I think you are indeed right.
Something-else-core might be oojs/core.

To get some data for comparison, here is the list of authors for the last 30 days per repo (master branches only; ignore the bots, which are filtered out):

$:andre\> cd pywikibot-core/
$:andre\> git branch
* master
$:andre\> git log --after=2015-12-18 --author='' --pretty=format:"%ae" | sort | uniq -c | sort -rn
     42 jenkins-bot
     14 jayvdb
     10 gno.de
      3 vadi.fedx
      3 mpaa.wiki
      3 lokal.profil
      3 justin.d128
      3 hazardsjwiki
      3 geofbot
$:andre\> cd ../mediawiki-core/
$:andre\> git branch
* master
$:andre\> git log --after=2015-12-18 --author='' --pretty=format:"%ae" | sort | uniq -c | sort -rn
    199 jenkins-bot
     35 aschulz
     30 reedy
     30 l10n-bot
     30 florian.schmidt
     20 matma.rex
     13 fomafix
     13 crazy4sb
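For a repeatable comparison, the pipeline above can be sketched in Python. This is a minimal sketch, not part of korma or cvsanaly2: `count_authors`, `git_authors` and `shared_authors` are hypothetical helper names, `git` is assumed to be on the PATH, and the empty `--author=''` filter is dropped because an empty pattern matches every author anyway.

```python
import subprocess
from collections import Counter


def count_authors(log_output: str) -> list[tuple[str, int]]:
    """Rough equivalent of `sort | uniq -c | sort -rn` on one-author-per-line output."""
    counts = Counter(line.strip() for line in log_output.splitlines() if line.strip())
    # most_common() orders by descending count, like `sort -rn`
    # (the order of tied entries may differ from the shell pipeline).
    return counts.most_common()


def git_authors(repo_path: str, since: str) -> str:
    """Run `git log --after=<since> --pretty=format:%ae` in repo_path; return raw output."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--after={since}", "--pretty=format:%ae"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def shared_authors(log_a: str, log_b: str) -> set[str]:
    """Authors that appear in both logs."""
    authors_a = {name for name, _ in count_authors(log_a)}
    authors_b = {name for name, _ in count_authors(log_b)}
    return authors_a & authors_b
```

For example, `count_authors("jayvdb\njayvdb\ngno.de\n")` gives `[('jayvdb', 2), ('gno.de', 1)]`. Running `shared_authors` on the pywikibot-core and mediawiki-core logs above should leave little beyond bots such as jenkins-bot, which is why a single 'core' contributor list containing both groups of human names points to merged statistics.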

http://korma.wmflabs.org/browser/repository.html?repository=gerrit.wikimedia.org_mediawiki_core&ds=scr and http://korma.wmflabs.org/browser/repository.html?repository=gerrit.wikimedia.org_pywikibot_core&ds=scr exist separately so this problem might only affect some pages / algorithms.

Aklapper renamed this task from "SCM project 'core' contains Pywikibot and MediaWiki" to "Statistics for SCM project 'core' mix pywikibot/core, mediawiki/core and oojs/core" (Jan 19 2016, 9:48 PM)
Aklapper added a project: DevRel-January-2016.

> @jayvdb, Thanks for finding this and raising this! And I think you are indeed right.
> Something-else-core might be oojs/core.

Yup; that looks right.

> http://korma.wmflabs.org/browser/repository.html?repository=gerrit.wikimedia.org_mediawiki_core&ds=scr and http://korma.wmflabs.org/browser/repository.html?repository=gerrit.wikimedia.org_pywikibot_core&ds=scr exist separately so this problem might only affect some pages / algorithms.

Yes, scr is OK. scm is the problem.

Not surprisingly, using ds=scm for each of those repository names loads a "Code Review" page (with SCM breadcrumbs) instead of a "Source Code Management" page: http://korma.wmflabs.org/browser/repository.html?repository=gerrit.wikimedia.org_mediawiki_core&ds=scm and http://korma.wmflabs.org/browser/repository.html?repository=gerrit.wikimedia.org_pywikibot_core&ds=scm

That dropdown also lists quite a few duplicate entries, such as wikipedia, vendor, varnish, wikimetrics, wikibugs, wikistats, ... This might also be related somehow.

This is a bug in the UI and in the way we show the repositories. I'm working on it next week.

Thanks for the issue, @jayvdb! I'm reassigning this to @Lcanasdiaz, as he will ultimately be in charge of it.

I was wrong: it is a "bug" in our retrieval tool for Git (cvsanaly2).

mysql> SELECT * FROM repositories WHERE name = 'core';
+-----+-----------------------------------------------+------+------+
| id  | uri                                           | name | type |
+-----+-----------------------------------------------+------+------+
|   5 | https://gerrit.wikimedia.org/r/pywikibot/core | core | git  |
|  67 | https://gerrit.wikimedia.org/r/mediawiki/core | core | git  |
| 282 | https://gerrit.wikimedia.org/r/oojs/core      | core | git  |
+-----+-----------------------------------------------+------+------+
3 rows in set (0,00 sec)

We have the same issue with 47 different SCM repo ids, and around 100 repos are involved!
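One way to list every clashing name is a GROUP BY over the `repositories` table shown above. The sketch below runs the queries against an in-memory SQLite copy of only the three rows from the MySQL output; that demo table is an assumption for illustration, and `duplicate_names`, `affected_repos` and `demo_connection` are hypothetical helpers. The SQL itself uses only `GROUP BY`, `HAVING` and a subquery, so it should also run unchanged against the MySQL database.

```python
import sqlite3

# Only the three rows shown in the MySQL output above; the real table is larger.
ROWS = [
    (5, "https://gerrit.wikimedia.org/r/pywikibot/core", "core", "git"),
    (67, "https://gerrit.wikimedia.org/r/mediawiki/core", "core", "git"),
    (282, "https://gerrit.wikimedia.org/r/oojs/core", "core", "git"),
]

# Every name shared by more than one repository, with how many share it.
DUPLICATE_NAMES_SQL = """
    SELECT name, COUNT(*) AS n
    FROM repositories
    GROUP BY name
    HAVING COUNT(*) > 1
    ORDER BY n DESC, name
"""

# Every repository whose name clashes with another repository's name.
AFFECTED_REPOS_SQL = """
    SELECT id, uri, name
    FROM repositories
    WHERE name IN (
        SELECT name FROM repositories GROUP BY name HAVING COUNT(*) > 1
    )
    ORDER BY name, id
"""


def duplicate_names(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    return conn.execute(DUPLICATE_NAMES_SQL).fetchall()


def affected_repos(conn: sqlite3.Connection) -> list[tuple[int, str, str]]:
    return conn.execute(AFFECTED_REPOS_SQL).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory table with the same columns as the cvsanaly2 output."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE repositories (id INTEGER PRIMARY KEY, uri TEXT, name TEXT, type TEXT)"
    )
    conn.executemany("INSERT INTO repositories VALUES (?, ?, ?, ?)", ROWS)
    return conn
```

On the demo data, `duplicate_names(demo_connection())` returns `[('core', 3)]`. On the full database, the lengths of the two results would give the number of clashing names and the number of repositories sharing them.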

> I was wrong: it is a "bug" in our retrieval tool for Git (cvsanaly2).

Is that https://github.com/MetricsGrimoire/CVSAnalY (last commit late 2014)? Has the upstream bug been raised?

https://github.com/MetricsGrimoire/CVSAnalY/issues/100

In the meantime, I'm modifying the field we use to get that data, so we won't even need cvsanaly to be fixed.
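As an illustration of keying on a different field, an unambiguous label can be derived from the `uri` column instead of the clashing `name` column. This is a hypothetical sketch, not the actual GrimoireLib change: `display_name` and `repository_key` are invented helpers, and stripping Gerrit's `/r/` prefix is an assumption based on the URIs in the table above. `repository_key` merely imitates the style of the `gerrit.wikimedia.org_mediawiki_core` identifiers used on the scr pages.

```python
from urllib.parse import urlparse


def display_name(uri: str) -> str:
    """Project path for a Gerrit clone URI, e.g. 'mediawiki/core'."""
    path = urlparse(uri).path.strip("/")
    # The URIs in the repositories table are served under /r/; drop that prefix.
    if path.startswith("r/"):
        path = path[len("r/"):]
    return path


def repository_key(uri: str) -> str:
    """Host and project path joined with underscores,
    e.g. 'gerrit.wikimedia.org_mediawiki_core'."""
    host = urlparse(uri).netloc
    return "_".join([host] + display_name(uri).split("/"))
```

With this, the three 'core' rows become `pywikibot/core`, `mediawiki/core` and `oojs/core`. Joining with underscores could in principle collide for paths that themselves contain underscores, so the plain project path is the safer label for a dropdown.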

Half of the work is done. I'm going to work on the visualization of the dropdown and push it to production.

https://github.com/VizGrimoire/GrimoireLib/commit/4ad7180edec2e2561ee911aa744edecf7cc08f5f

I've just fixed a bug I introduced yesterday with the patch. It is working now :D