
Have "Last Attracted Developers" information for Gerrit automatically updated / Integrate new demography panels in GrimoireLab product
Closed, Resolved, Public

Description

Followup on T146631

On https://wikimedia.biterg.io/app/kibana#/dashboard/Git-Demographics in "Last Attracted Developers", "First Commit Date" is not the date when the initial patch set revision was committed, but when the last patch set revision was committed.
This makes it unnecessarily hard to get a list of the most recent new contributors and to make sure that their patches get quick feedback.

Comparison:

x.png (comparison screenshot, 90 KB)

Upstream: https://gitlab.com/Bitergia/c/Wikimedia/support/issues/16


Event Timeline

Aklapper triaged this task as Medium priority. Nov 20 2016, 7:14 PM
Aklapper created this task.
Lcanasdiaz added subscribers: Dicortazar, Lcanasdiaz.

I'm not sure, so please correct me if I'm wrong, @Dicortazar, but the Git demographics panel is correct; the thing is that you would need demography information applied to Gerrit.

@Dicortazar how difficult is it to have that information for our upstream product with the current ES indexes?

Ahem, Luis made me realize that Git-Demographics is about... Git (merged stuff). Not Gerrit (potentially non-merged stuff).
As we use the data on http://korma.wmflabs.org/browser/code_contrib_new_gone.html to nag developers to review proposed patches by new contributors, we'll also need a Gerrit equivalent in Kibana, which we don't have yet.
Hence I'm changing the scope of this task.

Aklapper renamed this task from "Last Attracted Developers" on Git-Demographics has incorrect date values for "First Commit Date" to Have "Last Attracted Developers" information for Gerrit (already exists for Git). Nov 23 2016, 3:35 PM
Aklapper lowered the priority of this task from Medium to Low.
Aklapper raised the priority of this task from Low to Medium. Jan 6 2017, 1:27 PM
Aklapper raised the priority of this task from Medium to High. Jan 27 2017, 10:17 PM
Aklapper moved this task from Backlog to Ready to Go on the wikimedia.biterg.io board.

(Unrelated side note: due to its UI, this only works after changing the default (2y) timespan in the upper right corner to something like "Relative > 20y ago". Jesus pointed this out.)

The git_demographics_newcomers visualization shown in the "Attracted developers" widget is based on the author_min_date field in the git_enrich index.
There is no equivalent author_min_date field available in the gerrit_enrich index. :(
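For illustration, here is a minimal Python sketch of what a per-author "first contribution date" field computes. It assumes enriched items are plain dicts with hypothetical author_uuid and date keys; it is not the GrimoireELK implementation, only the idea behind a field like author_min_date:

```python
from datetime import datetime
from typing import Dict, List


def add_author_min_date(items: List[dict]) -> List[dict]:
    """Annotate every item with the earliest date seen for its author.

    Hypothetical sketch: each item is assumed to be a dict with an
    'author_uuid' key and a 'date' key holding a datetime. This shows the
    idea behind author_min_date; it is not the GrimoireELK code.
    """
    first_seen: Dict[str, datetime] = {}
    # First pass: find the minimum date per author.
    for item in items:
        author, when = item["author_uuid"], item["date"]
        if author not in first_seen or when < first_seen[author]:
            first_seen[author] = when
    # Second pass: copy that minimum onto each of the author's items.
    for item in items:
        item["author_min_date"] = first_seen[item["author_uuid"]]
    return items
```

Two passes are needed because an author's minimum date is only known once all of their items have been seen. A Gerrit counterpart would presumably apply the same logic to changesets keyed by Gerrit account.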

@Aklapper we have to work on this analysis in two ways:

  1. Improve the performance of the analysis (it's quite slow nowadays...)
  2. Add other data sources such as Gerrit. You may discuss this with @Lcanasdiaz and prioritize it if this is important for your needs.
> 2. Add other data sources such as Gerrit. You may discuss this with @Lcanasdiaz and prioritize it if this is important for your needs.

Thing is: We want to be able to support new contributors even before they manage to get their changeset merged. For this, we need to know which Gerrit accounts are new and have submitted a first changeset (regardless of the changeset status).

This is a priority, as we need this data by the end of May (due to T160430).

I guess implementation would have to happen in [[ https://github.com/grimoirelab/GrimoireELK/blob/master/grimoire_elk/elk/gerrit.py | elk/gerrit.py ]] (for reference: this is the commit which introduced author_min_date in elk/git.py)?
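The requirement above (new Gerrit accounts with a first changeset, regardless of its status) could be expressed roughly as follows. This is a hypothetical sketch: the owner, created and status keys are assumptions, not fields of the actual gerrit_enrich index.

```python
from datetime import datetime
from typing import Dict, Iterable


def new_gerrit_accounts(changesets: Iterable[dict], since: datetime) -> Dict[str, datetime]:
    """Map each owner whose *first* changeset was created on/after `since`
    to that first-changeset date, newest first.

    Hypothetical sketch: changesets are dicts with 'owner', 'created' and
    'status' keys. 'status' is deliberately ignored, so NEW and ABANDONED
    changesets count just as much as MERGED ones.
    """
    first: Dict[str, datetime] = {}
    for cs in changesets:
        owner, created = cs["owner"], cs["created"]
        if owner not in first or created < first[owner]:
            first[owner] = created
    # Keep only accounts whose first contribution falls in the window.
    recent = [(owner, d) for owner, d in first.items() if d >= since]
    recent.sort(key=lambda pair: pair[1], reverse=True)
    return dict(recent)
```

With a cutoff of 2017-04-01, an owner whose only changeset is still NEW or was ABANDONED would be listed, while an owner whose first changeset dates from 2016 would not, even if they submitted more changesets since.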

> 2. Add other data sources such as Gerrit. You may discuss this with @Lcanasdiaz and prioritize it if this is important for your needs.

> Thing is: We want to be able to support new contributors even before they manage to get their changeset merged. For this, we need to know which Gerrit accounts are new and have submitted a first changeset (regardless of the changeset status).

Indeed, we would need demography support for Gerrit for doing this.

> This is a priority, as we need this data by the end of May (due to T160430).

> I guess implementation would have to happen in [[ https://github.com/grimoirelab/GrimoireELK/blob/master/grimoire_elk/elk/gerrit.py | elk/gerrit.py ]] (for reference: this is the commit which introduced author_min_date in elk/git.py)?

We have this in our roadmap, but we don't have a deadline for implementation. We can try to push this forward, but at this point we're in the middle of a couple of transitions, and it's going to be difficult.

Let us discuss this, and I will come back with something more conclusive.

Trying to find short-term workarounds for this problem.

Very welcome surprise by Bitergia (thanks folks!): This should be possible soon. Admins can access a sneak preview which lists New Authors in Gerrit by first contribution date (among many other useful things):

T151161.png (screenshot, 531 KB)

There are going to be a few more changes to this before it is made available by default / to the public.

Config: For Wikimedia, we can kill the "New Authors per First Project"/C_Gerrit_Demo_Projects_Pie and the "New Authors by Top First Projects"/C_Gerrit_Demo_Project_TS widgets, as we are one single boring project. :)

> we are one single boring project. :)

This is something you may want to change at some point. If it would make sense to have "subprojects" that are meaningful to you, we could configure that in the dashboard.

So.... when could this go live? :) Any obstacles?
Do you need any input from me? (I could imagine integrating these widgets into either Gerrit-Backlog or Gerrit, to avoid having a 3rd Gerrit-related dashboard.)

FYI I've updated gerrit-reports to use the new Gerrit API parameters and we have new updates: https://www.mediawiki.org/w/index.php?title=Gerrit%2FReports%2FOpen_changesets_by_newbie_owner&type=revision&diff=2486934&oldid=2201127

I plan to use this information as usual if I have some time in the next few weeks, i.e. assist new users to get their patches improved and merged.

@Aklapper we've updated

> Config: For Wikimedia, we can kill the "New Authors per First Project"/C_Gerrit_Demo_Projects_Pie and the "New Authors by Top First Projects"/C_Gerrit_Demo_Project_TS widgets, as we are one single boring project. :)

Done!

> So.... when could this go live? :) Any obstacles?

Sorry for the delay. Here are the new versions for Git and Gerrit:

https://wikimedia.biterg.io/app/kibana#/dashboard/C_Git_Demo
https://wikimedia.biterg.io/app/kibana#/dashboard/C_Gerrit_Demo

> Do you need any input from me? (I could imagine integrating these widgets into either Gerrit-Backlog or Gerrit, to avoid having a 3rd Gerrit-related dashboard.)

What we are planning for the near future is a demographics panel covering several data sources.

> Here are the new versions for Git and Gerrit:
> https://wikimedia.biterg.io/app/kibana#/dashboard/C_Gerrit_Demo

Yay, thanks a lot! Happy to see that deployed "by default" soon!

Slightly off-topic and rather a note to myself:
Whoah, initially I got utterly confused. I'm not sure when and why my Chromium browser sometimes inserts a ? into URLs, but
https://wikimedia.biterg.io/app/kibana#/dashboard/C_Gerrit_Demo (correct)
https://wikimedia.biterg.io/app/kibana?#/dashboard/C_Gerrit_Demo (sometimes wrong)
are different things and sometimes display different data, which is extremely confusing: it initially made the C_Gerrit_Demo dashboard show Git instead of Gerrit data. I don't understand why, but checking that the listed repos are in mediawiki/core format (=Gerrit) rather than https://gerrit.wikimedia.org format (=Git) helps.

> Yay, thanks a lot! Happy to see that deployed "by default" soon!

Any news when to see https://wikimedia.biterg.io/app/kibana#/dashboard/C_Gerrit_Demo deployed 'by default'?
And way more important (as this blocks T167085), is it possible to update the data? Currently the latest "new authors" entry is from July 13th and we need recent names...

> Any news when to see https://wikimedia.biterg.io/app/kibana#/dashboard/C_Gerrit_Demo deployed 'by default'?
> And way more important (as this blocks T167085), is it possible to update the data? Currently the latest "new authors" entry is from July 13th and we need recent names...

Not by default, but I did a run of the scripts, so you have fresh indexes.

Something weird I've noticed is that in the whole history of Git, Facebook has a large share of authors. Drilling down, that seems to be due to some repos that probably you forked from upstream repos with a lot of Facebook activity, but still... Let us know if numbers don't seem to match.

> Not by default, but I did a run of the scripts, so you have fresh indexes.

Thank you!

> Something weird I've noticed is that in the whole history of Git, Facebook has a large share of authors. Drilling down, that seems to be due to some repos that probably you forked from upstream repos with a lot of Facebook activity, but still... Let us know if numbers don't seem to match.

We're aware; I don't consider this a bug currently as we can manually exclude those repositories, but thanks for the heads-up! Now I only need a fresh DB JSON dump. :D

Aklapper renamed this task from Have "Last Attracted Developers" information for Gerrit (already exists for Git) to Have "Last Attracted Developers" information for Gerrit (already exists for Git) automatically updated. Dec 31 2017, 3:35 PM
Aklapper lowered the priority of this task from High to Lowest.
Aklapper moved this task from Oct-Dec 2017 to Jan-Mar-2018 on the Developer-Advocacy board.
Aklapper raised the priority of this task from Lowest to High. Dec 31 2017, 3:38 PM

For the record, we need this data 8 times a year (at the beginning of Jan, Mar, Apr, Jun, Jul, Sep, Oct and Dec), either to contact new developers for a survey or for quarterly stats.
Manual updating is cumbersome, and the effort can sometimes collide with our deadlines.
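Automating those updates starts with knowing when they are due. A minimal sketch of that schedule, assuming "beginning of the month" means the 1st; the names are hypothetical and not part of GrimoireLab:

```python
from datetime import date
from typing import List

# Months in which fresh newcomer data is needed (beginning of Jan, Mar, Apr,
# Jun, Jul, Sep, Oct, Dec). "Beginning" is assumed to mean the 1st.
PULL_MONTHS = (1, 3, 4, 6, 7, 9, 10, 12)


def pull_dates(year: int) -> List[date]:
    """All scheduled data pulls in a given year."""
    return [date(year, month, 1) for month in PULL_MONTHS]


def next_pull(today: date) -> date:
    """The next scheduled pull on or after `today`, rolling into next year."""
    for due in pull_dates(today.year):
        if due >= today:
            return due
    return pull_dates(today.year + 1)[0]
```

A scheduled job could compare `next_pull(date.today())` with the current date and refresh the Gerrit indexes a few days ahead, so that the data is ready when the survey or quarterly stats are due.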

Aklapper renamed this task from Have "Last Attracted Developers" information for Gerrit (already exists for Git) automatically updated to Have "Last Attracted Developers" information for Gerrit automatically updated / Integrate new demography panels in GrimoireLab product. Mar 1 2018, 4:24 PM

Quoting Bitergia:

We have been working on a new, unified Demography panel for both Git and Gerrit data. Now, instead of visiting different panels for each of these sources, you can use the widget "Data Source" to select the data you want to see (see attached screenshot below). [...] Please note that the former Git Demography panel will remain as a legacy version at https://wikimedia.biterg.io/app/kibana#/dashboard/C_Gerrit_Demo , so we encourage you to use the new version from now on.

The new panel does not list the "repos by new authors" on which the "last attracted developers" worked (a small regression, but we will survive that), but as the legacy panel does not receive updated data anyway there's IMHO no sense in keeping it around.

Furthermore, Bitergia also fixed a data inconsistency bug in https://github.com/chaoss/grimoirelab-elk/pull/451, but I could still reproduce that problem in one case. Hence I'm leaving this task open, though it's nearly fixed.

Everything sorted out and I am very happy to close this task as resolved.