Investigate workflow to identify recently active new volunteer technical contributors (to potentially invite to upcoming Hackathons / developer events)
Open, Low, Public

Description

  • Need to define a threshold (number of patches?) / time frame, to avoid covering short-term drive-by involvement.
  • Need to check what kind of queries https://wikimedia.biterg.io could offer.

Event Timeline

Aklapper created this task.
Aklapper renamed this task from Investigate workflow to identify recently active new volunteer technical contributors (to potentially invite them to upcoming Hackathons) to Investigate workflow to identify recently active new volunteer technical contributors in Gerrit (to potentially invite them to upcoming Hackathons). Mar 21 2018, 9:58 PM
Aklapper moved this task from Backlog to March on the Developer-Advocacy (Jan-Mar-2018) board.

Assuming that the deadline for Hackathon scholarships is likely about 3 months before the event takes place and that gathering the author names should take place about 4 months before the event, I deliberately chose newcomers who first became active 12 to 8 months ago with 5 or more contributions in that timeframe, and then checked whether they also made 5 or more contributions within the 8 to 4 months before the event. Feel free to change these numbers (FYI, applying these very numbers and dates for the Barcelona May 2018 Hackathon, I get six human names).
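To make the two time frames concrete, here is a minimal sketch of the date arithmetic. The helper names `months_before` and `gathering_windows` are made up for illustration, and the reference date is an assumption: the text above mixes "ago" and "before the event", so pass whichever date you count back from.

```python
import calendar
from datetime import date


def months_before(day: date, months: int) -> date:
    """Return the date `months` calendar months before `day`.

    The day of month is clamped, so March 31 minus one month
    becomes the last day of February.
    """
    index = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def gathering_windows(reference: date) -> dict:
    """Compute the two (from, to) pairs to enter as "Absolute" time frames.

    The 12 / 8 / 4 month offsets are the numbers proposed above;
    change them if you pick different thresholds.
    """
    return {
        "newcomer_window": (months_before(reference, 12), months_before(reference, 8)),
        "follow_up_window": (months_before(reference, 8), months_before(reference, 4)),
    }


if __name__ == "__main__":
    # Hypothetical reference date, for illustration only.
    for label, (start, end) in gathering_windows(date(2018, 5, 18)).items():
        print(label, start.isoformat(), "to", end.isoformat())
```

The clamping only matters for reference dates late in a month; since the steps ask for approximate dates anyway, any day in the right month should do.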

The steps would be (skip the bullet points and scroll down if you don't want to perform these steps yourself):

  • go to https://wikimedia.biterg.io/app/kibana#/dashboard/C_Gerrit_Demo
  • in the upper right corner, click the time frame
  • click "Absolute" on the left
  • for "From", enter a date ~12 months ago
  • for "To", enter a date ~8 months ago
  • Click the "go" button
  • in the "New Authors per Organization" pie chart, click the "Independent" part to apply a filter that displays only independent authors
  • in the "New Authors" widget, click the "Reviews (started)" column header on the right, to sort the list by number of patchsets (highest number first)
  • in the "New Authors" widget, get all names in the "Author" column which have 5 or more 'reviews'/'changesets': Either by pressing the Ctrl key in Firefox and marking the column with author names, or by installing some add-on in Chromium (untested), or by exporting to CSV via the link at the bottom of that widget, then removing all the other (non 'author name') columns and removing the rows in which the number of patchsets is less than 5.
  • construct a query string from those author names; format is author_name:"ABC" OR author_name:"XYZ"
  • copy that search string into your computer's clipboard
  • go to https://wikimedia.biterg.io/app/kibana#/dashboard/Gerrit
  • in the upper right corner, click the time frame
  • click "Absolute" on the left
  • for "From", enter a date ~8 months ago
  • for "To", enter a date ~4 months ago
  • Click the "go" button
  • in the "New Authors per Organization" pie chart, click the "Independent" part to apply a filter that displays only independent authors. Note: Might be a different color than before.
  • paste that search string from your computer's clipboard into the text search field on top by replacing the * and press Enter
  • in the "Submitters" widget, click the column header "# Changesets" if the names are not already sorted by that number
  • Look at the names in the "Submitter" column which have again 5 or more 'changesets'/'reviews'.
  • Get the email addresses of those users either by checking the database behind wikimedia.biterg.io (if you have access; if not, check who has), or by entering their names in the search field on https://gerrit.wikimedia.org/ and waiting for the autocomplete / search proposals to get displayed, which include the email address
  • Contact those people via e-mail (?), explain why, make them aware of the event and the scholarship, why it could be interesting for them. With links to more info.
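The CSV filtering and query-string steps above could be scripted roughly as follows. This is only a sketch under assumptions: the header names `Author` and `Reviews (started)` are taken from the widget's column labels and may differ in the actual export, and `build_author_query` is a made-up helper name. Backslash-escaping of `"` inside the quoted name follows Lucene query syntax, which the search field is assumed to use.

```python
import csv
import io


def build_author_query(csv_text, author_col="Author",
                       count_col="Reviews (started)", minimum=5):
    """Build an author_name:"A" OR author_name:"B" query from a CSV export.

    Keeps authors with at least `minimum` contributions and drops
    repeated names, since some authors might be listed twice.
    """
    # newline="" leaves line endings untouched so the csv module can
    # handle both Unix (\n) and Windows (\r\n) exports.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    names = []
    for row in reader:
        # Tolerate thousands separators such as "1,234" (assumption
        # about how the export formats numbers).
        count = int(row[count_col].replace(",", ""))
        name = row[author_col].strip()
        if count >= minimum and name not in names:
            names.append(name)

    def quote(name):
        # Escape backslashes first, then double quotes.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        return 'author_name:"{}"'.format(escaped)

    return " OR ".join(quote(name) for name in names)


if __name__ == "__main__":
    # Made-up sample data, not real contributors.
    sample = (
        "Author,Reviews (started)\r\n"
        "Alice Example,12\r\n"
        "Bob Test,5\r\n"
        "Carol Sample,4\r\n"
        "Alice Example,12\r\n"
    )
    print(build_author_query(sample))
```

The printed string can then be pasted into the search field in place of the `*`.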

This is the simplest approach I can come up with, which can be performed by anyone. 'Simplest' because it has a flaw: it can miss someone who was only active in a short time frame that straddles the '8 months' threshold and so does not reach 5 contributions on either side of it. That sounds negligible.

Nota bene, current bugs: The results do not exclude bot accounts until these indices are merged, and some authors might be shown twice.

@Aklapper Looks good! If this will be used by other folks (such as event organizers, scholarship committee members, etc.), a few minor instruction steps could be added:

  • In the context of CSV, how to format it, sort the data by the number of patchsets, remove the rows in which the no of patchsets is less than 5, etc.
  • Perhaps, a quick way (e.g. via shell script command) to construct query string with "author_name."

Also, where are these instructions ultimately going to live? IMO, somewhere on /Hackathons/Handbook for organizers to be able to find/ follow this process easily...

  • In the context of CSV, how to format it, sort the data by the number of patchsets, remove the rows in which the no of patchsets is less than 5, etc.

Added by editing my comment above.

  • Perhaps, a quick way (e.g. via shell script command) to construct query string with "author_name".

Line-endings differ on systems and given the variety of systems people use, plus potentially fiddling with escaping the " character, I'd leave that to someone else.

Also, where are these instructions ultimately going to live?

I'd love to defer that to @Rfarrand who has a better overview (plus might have input why my idea does not make sense or such). :)

For the time being, assigning to @Rfarrand as there is currently nothing left to do here for me.

Erica mentioned that we have definitions for active editors (to compare with something, as my criteria are random). Looking for a valid metric is possibly something Research people could help with, if we do not feel comfortable with random numbers.

...and bd808 brought up the idea to potentially contact Programs for things that work which could maybe also work in the Tech space.

assign to @Aklapper and not me, correct?

@Rfarrand: I originally assigned this to you in T189496#4090851 because I wonder where this step should be documented (in the handbook?), to be performed before the scholarship application deadline of an upcoming Hackathon.

Add a note: if this does not make any sense to you / you can't complete it please let us (andre) know and we can try to clarify / improve the documentation.

@Rfarrand: Maybe https://phabricator.wikimedia.org/T189496#4070990 and "feel free to contact https://www.mediawiki.org/wiki/User_talk:AKlapper_(WMF) or https://www.mediawiki.org/wiki/User_talk:SSethi_(WMF) if you'd like some help" could get linked from a place like https://www.mediawiki.org/wiki/Hackathons/Handbook/Manage_participants#Scholarships ? But I don't know who is 'responsible' for announcing scholarships a few months before events take place...

T189496#4070990 above is now outdated because in the meantime, T151161 and T184907 got fixed and some Kibana UI upgrades have taken place.

Updated steps as of late December 2018:

Assuming that the deadline for Hackathon scholarships is likely about 3-4 months before the event takes place and that gathering the author names should take place about 4 months before the event, I deliberately chose newcomers who first became active 14 to 8 months before the event with 5 or more contributions in that timeframe, and then checked whether they also made 5 or more contributions within the 8 to 4 months before the event.

The steps would be (skip the bullet points and scroll down if you don't want to perform these steps yourself):

  • go to https://wikimedia.biterg.io/
  • click "Community > Demographics" in the top bar
  • in the upper right corner, click the time frame
  • switch from "Relative" to "Absolute"
  • for "From", enter a date ~12 months before the event
  • for "To", enter a date ~8 months before the event
    • (note: the resulting list will include newcomers first active 14-8 months ago, but only lists newcomers who contributed at least one patch 12-8 months ago)
  • Click the "go" button
  • In the "Organizations" list widget, hover over the "Independent" entry and click the + magnifier icon to apply a filter that displays only independent authors
  • in the "Last Attracted Developers" list widget, click the "Contributions" column header, to sort the list by number of patchsets (highest number first)
  • in the "Last Attracted Developers" widget, get all names in the "Author" column which have 5 or more 'contributions': Either by pressing the Ctrl key in Firefox and marking the column with author names, or by installing some add-on in Chromium (untested), or by exporting to CSV via the link at the bottom of that widget and removing the rows in which the Contributions column is less than 5.
  • construct a query string from those author names; format is author_name:"ABC" OR author_name:"XYZ"
  • copy that search string into your computer's clipboard
  • click "Gerrit > Overview" in the top bar
  • in the upper right corner, click the time frame
  • switch from "Relative" to "Absolute" (if needed again)
  • for "From", enter a date ~8 months before the event
  • for "To", enter a date ~4 months before the event
  • Click the "go" button
  • paste that search string from your computer's clipboard into the text search field on top by replacing the * and press Enter
  • in the "Submitters" widget, click the column header "# Changesets" if the names are not already sorted by that number
  • Look at the names in the "Submitter" column which have again 5 or more 'changesets'/'reviews'.
  • Get the email addresses of those users either by checking the database behind wikimedia.biterg.io (if you have access; if not, check who has), or by entering their names in the search field on https://gerrit.wikimedia.org/ and waiting for the autocomplete / search proposals to get displayed, which include the email address
  • Contact those people via e-mail (?), explain why, make them aware of the event and the scholarship, why it could be interesting for them. With links to more info.
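The selection rule behind these steps (5 or more contributions in the first time frame and again 5 or more in the second) can be sketched as a small function. The input dictionaries are assumed to have been copied from the two widgets or their CSV exports; `sustained_newcomers` and the sample names are hypothetical.

```python
def sustained_newcomers(first_window, second_window, minimum=5):
    """Return the names that reach `minimum` contributions in both windows.

    Both arguments map an author name to a contribution count for
    one time frame; a name missing from a window counts as zero.
    """
    return sorted(
        name
        for name, count in first_window.items()
        if count >= minimum and second_window.get(name, 0) >= minimum
    )


if __name__ == "__main__":
    # Made-up sample data, not real contributors.
    newcomer_counts = {"Alice Example": 12, "Bob Test": 5, "Carol Sample": 4}
    follow_up_counts = {"Alice Example": 9, "Bob Test": 2, "Carol Sample": 20}
    print(sustained_newcomers(newcomer_counts, follow_up_counts))
```

In the sample, only the first author qualifies: the second drops off in the later window, and the third never reached the threshold as a newcomer.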

@Rfarrand: Would still love to get some input from you at some point whether this is something we (well, royal we) would like to try (e.g. before Wikimedia Hackathon in May 2020), but of course depends on how to square the circle when deciding on which of the many different scholarship criteria to go for...

I would love this data, but I don't think I can own this work. We have created criteria for accepting scholarship recipients this year and I can add this as a data point if it is available but otherwise we will manually check the links they provide in their registrations. We are also doing a very heavy outreach campaign locally around Albania for technical folks, but they may not be super active on our projects.

An issue that I am not sure how to resolve is that the scholarship data is owned by the affiliate (not WMF) and would be able to be used by and available to the organizing team for organizing purposes. If we did this kind of tracking, we would have to get them to agree both to pass their data to WMF generally and to be tracked. I am not sure this is something that should necessarily be required, but again it would be really helpful.

Do you think instead it would be appropriate for me to do a reverse of this and once we have our top candidates for scholarship acceptances I can go check them all individually following the steps you provided above to make sure that they actually had as much activity as they claim?

Aklapper renamed this task from Investigate workflow to identify recently active new volunteer technical contributors in Gerrit (to potentially invite them to upcoming Hackathons) to Investigate workflow to identify recently active new volunteer technical contributors (to potentially invite to upcoming Hackathons / developer events). Nov 2 2020, 8:08 PM

Hi @Aklapper I would like to generate a list of the top 40 Gerrit contributors, in order to build a list of people to invite to apply for a scholarship for the Hackathon 2023. Could you help me with that? Thanks!

Hi @Aklapper I would like to generate a list of the top 40 Gerrit contributors, in order to build a list of people to invite to apply for a scholarship for the Hackathon 2023. Could you help me with that? Thanks!

@LLacroix-WMF: Hi, please file a new separate ticket as this ticket is about having a general (not only 2023) concept. Depending on the data you need, https://www.mediawiki.org/wiki/Community_metrics#How_can_I%E2%80%A6%3F might cover this (or not). Thanks a lot! :)

Done on T326561, I hope I did it right :)