
[Session] Technical development statistics (Git and Gerrit)
Closed, ResolvedPublic


Username or display name (will be displayed publicly): AKlapper_(WMF)

Categories/Tags/Keywords (up to 5):

metrics, development statistics

Session type (select one):

  • Presentation (including Q/A) - 25 mins
  • Discussion (including Q/A) - 55 mins
  • Workshop (including Q/A) - 55 mins
  • Lightning talk - 5 mins

Venue (select one):

  • I would like to be on the main track
  • I wouldn't mind being on the main track
  • I need a Jitsi room for the session

When are you available to have the session?

Any time

Session Details

I am going to give a short introduction to the website that offers statistics about the members of our technical community who use our technical infrastructure (Wikimedia Git and Gerrit).

Target audience:

Technically interested users, people interested in statistics.

What will participants get out of this session? (~50 words)

A better understanding of how to gather statistics and insights about the technical contributors who contribute to our code repositories.

(Optional) Additional resources:

Event Timeline

Aklapper created this task.

Hello @Aklapper and thanks a lot for proposing this session!

We would love to schedule it on the hacking room track on Sunday, 23rd of May at 12:00 UTC. You would have 25 minutes, questions and discussion included.

Does this timeslot work for you? We kindly ask you to confirm before May 17th, so we can complete the schedule.

As a speaker in a hacking room, you will use Jitsi, where you will be able to present, share your screen, and interact directly with the participants. We will send you more details closer to the event. If you’d like to schedule a testing session to have a look at Jitsi, just let us know.

If you have any questions, feel free to reach out to me. Thanks!

@Nes: Hej hej. That timeslot works for me! Thanks a lot!

Some notes from the session (about 10 people):



  • What do you use the statistics for?
  • Is this used to determine the eligibility of technical contributors for the board elections in 2021?
  • Do you have any idea why the number of contributors is stable?
    • The numbers cover only Gerrit. Possible reasons: only a limited set of data sources is indexed, some activity happens outside of what we index (tools), and maybe Gerrit's learning curve, our docs, etc.
  • Is counting Git authors a good way to estimate the number of contributors?
    • Probably not, because we also import upstream Git repos from projects outside of Wikimedia to deploy their software on our servers without making changes ourselves. We import their Git repo together with its activity, so those contributors get indexed even though they contribute to software used by Wikimedia rather than to Wikimedia software.
  • Translations?
    • Could be an area, unclear how much to consider translators technical contributors
  • We should probably remove external repos from these dashboards more generally, no?
    • Maybe by checking the Git remotes for "upstream" (or whatever name it may have) to exclude them automatically
    • Would be manual work, plus some mixed cases (repo imported from upstream but some custom changes, e.g. Mailman3), so a mix of outside-Wikimedia-upstream contributions plus a few inside-Wikimedia custom changes in the same repo
  • Do we have any statistics about the diversity (age, sex, nationality etc.) of technical contributors? Or maybe this information is not on developer profiles, so we don't?
    • Maybe only as aggregate anonymous data, but definitely not linked to names and such?
    • We don't have info on age, gender, or sex, and we don't index that. We would have info on IP addresses (location) in some systems, but we do not index that, and I am not sure whether the platform even has code to index it. Plus, the Privacy Policy might collide (90 days), though this is third-party hosted.
  • Maybe from the Developer Satisfaction Survey?
    • Could be, but that would be outside of this system
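
Regarding the idea above of checking Git remotes to exclude imported repos automatically, here is a minimal sketch of what such a check might look like. It is only an illustration and not how the dashboards work: the function names are hypothetical, and the assumption that a remote pointing to a host other than gerrit.wikimedia.org marks an imported repo is a guess. Many clones have only a single "origin" remote, so this would miss repos, and mixed cases such as Mailman3 would still need manual review.

```python
import subprocess
from urllib.parse import urlparse

# Assumption: repos developed within Wikimedia are fetched from this host only.
WIKIMEDIA_HOSTS = {"gerrit.wikimedia.org"}


def read_remotes(repo_path):
    """Return the raw text printed by `git remote -v` for a local clone."""
    result = subprocess.run(
        ["git", "-C", repo_path, "remote", "-v"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_remotes(remote_output):
    """Parse `git remote -v` output into a {remote_name: url} mapping."""
    remotes = {}
    for line in remote_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            # Fetch and push lines usually repeat the same URL; keep one.
            remotes[parts[0]] = parts[1]
    return remotes


def looks_imported(remotes, hosts=WIKIMEDIA_HOSTS):
    """Return True if any remote points to a host outside `hosts`.

    The remote's name is ignored, so it does not matter whether it is
    called "upstream" or something else. scp-style URLs
    (user@host:path) have no parseable hostname here and are skipped.
    """
    for url in remotes.values():
        host = urlparse(url).hostname
        if host is not None and host not in hosts:
            return True
    return False
```

A caller could run `looks_imported(parse_remotes(read_remotes(path)))` for each local clone and flag the matches as candidates for exclusion, leaving the final decision to a human.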