Get the average number of mentees per mentor on all Growth wikis
Closed, Resolved, Public

Description

Background

In T326733: Take the number of newcomers already assigned to a mentor into account when assigning new accounts to them, we want to take the average number of mentees per mentor into account when newcomers are assigned to mentors. Specifically, we want to give priority to newly registered mentors (those who have fewer than X mentees assigned).
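As a rough illustration of the prioritization idea described above, here is a minimal sketch in plain Python. It is not the GrowthExperiments implementation: the function name `pick_mentor`, the `threshold` parameter (standing in for X), and the mentor names and counts are all made up for the example.

```python
import random


def pick_mentor(mentee_counts, threshold, rng=random):
    """Pick a mentor, preferring those with fewer than `threshold` mentees.

    `mentee_counts` maps a mentor name to the number of mentees currently
    assigned to them. Hypothetical helper, for illustration only.
    """
    if not mentee_counts:
        raise ValueError("no mentors available")
    # Mentors below the threshold get priority over everyone else.
    under = [name for name, count in mentee_counts.items() if count < threshold]
    # If nobody is below the threshold, fall back to the full mentor list.
    pool = under or list(mentee_counts)
    return rng.choice(pool)


# Made-up numbers: only MentorB is below the threshold of 50.
counts = {"MentorA": 520, "MentorB": 12, "MentorC": 870}
print(pick_mentor(counts, threshold=50))
```

The open question this task addresses is what value of X makes sense, which is why the per-wiki averages are needed.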

Task

To determine the right threshold, we want to see how many mentees, on average, are assigned to one mentor. This calculation should happen on all Growth wikis. The data should be available by the 2023-05-16 Ambassadors meeting.

Event Timeline

Here is the data requested:

wiki                  mentees
ar.wikipedia.org      30766.625
ary.wikipedia.org     77.0
az.wikipedia.org      1022.75
bn.wikipedia.org      2210.5263157894738
ckb.wikipedia.org     745.3333333333334
cs.wikipedia.org      3309.4117647058824
de.wikipedia.org      6934.117647058823
el.wikipedia.org      6671.5
en.wikipedia.org      14372.26582278481
es.wikipedia.org      8358.066666666668
eu.wikipedia.org      1155.8333333333333
fa.wikipedia.org      25894.6
fr.wikipedia.org      7245.479166666667
fr.wiktionary.org     1039.0
frr.wikipedia.org     11.0
he.wikipedia.org      1181.8823529411766
hi.wikipedia.org      2.0
hr.wikipedia.org      1732.3333333333333
hu.wikipedia.org      4269.6
hy.wikipedia.org      1443.5
id.wikipedia.org      7950.888888888889
it.wikipedia.org      7245.0
ko.wikipedia.org      2224.5714285714284
ks.wikipedia.org      22.5
ku.wikipedia.org      105.66666666666667
lv.wikipedia.org      797.5
ne.wikipedia.org      52.44444444444444
nl.wikipedia.org      6003.4
no.wikipedia.org      1873.0
nqo.wikipedia.org     2.0
pl.wikipedia.org      4409.857142857143
pt.wikipedia.org      12219.823529411764
ro.wikipedia.org      3512.0
ru.wikipedia.org      9062.666666666666
se.wikipedia.org      7.0
sh.wikipedia.org      1.5
sk.wikipedia.org      3260.0
sq.wikipedia.org      882.0
sr.wikipedia.org      1366.25
sv.wikipedia.org      2017.0666666666666
te.wikipedia.org      290.75
test.wikipedia.org    60.61764705882353
tr.wikipedia.org      5881.5
tum.wikipedia.org     4.0
uk.wikipedia.org      1118.2222222222222
vi.wikipedia.org      2833.4814814814813
zh.wikipedia.org      2673.8125

Calculated via the following snippet:

import json
import requests

from wmfdata import mariadb, spark, utils
import pandas as pd

wiki_codes = utils.get_dblist('growthexperiments')
wikis = spark.run('''
SELECT *
FROM canonical_data.wikis
WHERE database_code IN ({dbnames})
'''.format(dbnames=', '.join(["'%s'" % x for x in wiki_codes])))

dfs = []
for _, row in wikis.iterrows():
    print(row.database_code)
    # Count mentees per primary mentor; the table is read with use_x1=True
    menteesDf = mariadb.run('''
    SELECT gemm_mentor_id, COUNT(*) AS mentees
    FROM growthexperiments_mentor_mentee
    WHERE gemm_mentor_role = 'primary'
    GROUP BY gemm_mentor_id
    ''', row.database_code, use_x1=True)
    
    if menteesDf.gemm_mentor_id.count() == 0:
        #print('Skipping %s' % row.database_code)
        continue
    
    usernamesDf = mariadb.run('''
    SELECT actor_user, actor_name AS mentor_name
    FROM actor
    WHERE actor_user IN ({ids})
    '''.format(ids=', '.join([str(x) for x in menteesDf.gemm_mentor_id])), row.database_code)
    
    # download the list of mentors
    r = requests.get('https://%s/w/index.php?title=MediaWiki:GrowthMentors.json&action=raw&ctype=application/json' % row.domain_name)
    mentors = r.json()['Mentors']
    if len(mentors) == 0:
        print('Skipping %s, no mentors' % row.database_code)
        continue
    mentorsDf = pd.read_json(json.dumps(mentors), orient='index')
    mentorsDf.index.name = 'mentor_id'
    
    usernamesDf.set_index('actor_user', inplace=True)
    menteesDf.set_index('gemm_mentor_id', inplace=True)
    
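    # Left-join usernames and mentor settings onto the per-mentor counts.
    # Users with mentees but no entry in GrowthMentors.json get NaN, which
    # fillna() turns into the 'Not a mentor' marker.
    df = menteesDf.join(usernamesDf).join(mentorsDf)
    df.reset_index(inplace=True)
    df['wiki'] = row.domain_name
    df = df[['wiki', 'mentor_name', 'mentees', 'automaticallyAssigned']].fillna('Not a mentor')
    dfs.append(df)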

df = pd.concat(dfs)
# Keep only mentors flagged automaticallyAssigned, then average their mentee counts per wiki
utils.df_to_remarkup(df.loc[df.automaticallyAssigned == True][['wiki', 'mentees']].groupby('wiki').mean().reset_index())
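To make the final aggregation step easier to follow, here is a small dependency-free sketch of the same filter-then-average logic on made-up rows. The wiki names, counts, and the helper name `average_mentees` are illustrative assumptions; the real snippet does this with pandas on the combined data frame.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows shaped like the combined frame:
# (wiki, mentees, automaticallyAssigned)
rows = [
    ("a.example.org", 10, True),
    ("a.example.org", 30, True),
    ("a.example.org", 500, False),           # mentor opted out of auto-assignment
    ("a.example.org", 900, "Not a mentor"),  # marker set by fillna() in the snippet
    ("b.example.org", 7, True),
]


def average_mentees(rows):
    """Average mentee count per wiki over auto-assigned mentors only."""
    per_wiki = defaultdict(list)
    for wiki, mentees, auto_assigned in rows:
        # Only rows whose flag is exactly True contribute to the average.
        if auto_assigned is True:
            per_wiki[wiki].append(mentees)
    return {wiki: mean(counts) for wiki, counts in sorted(per_wiki.items())}


print(average_mentees(rows))
```

Rows for opted-out mentors and for users who are no longer listed as mentors are excluded, so the averages describe only mentors who currently receive newcomers automatically.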

Urbanecm_WMF added subscribers: KStoller-WMF, Trizek-WMF.

FYI @KStoller-WMF and @Trizek-WMF, here is the data to discuss at the upcoming Ambassadors meeting.