
Mentor sign-up volume choice labels are mathematically unsound
Open, Needs Triage · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • go to en:Special:EnrollAsMentor
  • (Note: I am unable to see the enrollment page, as I am already a mentor and it shows me my dashboard instead)

What happens?:

  • From memory, the sign-up process offers mentors a tripartite division of mentee numbers or activity volume, labeled something like: 'half the average', 'average', and 'twice the average'. This is circular, and the average may drop to 1, or increase indefinitely.

What should have happened instead?:

  • It should offer volume labels that do not refer to the average, like, 'low', 'medium', 'high'; or absolute numbers (e.g., 1-10, 11-20, 20-40).

Details at mw:Talk:Growth.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change #1214400 had a related patch set uploaded (by Shivaansh Singh; author: Shivaansh Singh):

[mediawiki/extensions/GrowthExperiments@master] Mentorship: Use neutral mentor load labels

https://gerrit.wikimedia.org/r/1214400

Urbanecm_WMF subscribed.

Pulling to sprint, as it has a patch provided already.

Hello!

Thanks for filing this task, and for uploading a patch here. Before we decide on how to proceed, I'd like to ask @AAlhazwani-WMF (Growth's Designer) and @KStoller-WMF (Growth's Product Manager) for their input as well. From my perspective, I understand how the labels can be misleading, especially in edge cases.

Internally, we have a mentor pool, which contains usernames of registered mentors. By default (when on the "Average" setting), everyone is in the pool twice. If someone changes their settings to "twice the average", their username will be in the pool four times. In case they set themselves to "half the average", their username will be in the pool only once. Whenever a new account registers, a random username is taken from the pool, which is the mentor we're assigning to that user. The number of times a mentor's username is in the pool is called the mentor's weight.
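
As a rough illustration of the pool described above, here is a minimal Python sketch. It is not the GrowthExperiments code; the setting keys, the function names `build_pool` and `assign_mentor`, and the example mentors are all made up for this example. Only the weights (1, 2, 4) come from the description above.

```python
import random
from collections import Counter

# Weights as described above: how many times a username appears in the pool.
# The dictionary keys are illustrative labels, not real configuration values.
WEIGHTS = {"half": 1, "average": 2, "twice": 4}

def build_pool(settings):
    """Return a list in which each mentor's name appears `weight` times."""
    pool = []
    for mentor, setting in settings.items():
        pool.extend([mentor] * WEIGHTS[setting])
    return pool

def assign_mentor(pool, rng=random):
    """Pick one entry uniformly at random; heavier mentors are picked more often."""
    return rng.choice(pool)

# Hypothetical mentors and their chosen settings.
settings = {"Alice": "average", "Bob": "twice", "Carol": "half"}
pool = build_pool(settings)

# Simulate 7000 newcomer registrations with a fixed seed for repeatability.
rng = random.Random(0)
counts = Counter(assign_mentor(pool, rng) for _ in range(7000))
print(counts)
```

With these three mentors the pool has seven entries, so Bob should end up with roughly four sevenths of the newcomers, Alice two sevenths, and Carol one seventh.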

This means the only changes that happen are relative: if one mentor's weight is smaller relative to the weight of all the other mentors, that mentor would receive fewer newcomers. Conversely, if one mentor's weight is higher relative to all the other mentors, they would receive more newcomers.

Naturally, this only works if only a small portion of the mentors change their preferences. If everyone sets themselves as "Half the average", the system wouldn't change at all, as there wouldn't be any relative differences.
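
The point about relative differences can be checked with a small Python sketch. It assumes the expected share of newcomers is simply a mentor's weight divided by the total weight in the pool, which follows from drawing uniformly from the pool; the function name `expected_shares` and the mentor names are hypothetical.

```python
from fractions import Fraction

# Weights as described above; the keys are illustrative labels.
WEIGHTS = {"half": 1, "average": 2, "twice": 4}

def expected_shares(settings):
    """Expected fraction of newcomers per mentor: own weight / total weight."""
    total = sum(WEIGHTS[s] for s in settings.values())
    return {m: Fraction(WEIGHTS[s], total) for m, s in settings.items()}

all_average = expected_shares({"A": "average", "B": "average", "C": "average"})
all_half = expected_shares({"A": "half", "B": "half", "C": "half"})
one_half = expected_shares({"A": "half", "B": "average", "C": "average"})

print(all_average)  # every mentor gets the same share
print(all_half)     # identical to all_average: no relative difference
print(one_half)     # A's share is half of B's and half of C's
```

When everyone picks the same setting, the shares are identical to the default case. When only mentor A picks "half", A's expected share is half of what each "average" mentor receives.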

Mathematically, this should indeed converge towards half the average or twice the average. The average itself naturally changes, but the system would eventually reach a stable configuration. While I can see how the labels might be confusing, the newly proposed labels are probably even more confusing, as they provide no information at all about what the actual difference is. In addition, the new labels would still have the "only works when only some mentors change their settings" problem I described above, so "Low" and "Medium" might actually be the same in some cases.

I'm curious what you think about this.

Best regards,
Martin Urbanec

Thanks for thinking about this, @Mathglot and @ShivaanshSingh!

I agree that the current copy is imperfect, but I do worry that "Low, Medium, and High" is too vague.

One possible approach is to focus on the mentor’s preferred workload. Something like:

• Fewer (I have limited time to support mentees)
• Standard number of mentees (default)
• More (I am eager to support more mentees)

This phrasing keeps the intent clear and lets mentors choose based on their availability rather than the underlying distribution mechanics.
Admittedly, those are all rather long phrases for a drop-down menu, and they will inevitably be even longer when localized into certain languages.

Let's let @AAlhazwani-WMF chime in before we make any changes.

. . . From my perspective, I understand how the labels can be misleading, especially in edge cases.

Internally, we have a mentor pool, which contains usernames of registered mentors. By default (when on the "Average" setting), everyone is in the pool twice. If someone changes their settings to "twice the average", their username will be in the pool four times. In case they set themselves to "half the average", their username will be in the pool only once. Whenever a new account registers, a random username is taken from the pool, which is the mentor we're assigning to that user. The number of times a mentor's username is in the pool is called the mentor's weight.

So what I understand now is that you are assuring mentor load only proportionally, relative to other mentors, and not absolutely, based on any number of mentees. Am I correct in concluding that in this scheme there is no upper bound on possible mentor load: they might get 10 mentees, or 100, or 10,000 mentees? Without knowing the details of the number of mentors and the total number of mentees in the future, it would be impossible for a prospective mentor upon signing up to estimate load or to set a threshold on the number of mentees they might be assigned.
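
To make the question concrete, here is a small Python sketch with made-up numbers. It assumes that expected load is the number of newcomers times the mentor's weight divided by the total pool weight; the function name `expected_load`, the figure of 100 mentors, and the newcomer counts are hypothetical and not taken from any real wiki.

```python
def expected_load(weight, total_weight, newcomers):
    """Expected number of mentees for one mentor under proportional assignment."""
    return newcomers * weight / total_weight

# Hypothetical wiki: 100 mentors, 99 on "average" (weight 2), one on "half" (weight 1).
total_weight = 99 * 2 + 1

# The same "half" setting yields very different absolute loads
# depending on how many newcomers register.
loads = {n: expected_load(1, total_weight, n) for n in (1_990, 199_000, 19_900_000)}
for newcomers, load in loads.items():
    print(f"{newcomers:>10} newcomers -> about {load:.0f} mentees")
```

Under this assumption the load grows linearly with the number of newcomers, and nothing in the weighting itself caps it.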

If that is a correct understanding, then this is a completely untenable system, with a set of labels that are devoid of any real-world meaning. If you are doing piecework on the assembly line, and you assign the novices half the work of the journeyman, and one fourth that of the masters, that is perhaps fair to begin with, until you dump one million parts into the system and the whole system collapses. However fair it is to tell a novice that the masters have to finish four million parts and you only have to finish one million, if a normal person can only do a thousand a day, the fairness of the division of labor is irrelevant.

I would predict that as load gets heavier with increasing number of mentees, those requesting 2x or average will downgrade and there will be a race to the bottom, with everyone eventually ending up at 1/2, thus total equality to divvy up the load. Then when 1/2 is still too much, mentors will start to leave the system, increasing the load on everyone else, with a vicious cycle occurring until you have nobody left willing to mentor.

I no longer believe that the core problem is that the label names are unsound. The system design is unsound. Please tell me I have this all wrong.

Change #1214996 had a related patch set uploaded (by Shivaansh Singh; author: Shivaansh Singh):

[mediawiki/extensions/GrowthExperiments@master] Mentorship: Clarify mentor workload labels

https://gerrit.wikimedia.org/r/1214996

Change #1215037 had a related patch set uploaded (by Shivaansh Singh; author: Shivaansh Singh):

[mediawiki/extensions/GrowthExperiments@master] Mentorship: Clarify mentor workload labels

https://gerrit.wikimedia.org/r/1215037

. . . From my perspective, I understand how the labels can be misleading, especially in edge cases.

Internally, we have a mentor pool, which contains usernames of registered mentors. By default (when on the "Average" setting), everyone is in the pool twice. If someone changes their settings to "twice the average", their username will be in the pool four times. In case they set themselves to "half the average", their username will be in the pool only once. Whenever a new account registers, a random username is taken from the pool, which is the mentor we're assigning to that user. The number of times a mentor's username is in the pool is called the mentor's weight.

So what I understand now, is that you are assuring mentor load only proportionally relative to other mentors, and not absolutely based on any number of mentees. Am I correct in concluding that in this scheme there is no upper bound on possible mentor load: they might get 10 mentees, or 100, or 10,000 mentees. Without knowing the details of the number of mentors and the total number of mentees in the future, it would be impossible for a prospective mentor upon signing up to estimate load or to set a threshold on the number of mentees they might be assigned.

yeah, to add even more complexity to the discussion.. the challenge here is that we don't know how active those mentees are going to be (maybe we could estimate the "number of questions per mentee per week" based on previous data, though i defer to engineering to confirm whether that's possible). one mentor could have hundreds of mentees and only get a few questions a month.. while another mentor might have just a dozen mentees but get several questions per day.


to build off your prompt @Mathglot, i wonder what a mentor actually needs to know to make this decision?

right now we're trying to describe the mechanism (likelihood, weighting, probability), but what mentors actually need to understand to make an informed decision may be simpler?

so my question for the group is.. what are mentors actually trying to signal when they change this setting? are they saying:

  • how much time they have?
  • how experienced they are?
  • how many questions they can answer?
  • how many mentees they can handle?
  • something else?

if we understand their/your intent when using this control, we might be able to write labels that match that intent, even if the technical implementation underneath is more complex.

Change #1214238 had a related patch set uploaded (by Prakhar0804; author: Prakhar0804):

[mediawiki/extensions/GrowthExperiments@master] GrowthExperiments: Mentorship — use Low/Medium/High labels instead of 'average' phrasing

https://gerrit.wikimedia.org/r/1214238

Hello @Prakhar0804 @ShivaanshSingh,

Thank you for the work you spent on this task. However, please note that as of now, the problem is not the lack of code, but rather agreeing on the solution. You can help build agreement by participating in the discussion on this task, specifically by answering the comments posted earlier by @KStoller-WMF, @AAlhazwani-WMF or myself.

It seems that while they understand the current situation is not perfect, they consider the proposed solution to have other problems (possibly more impactful). The team is open to considering a change if it is an improvement on the status quo, but we're not open to replacing one not-so-great solution with another.

Thank you for your understanding.

Sincerely,
Martin Urbanec