
Determine target wiki selection criteria
Closed, Resolved, Public

Description

This task represents the work of defining the criteria we will use to identify potential wikis to work with directly on this project (OWC2020).

For more context, see: T233627

Selection criteria

Rank | Criteria | Proxies | Related task
1) | Contributors depend on talk pages to coordinate their wiki work | I. Number of unique contributors participating in at least one Wikipedia talk page namespace each month, by experience level; II. Proportion of contributors who make at least 1 contribution to Wikipedia (in any namespace) in a given month who also make at least 1 contribution in at least one Wikipedia talk namespace | T233261
2) | There are contributors on the wiki who see improving talk pages as a top priority | Quality and quantity of participation in the talk page consultation (TPC); someone or some people who took responsibility in the TPC to host/facilitate conversations on their home project | T233627
3) | Language representation | RTL and non-Latin scripts represented | T233627
4) | Contributors see working with/helping newcomers as a priority | Second-month new editor retention (Column M) | T233627
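As a rough illustration of proxy I for criterion 1, the sketch below counts unique contributors with at least one edit in a talk namespace each month, grouped by experience level. It is a minimal sketch under stated assumptions, not the query used in T233261: it assumes edits are available as (user, month, namespace) records, reads "talk namespace" as any MediaWiki namespace with an odd, non-negative ID, and buckets experience level by a per-user cumulative edit count using hypothetical thresholds (`LEVELS`) that are placeholders rather than the project's definitions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    user: str
    month: str      # calendar month, e.g. "2019-09"
    namespace: int  # MediaWiki namespace ID


def is_talk_namespace(ns: int) -> bool:
    # MediaWiki gives talk namespaces odd, non-negative IDs
    # (1 = Talk, 3 = User talk, 5 = Project talk, ...).
    return ns >= 0 and ns % 2 == 1


# Hypothetical experience buckets keyed on cumulative edit count.
# The thresholds are placeholders, not the project's definitions.
LEVELS = [(100, "junior"), (500, "intermediate")]


def experience_level(cumulative_edits: int) -> str:
    for upper_bound, name in LEVELS:
        if cumulative_edits < upper_bound:
            return name
    return "senior"


def talk_contributors_by_level(edits, cumulative_edits):
    """Proxy I: unique talk-namespace contributors per (month, level).

    `cumulative_edits` maps each user to an assumed cumulative edit count.
    """
    users = defaultdict(set)  # (month, level) -> set of users
    for e in edits:
        if is_talk_namespace(e.namespace):
            level = experience_level(cumulative_edits[e.user])
            users[(e.month, level)].add(e.user)
    return {key: len(members) for key, members in users.items()}
```

Because contributors are collected in sets, a user who edits several talk namespaces in the same month is counted once for that month.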

Done

  • Determine the criteria we will consider when selecting potential target wikis

Event Timeline

ppelberg created this task. Sep 24 2019, 9:17 PM
ppelberg updated the task description.

Some other selection criteria to consider, inspired by "Communication networks do not explain the growth or survival of early-stage peer production projects":
Selection Criteria

  • Projects that are relatively "mature"
    • Maybe this is already covered by our existing criterion, "Have a relatively large number of contributors using article and user talk pages", but I want to make sure we explicitly consider the project's stage in the context of: "When the structure of a project is explicit and the goals are well-defined, as in many early-stage CBPP projects, then there are few social interdependencies...As projects grow and become more complex, it is more difficult to signal needs through the artifact and structured coordination is needed." (Source)

Carrying over a question from this morning's standup...

@dchan, what technical criteria should we set in identifying the set of target wikis we will work most closely with on this project?

So far, we've assumed it's important to have at least one wiki in an RTL language and at least one wiki that uses a non-Latin script. Do you think both of these are necessary? Are there other criteria we ought to consider as well?

For context, our primary goal for working with what we're calling "target wikis" is to be surprised. More specifically, to learn things we didn't even think to ask or consider. "Things" like:

  • Uncovering edge cases the team had not designed for
  • Ideas for how a feature could be designed based on communities' past efforts

cc @marcella

ppelberg added a subscriber: MNeisler. (Edited Oct 24 2019, 3:18 PM)

Considering our goal [1] for working with this subset of wikis and how we intend to engage with them, below is a proposal for the criteria (and corresponding proxies) we will use to help identify potential wikis to collaborate with on this work.

Rank | Criteria | Proxies
1) | Contributors depend on talk pages to coordinate their wiki work | Number of unique contributors participating on talk pages each month
2) | There are contributors on the wiki who see improving talk pages as a top priority | Quality and quantity of participation in the talk page consultation (TPC); someone or some people who took responsibility in the TPC to host/facilitate conversations on their home project
3) | Contributors see working with/helping newcomers as a priority | Number of monthly active unique Junior Contributors
4) | Diverse technical configurations | RTL and non-Latin script representation

@Whatamidoing-WMF, what are your thoughts on the criteria and proxies above? To be more specific:

  • 1. What criteria do you think are not included that should be?
  • 2. What criteria do you think are included that should not be?
  • 3. What – if any – changes do you think should be made to how the criteria are ranked?

@MNeisler, what are your thoughts on the proxies above? To be more specific:

  • 4. What do you think might be better proxies for the criteria we've prioritized?

@dchan, what are your thoughts on the criteria and proxies above? To be more specific:

  • 5. What technical criteria should we set in identifying the set of target wikis we will work most closely with on this project? Are there changes you'd make to what's been proposed so far? [2]

The "Criteria" and "Proxies" above are also represented in this spreadsheet where I'm thinking we'll come up with a draft list of wikis: OWC/Target wiki selection.


  1. Goals: see T233627's task description
  2. Technical criteria: "RTL and non-Latin script representation" ranked 4th

@ppelberg:

  1. The things I really care about are already present.

    In re the first criterion, different wikis talk in different places. Some wikis use the Talk: namespace heavily. Other (less huge) wikis use a couple of central "Village pump" type pages. Some (possibly all of the smallest wikis) use User_talk: pages preferentially. I'm not sure that it matters which namespace the talking happens in, so long as it's happening on wiki.
  2. I'm a little doubtful that a newcomer orientation is necessary. It's not a bad thing; it doesn't feel like a critical thing. (For clarity, I am not requesting its removal.)
  3. The third and fourth criteria are less important to me than the first two. I think that the third criterion (newcomer focus) is the least important.

@MNeisler, what are your thoughts on the proxies below? To be more specific:

  • 4. What do you think might be better proxies for the criteria we've prioritized?

@ppelberg

Talked through these with @Neil_P._Quinn_WMF today (Thanks, Neil!). Following up with some initial thoughts based on our conversation:

  • For both Criteria 1 and 3, the proposed proxies are really just a measure of the size of the wiki. Larger sized wikis will likely always have a higher number of unique and active contributors compared to smaller ones. To address this, I can normalize the data by dividing it by the overall number of unique contributors or active contributors for each wiki. This will show you the wikis with the highest proportion (vs highest number) of monthly active junior contributors or unique contributors participating on talk pages each month.
  • Criteria 1: I agree the proposed proxy is a decent measure of this criterion. I've started looking into this as part of T233261 and can normalize the data per my comment above to show wikis with the highest proportion of unique contributors participating on talk pages each month.
  • Criteria 3: In addition to (or instead of) this proxy, we might want to consider using new editor retention as a better proxy for this criterion. The second-month new editor retention metric is currently defined as "Out of the users who registered in the month before the previous and made at least one edit in their first 30 days, the proportion who also edited during their second 30 days." This metric is used by the Growth team to measure the impact of their work on the help panel and newcomer tasks project. You can refer to the wiki segmentation doc to see the current list of wikis with the highest second-month new editor retention rates (Column M) and highest monthly new active editors (Column L).

    Note: The definition of new editor used in the new editor retention and monthly new active editors metrics is based on the user's registration date, which is different from how we are currently defining junior contributors (which is based on number of cumulative edits). Are the newcomers referred to in Criteria 3 "new editors" or "junior contributors"? Either way, it might be better to look at retention rates as a proxy for this criterion if we decide it's worth keeping as a criterion.
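The second-month retention definition quoted above can be sketched in Python as follows. This is a hypothetical illustration, not the Growth team's implementation: it assumes registration and edit timestamps are available in memory, that the cohort month ("the month before the previous") is passed in explicitly as a half-open date range, and that "first 30 days" and "second 30 days" mean days [0, 30) and [30, 60) after registration.

```python
from datetime import datetime, timedelta

FIRST_WINDOW = timedelta(days=30)   # assumed: days [0, 30) after registration
SECOND_WINDOW = timedelta(days=60)  # assumed: days [30, 60) after registration


def second_month_retention(registrations, edits, cohort_start, cohort_end):
    """Share of a registration cohort that edited in both 30-day windows.

    registrations: dict of user -> registration datetime
    edits: iterable of (user, edit datetime) pairs
    cohort_start, cohort_end: half-open range for the cohort month
    Returns None when nobody in the cohort edited in their first 30 days.
    """
    cohort = {u for u, reg in registrations.items()
              if cohort_start <= reg < cohort_end}
    first, second = set(), set()
    for user, ts in edits:
        if user not in cohort:
            continue
        age = ts - registrations[user]
        if timedelta(0) <= age < FIRST_WINDOW:
            first.add(user)
        elif FIRST_WINDOW <= age < SECOND_WINDOW:
            second.add(user)
    if not first:
        return None
    # Only users who edited in the first window count toward the base.
    return len(first & second) / len(first)
```

Users who edit only in their second 30 days are left out of the base, matching the requirement of at least one edit in the first 30 days.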

This is helpful – thank you, @MNeisler and @Neil_P._Quinn_WMF. A few comments and questions in-line below...

Talked through these with @Neil_P._Quinn_WMF today (Thanks, Neil!). Following up with some initial thoughts based on our conversation:

  • For both Criteria 1 and 3, the proposed proxies are really just a measure of the size of the wiki. Larger sized wikis will likely always have a higher number of unique and active contributors compared to smaller ones. To address this, I can normalize the data by dividing it by the overall number of unique contributors or active contributors for each wiki. This will show you the wikis with the highest proportion (vs highest number) of monthly active junior contributors or unique contributors participating on talk pages each month.

Normalizing the data to show the proportion of monthly active junior contributors or unique contributors participating on talk pages each month sounds like an effective way of measuring the extent to which, "Contributors depend on talk pages to coordinate their wiki work."

...as part of T233261 and can normalize the data per my comment above to show wikis with the highest proportion of unique contributors participating on talk pages each month.

This sounds great. As for the denominator, "...unique contributors participating on talk pages each month." makes sense to me considering we are seeking to understand, currently, how much contributors depend on/use talk pages.

Although, in the point above [1], it seemed like the denominator was still TBD: Which do you think we should go with?

With the above in mind, are you still able to show the overall number of unique contributors participating on talk pages each month, by experience level?

...I think there is value in us knowing the absolute number of contributors using talk pages each month. Assumption: more usage volume predicts contributors using talk pages in a greater number of ways, which will help us more easily identify edge cases; identifying edge cases is something we've deemed to be an important outcome of collaborating with these target wikis.

  • Criteria 3: In addition to (or instead of) this proxy, we might want to consider using new editor retention as a better proxy for this criterion. The second-month new editor retention metric is currently defined as "Out of the users who registered in the month before the previous and made at least one edit in their first 30 days, the proportion who also edited during their second 30 days." This metric is used by the Growth team to measure the impact of their work on the help panel and newcomer tasks project. You can refer to the wiki segmentation doc to see the current list of wikis with the highest second-month new editor retention rates (Column M) and highest monthly new active editors (Column L).

I just want to make sure I'm understanding the rationale here...

Is the thought something like: "The higher the second-month new editor retention rate is, the greater the number of new editors contributing to Wikipedia will be and thus, the more likely more experienced editors are to interact with less experienced editors during the course of their wiki work?"

Note: The definition of new editor used in the new editor retention and monthly new active editors metrics is based on the user's registration date, which is different from how we are currently defining junior contributors (which is based on number of cumulative edits). Are the newcomers referred to in Criteria 3 "new editors" or "junior contributors"? Either way, it might be better to look at retention rates as a proxy for this criterion if we decide it's worth keeping as a criterion.

Roger that. Thank you for calling this out.


  1. "Denominator": dividing it by the overall number of unique contributors or active contributors for each wiki
ppelberg added a comment. (Edited Oct 28 2019, 10:07 PM)

Thank you for thinking through this, Sherry. A few comments in line below.

@ppelberg:

  1. The things I really care about are already present. – In re the first criterion, different wikis talk in different places. Some wikis use the Talk: namespace heavily. Other (less huge) wikis use a couple of central "Village pump" type pages. Some (possibly all of the smallest wikis) use User_talk: pages preferentially. I'm not sure that it matters which namespace the talking happens in, so long as it's happening on wiki.

Great points. For this particular analysis, I have assumed that analyzing talk page activity is more straightforward [1] than analyzing talk-like activity in non-talk namespaces.

  2. I'm a little doubtful that a newcomer orientation is necessary. It's not a bad thing; it doesn't feel like a critical thing. (For clarity, I am not requesting its removal.)

I agree about it not being critical. I think what's most important, as we've talked about, is the extent to which the wiki uses/depends on talk pages and its contributors' attitudes towards improving them.

  3. The third and fourth criteria are less important to me than the first two. I think that the third criterion (newcomer focus) is the least important.

I'm not ready to say "I agree!" yet, though I think when it comes time to start reaching out to wikis, we can reconsider the "newcomer" criterion in conjunction with how the wiki ranks on the criteria we've prioritized more highly.


  1. "...more straightforward...": given the limitations of how our instrumentation is currently set up.

...as part of T233261 and can normalize the data per my comment above to show wikis with the highest proportion of unique contributors participating on talk pages each month.

This sounds great. As for the denominator, "...unique contributors participating on talk pages each month." makes sense to me considering we are seeking to understand, currently, how much contributors depend on/use talk pages.
Although, in the point above [1], it seemed like the denominator was still TBD: Which do you think we should go with?

The denominator depends on which metric we're looking at. For the metric identified as a proxy for Criteria 1, I'd suggest we use the total number of unique contributors on the wiki.

With the above in mind, are you still able to show the overall number of unique contributors participating on talk pages each month, by experience level?
...I think there is value in us knowing the absolute number of contributors using talk pages each month. Assumption: more usage volume predicts contributors using talk pages in a greater number of ways, which will help us more easily identify edge cases; identifying edge cases is something we've deemed to be an important outcome of collaborating with these target wikis.

That makes sense. I just posted initial results of the overall number of unique contributors participating on talk pages each month, by experience level in T233261#5613752. Let me know if you have any questions.

  • Criteria 3: In addition to (or instead of) this proxy, we might want to consider using new editor retention as a better proxy for this criterion. The second-month new editor retention metric is currently defined as "Out of the users who registered in the month before the previous and made at least one edit in their first 30 days, the proportion who also edited during their second 30 days." This metric is used by the Growth team to measure the impact of their work on the help panel and newcomer tasks project. You can refer to the wiki segmentation doc to see the current list of wikis with the highest second-month new editor retention rates (Column M) and highest monthly new active editors (Column L).

I just want to make sure I'm understanding the rationale here...
Is the thought something like: "The higher the second-month new editor retention rate is, the greater the number of new editors contributing to Wikipedia will be and thus, the more likely more experienced editors are to interact with less experienced editors during the course of their wiki work?"

I was thinking more in the context of how Criteria 3 is currently phrased: "Contributors see working with/helping newcomers as a priority." If newcomer support is a priority on the wiki, then there will likely be increased efforts from senior contributors to help less experienced or new editors, which will result in a higher percentage of new editors returning to contribute.

In-line below are comments from Megan's and my meeting today...

...as part of T233261 and can normalize the data per my comment above to show wikis with the highest proportion of unique contributors participating on talk pages each month.

This sounds great. As for the denominator, "...unique contributors participating on talk pages each month." makes sense to me considering we are seeking to understand, currently, how much contributors depend on/use talk pages.
Although, in the point above [1], it seemed like the denominator was still TBD: Which do you think we should go with?

The denominator depends on which metric we're looking at. For the metric identified as a proxy for Criteria 1, I'd suggest we use the total number of unique contributors on the wiki.

DECIDED: for the denominator, we will say the "total number of unique contributors on the wiki" means any contributor who has made at least one contribution, to any namespace, within the timeframe we are looking at.

The "timeframe we're looking at" is each of the preceding 12 months.
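A minimal sketch of proxy II with the denominator decided above: for each of the 12 months preceding a given month, divide the number of contributors with at least one talk-namespace edit by the number of contributors with at least one edit in any namespace. The (user, month, namespace) record shape, the odd-namespace-ID test for talk namespaces, and the helper names are assumptions made for illustration; the current month is assumed to be excluded from the window.

```python
from collections import defaultdict


def is_talk_namespace(ns: int) -> bool:
    # Talk namespaces in MediaWiki have odd, non-negative IDs.
    return ns >= 0 and ns % 2 == 1


def preceding_months(year: int, month: int, n: int = 12) -> list:
    """The n calendar months before (year, month), most recent first."""
    months = []
    for _ in range(n):
        month -= 1
        if month == 0:
            year, month = year - 1, 12
        months.append(f"{year:04d}-{month:02d}")
    return months


def monthly_talk_proportion(edits, months):
    """Proxy II per month: talk-namespace contributors / all contributors.

    edits: iterable of (user, "YYYY-MM", namespace) tuples.
    The denominator counts anyone with at least one edit in any namespace.
    """
    everyone, talkers = defaultdict(set), defaultdict(set)
    for user, month, ns in edits:
        everyone[month].add(user)
        if is_talk_namespace(ns):
            talkers[month].add(user)
    return {
        m: (len(talkers[m]) / len(everyone[m]) if everyone[m] else None)
        for m in months
    }
```

Months with no contributors return None rather than dividing by zero, so sparse months show up as missing data rather than as 0%.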

With the above in mind, are you still able to show the overall number of unique contributors participating on talk pages each month, by experience level?
...I think there is value in us knowing the absolute number of contributors using talk pages each month. Assumption: more usage volume predicts contributors using talk pages in a greater number of ways, which will help us more easily identify edge cases; identifying edge cases is something we've deemed to be an important outcome of collaborating with these target wikis.

That makes sense. I just posted initial results of the overall number of unique contributors participating on talk pages each month, by experience level in T233261#5613752. Let me know if you have any questions.

Wonderful – thank you. I will comment on T233261 with any questions.

  • Criteria 3: In addition to (or instead of) this proxy, we might want to consider using new editor retention as a better proxy for this criterion. The second-month new editor retention metric is currently defined as "Out of the users who registered in the month before the previous and made at least one edit in their first 30 days, the proportion who also edited during their second 30 days." This metric is used by the Growth team to measure the impact of their work on the help panel and newcomer tasks project. You can refer to the wiki segmentation doc to see the current list of wikis with the highest second-month new editor retention rates (Column M) and highest monthly new active editors (Column L).

I just want to make sure I'm understanding the rationale here...
Is the thought something like: "The higher the second-month new editor retention rate is, the greater the number of new editors contributing to Wikipedia will be and thus, the more likely more experienced editors are to interact with less experienced editors during the course of their wiki work?"

I was thinking more in the context of how Criteria 3 is currently phrased: "Contributors see working with/helping newcomers as a priority." If newcomer support is a priority on the wiki, then there will likely be increased efforts from senior contributors to help less experienced or new editors, which will result in a higher percentage of new editors returning to contribute.

Gotcha. Thank you for explaining this. I agree with this approach.

DECIDED: we will use "Second-month new editor retention" (Column M) as our proxy for this.

ppelberg added a comment. (Edited Nov 1 2019, 9:47 PM)

Updating task description

  • The task description is now updated with the final criteria and proxies we'll use for evaluating potential target wikis.
Rank | Criteria | Proxies | Related task
1) | Contributors depend on talk pages to coordinate their wiki work | I. Number of unique contributors participating in at least one Wikipedia talk page namespace each month, by experience level; II. Proportion of contributors who make at least 1 contribution to Wikipedia (in any namespace) in a given month who also make at least 1 contribution in at least one Wikipedia talk namespace | T233261
2) | There are contributors on the wiki who see improving talk pages as a top priority | Quality and quantity of participation in the talk page consultation (TPC); someone or some people who took responsibility in the TPC to host/facilitate conversations on their home project | T233627
3) | Language representation | RTL and non-Latin scripts represented | T233627
4) | Contributors see working with/helping newcomers as a priority | Second-month new editor retention (Column M) | T233627

@MNeisler, I'm going to leave this task open until you've given your final sign-off; I want to make sure we are understanding the criteria and proxies in the same way.

ppelberg updated the task description. Nov 1 2019, 9:51 PM
ppelberg closed this task as Resolved. Nov 14 2019, 5:03 PM

@MNeisler, I'm going to leave this task open until you've given your final sign-off; I want to make sure we are understanding the criteria and proxies in the same way.

Resolving this task. We have the information we need [1] to move this forward.


  1. T233261#5649248
ppelberg updated the task description. Nov 14 2019, 5:04 PM