
Create OWC metric definitions
Closed, Resolved (Public)

Description

We need to clearly define and represent the words we are using in our metrics.

These words and definitions should ultimately be represented here: https://www.mediawiki.org/wiki/Talk_pages_project/Glossary

This task is about agreeing upon the initial set of definitions being used in tasks like: T233888

Definitions

Number | Term | Definition | Defined: Y/N
1 | Junior contributors | Registered users [1] who have made <100 cumulative contributions to Wikipedia | ✅ Yes
2 | Participating on talk pages | Successfully contributed to any of the 16 Wikipedia talk page namespaces that results in a diff that is not reverted | ✅ Yes
3 | Talk pages | For the purposes of calculating our baseline metrics, we will include all 16 Wikipedia talk page namespaces. | ✅ Yes
4 | 30-day Retention | Contributors who come back to make an edit in any one of Wikipedia's 16 talk page namespaces within the 30 days that follow the "cool down" period. We have defined the "cool down" period as the 24 hours that follow a contributor's first edit. See: T234046#5578128 |
5 | Productive contributions to article pages | For purposes of calculating baselines: edits to Wikipedia articles that are not reverted | ✅ Yes
6 | Article pages | Namespace 0 or "Main/Article" namespace | ✅ Yes
7 | Talk page features | We are removing this definition and instead using the "Participating on talk pages" as part of our definition of retention. See: T233888 | ---
8 | Contributions to Wikipedia | Any action that results in a diff that is not reverted | ✅ Yes
9 | Contribution to talk page | Any action in any of the 16 Wikipedia talk page namespaces that results in a diff that is not reverted | ✅ Yes
10 | Senior contributors | Registered users who have more than 500 edits | ✅ Yes
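
To make the edit-count thresholds in definitions 1 and 10 concrete, here is a minimal sketch in Python. It assumes the count passed in is already a contributor's cumulative Wikipedia edit count. The function name and the "intermediate" label for the band between the two definitions are hypothetical and do not come from this task.

```python
def experience_level(wikipedia_edit_count: int) -> str:
    """Classify a registered contributor by cumulative Wikipedia edits.

    Thresholds follow the task description: junior is <100 edits,
    senior is more than 500 edits. The "intermediate" label is a
    hypothetical name for everyone in between.
    """
    if wikipedia_edit_count < 0:
        raise ValueError("edit count cannot be negative")
    if wikipedia_edit_count < 100:
        return "junior"
    if wikipedia_edit_count > 500:
        return "senior"
    return "intermediate"
```

Note that a contributor with exactly 100 or exactly 500 edits falls into the in-between band under these definitions.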

"Done"


  1. "Registered" = we have no way of tracking unregistered contributors.

Event Timeline

ppelberg created this task. Sep 27 2019, 2:24 PM
Restricted Application added a subscriber: Aklapper. Sep 27 2019, 2:24 PM
ppelberg added a comment. Edited Oct 1 2019, 11:48 PM

Updating the task description to include our current definitions.

In doing so, a couple questions came to mind:

  • "2. Participating on talk pages": Successfully contributed to any talk page on any project that results in a diff
    • What value would there be in monitoring talk page usage across all projects? I ask this considering we will be working directly with Wikipedia projects.
  • "3. Talk pages":
    • If we define "talk page usage" as any talk page on any project for the purposes of establishing our baseline metrics T233890, will we be able to later expand this definition to include pages in which talk page features are deployed?
  • "5. Productive contributions to article pages": any edits to Wikipedia articles that are not reverted.
    • Is ORES goodfaith a better predictor of "productive contributions" than non-reverted edits?
  • "7. Talk page features":
    • What should we consider a "talk page feature"? Idea: anything we deem to be a core action a contributor could take on a talk page (e.g. attempting to/successfully starting a new topic, watching the page, replying to a comment within an open discussion/topic, etc.)
    • Which features are we currently tracking the usage of?
  • "9. Contribution to talk page": making any kind of change that results in a diff.
    • How might we make this definition more specific? We are wanting to measure contributors participating in a conversation. Thought: assuming automatically appending signatures to posts/replies proves to be valuable, perhaps over time there will be fewer noisy edits (e.g. adding a signature to an unsigned comment) and thus "any change that results in a diff" will be a sufficient proxy for measuring contributions to a conversation.
ppelberg renamed this task from Create metric definitions to Create OWC metric definitions. Oct 1 2019, 11:49 PM
ppelberg added a project: Product-Analytics.
ppelberg updated the task description.
ppelberg updated the task description.

Adding some additional questions/thoughts to the ones posted in T234046#5539981:

  • I'm thinking we should limit our analysis of talk pages to those on Wikipedia considering Wikipedia is the project we will be working with most directly
  • I'm thinking we should define "Participating on talk pages" as starting a new discussion or replying to an existing one
    • Thinking: Talk pages are places to communicate with other contributors. The purpose of that communication depends on the context (e.g. article talk pages are intended as workspaces where contributors come together to write better articles, user talk pages are intended as places where contributors can communicate/coordinate with one another in a more open-ended way). In either case, the core interaction – as talk pages are currently implemented – is to communicate with other people
MNeisler moved this task from Triage to Doing on the Product-Analytics board. Oct 7 2019, 4:35 PM
ppelberg added a subscriber: JKatzWMF. Edited Oct 9 2019, 12:19 AM
  • "5. Productive contributions to article pages": any edits to Wikipedia articles that are not reverted.
    • Is ORES goodfaith a better predictor of "productive contributions" than non-reverted edits?

@MNeisler, @JKatzWMF made a good point RE revert rate as a proxy for contribution quality...

He raised the idea that there could be instances where revert rate goes down as edit volume goes up because the existing base of contributors is not able to handle/moderate the influx of activity.

Noting some current thoughts/responses to the questions above:

  • 1. Junior Contributors: It will be difficult to limit this to cumulative contributions to Wikipedia Projects only. Current fields available in mediawiki_history and EditAttemptStep are based on cumulative edits across all projects.
  • 2. Participating on talk pages:

I agree with your comment posted in T234046#5551960 that it makes sense to focus on core interaction with a page which would include starting a new discussion or replying to an existing one.

What value would there be in monitoring talk page usage across all projects? I ask this considering we will be working directly with Wikipedia projects.

I think we should monitor talk page usage only on Wikipedia projects since that will be the focus of the project. It will be easier to assess the direct impact of any changes the team makes as part of this project if we monitor how usage changes across the same project type.

  • 3. Talk Pages

Will changes be deployed to all 16 designated talk namespaces or just user and talk? If changes are deployed to all talk namespaces, I’m currently thinking we should not limit this to just the user and talk page. We can always filter and do breakdowns in the analysis to compare metrics across different types of talk pages, which might provide some interesting insight into different talk page usage.

  • 4. Retention

A couple clarifications still needed for this definition:

  • Audience: Should we only focus on junior contributors?
  • Activity: Recommend defining as participating on a talk page.
  • "Cool down period": We need to make sure to define a “cool down period” to not include the hours immediately after the person first edits a talk page based on the assumption that talk page editing may happen in bursts. In other words, we would not include the first 24 hours (or some other defined time period) after the person first edits a talk page within the 30 day retention period.
  • 5. Productive contributions to article pages:

    > Is ORES goodfaith a better predictor of "productive contributions" than non-reverted edits?

The ORES edit quality model is likely a better predictor than non-reverted edits and would be interesting to explore for this project. There are a couple potential issues:
(1) Models have to be built individually for each wiki, so you won't have global coverage. It looks like, currently, only 26 Wikipedias have the damaging/good-faith models that really improve on looking at reverts (https://tools.wmflabs.org/ores-support-checklist/).
(2) Data access to the ORES scores would take more work/time (but is feasible especially if we are looking at a smaller list of edits such as with the mobile VE experiment). We can easily access reverts using the mediawiki_history dataset now across all projects.

  • 7. Talk page features

What should we consider a "talk page feature"? Idea: anything we deem to be a core action a contributor could take on a talk page (e.g. attempting to/successfully starting a new topic, watching the page, replying to a comment within an open discussion/topic, etc.).

In the current key results, we say "5% increase in the retention of junior contributors who use talk page features". I think this should be revised to contributors participating on talk pages instead, which would include the core actions of starting a discussion or replying to one.

Which features are we currently tracking the usage of?

We are currently only tracking edits to a talk page. We can distinguish between edit attempts to a main page vs a section using the EditAttemptStep schema. We do not track the difference between a post and a reply.

ppelberg added a comment. Edited Oct 9 2019, 9:12 PM

Below are notes from the conversation @MNeisler and I had this morning...

1. Junior contributors
Notes

  • If we're not able to look at contributions on a per-project basis, that would mean a contributor who has 200 edits to Commons and 2 edits to Wikipedia would not be considered a junior contributor to Wikipedia

Actions

2. Participating on talk page
Decided

  • We should monitor talk page usage only on Wikipedia projects since that will be the focus of the project.
  • We should define "participation" as starting a new discussion or replying to an existing one
    • Caveat: we do not currently have instrumentation in place to be able to differentiate between replies and original posts (starting a new discussion). Thus, for now, we will define "participation" as any published change.

3. Talk pages
Decided

  • For the purposes of calculating our baseline metrics, we will include all 16 talk page namespaces. As Megan put well, "We can always filter and do breakdowns in the analysis to compare metrics across different types of talk pages, which might provide some interesting insight into different talk page usage."
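
One way to select talk namespaces in an analysis, sketched in Python. This task does not list the 16 namespaces, so the sketch relies on the general MediaWiki convention that talk namespaces have odd, positive namespace IDs (e.g. 1 for Talk, 3 for User talk). The function name is hypothetical, and a real analysis would likely still filter against an explicit list for each wiki.

```python
def is_talk_namespace(namespace_id: int) -> bool:
    """Return True for odd, positive namespace IDs.

    Assumption: MediaWiki pairs each subject namespace (even ID) with a
    talk namespace (the next odd ID). Negative IDs such as -1 (Special)
    are virtual namespaces and are excluded.
    """
    return namespace_id > 0 and namespace_id % 2 == 1
```

Filtering by this flag first and then grouping by namespace ID would allow the per-type breakdowns mentioned above.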

Actions

  • @ppelberg: Are we going to seek to improve talk experiences across all namespaces or should we limit our thinking to user and article pages, even if interventions are deployed more broadly?

4. Retention
Decided

  • For the purpose of calculating baseline metrics, we will look at retention across all experience levels, although our goal for this project is to increase retention of junior contributors.
  • We are defining activity as any edit to a talk page

Actions

5. Productive contributions to article page
Decided

  • For purposes of calculating baselines, we'll stick with reverted/non-reverted contributions
    • Rationale: ORES models are currently deployed to only 26 wikis and we're not sure whether those 26 wikis are sufficient proxies for behavior on all wikis. More info: T234046#5560053

7. Talk page features
Decided

  • @MNeisler, I agree with you about removing this definition and instead changing the language
    • FROM: "5% increase in the retention of junior contributors who use talk page features."
    • TO: "5% increase in the retention of junior contributors participating on talk pages."

Actions

  • @ppelberg: update language in appropriate places (e.g. Phab, Airtable, etc.)

1. Junior contributors
Notes

  • If we're not able to look at contributions on a per-project basis, that would mean a contributor who has 200 edits to Commons and 2 edits to Wikipedia would not be considered a junior contributor to Wikipedia

Actions

  • @MNeisler: is it possible to differentiate contributions by project?

Yes. I double-checked this and confirmed that while the event user revision count field does track a user's edits across all projects, we can add filters to differentiate contributions by the project. Based on this, I think we can keep the definition as "Contributors who have made <100 cumulative contributions to Wikipedia"
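
As a rough illustration of filtering contributions by project, here is a small Python sketch. The record shape (a user name paired with a project family string) and the function name are hypothetical. They are not the actual mediawiki_history or EditAttemptStep fields.

```python
from collections import Counter

def wikipedia_edit_counts(edits):
    """Count edits per user, keeping only edits made on Wikipedia.

    `edits` is an iterable of (user, project_family) pairs; both the
    shape and the "wikipedia" label are assumptions for illustration.
    """
    counts = Counter()
    for user, project_family in edits:
        if project_family == "wikipedia":
            counts[user] += 1
    return counts

# The example from the notes: 200 edits to Commons and 2 to Wikipedia.
example_edits = [("A", "commons")] * 200 + [("A", "wikipedia")] * 2
example_counts = wikipedia_edit_counts(example_edits)
```

With the filter applied, contributor "A" has 2 Wikipedia edits and would still be counted as a junior contributor under the <100 threshold.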

ppelberg updated the task description. Oct 10 2019, 11:30 PM

Excellent. I've updated the task description to reflect this.

ppelberg updated the task description. Oct 10 2019, 11:43 PM

Below are notes from the conversation @MNeisler and I had this morning...
4. Retention
Decided

  • For the purpose of calculating baseline metrics, we will look at retention across all experience levels, although our goal for this project is to increase retention of junior contributors.
  • We are defining activity as any edit to a talk page

Actions

@MNeisler and I discussed this some more over chat.

We decided:

  • The retention period will start 24 hours after a contributor has been "activated."

Our thinking was shaped by:

  • Product Analytics uses 24 hours for their "cool down period" in calculating new editor retention
  • We do not have clear evidence – at this time – that suggests the "cool down time period" for talk pages should be different

I am updating the task description to reflect this thinking.
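
The retention rule described here can be sketched in Python as follows. It assumes "activated" means the contributor's first talk page edit, as in the task description. Whether the window boundaries are inclusive is also an assumption, and the names are hypothetical.

```python
from datetime import datetime, timedelta

COOL_DOWN = timedelta(hours=24)       # per the decision above
RETENTION_WINDOW = timedelta(days=30)  # "30-day retention"

def is_retained(first_edit: datetime, later_edits) -> bool:
    """Return True if any later talk page edit falls inside the
    30 days that follow the 24-hour cool-down period.

    Assumption: the window is [first_edit + 24h, first_edit + 24h + 30d).
    """
    window_start = first_edit + COOL_DOWN
    window_end = window_start + RETENTION_WINDOW
    return any(window_start <= ts < window_end for ts in later_edits)
```

Edits made during the cool-down period (for example, a burst of replies two hours after the first edit) do not count toward retention under this sketch.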

ppelberg updated the task description. Oct 16 2019, 12:15 AM
ppelberg updated the task description. Oct 17 2019, 7:19 PM

In our 17-Oct meeting, @MNeisler and I decided on the changes listed below and in doing so, finalized the draft of our metric definitions based on the information we currently have available.

Those definitions now live in this task's description and upon confirmation from the team, will be represented on MediaWiki here: https://www.mediawiki.org/wiki/Talk_pages_project/Glossary

17-Oct Decisions

1. Junior contributors

  • Registered contributors [1] who have made <100 cumulative contributions to Wikipedia

2. Participating on talk page

  • Successfully contributed to any Wikipedia talk page that results in a diff that is not reverted

8. Contributions to Wikipedia

  • Any action that results in a diff that is not reverted

9. Contribution to talk page

  • Any action that results in a diff that is not reverted

10. Senior contributors

  • Registered users who have more than 500 edits and accounts that are older than 30 days

  1. "Registered" = we have no way of tracking unregistered contributors.
ppelberg updated the task description. Oct 18 2019, 6:34 PM
ppelberg updated the task description.

Adding a few notes @kzimmerman raised in our conversation just now...

  • Be explicit about calling "Retention" "30-day retention" (I've updated the task description to reflect this)
  • Set a boundary on how long an edit needs to exist (read: not be reverted) for it to be considered "productive."
    • @Neil_P._Quinn_WMF, do you have a sense for what an appropriate time boundary should be? Kate mentioned you've put thought to this in other contexts in the past...
  • When we do our end of year analysis, make sure we look at impact across all experience levels (e.g. do not leave out the band of contributors that fall between "Senior" and "Junior" contributors)

cc @MNeisler

ppelberg updated the task description. Oct 18 2019, 10:58 PM
MNeisler moved this task from Doing to Tracking on the Product-Analytics board.
ppelberg added a comment. Edited Oct 23 2019, 10:48 PM

@MNeisler and I talked about this during our meeting today and agreed these definitions are complete enough for us to resolve this task. Megan is going to follow up with Neil about the time boundary on edits being reverted.

One note: we decided to remove the "account age" criterion from the definition of "Senior contributor" to keep it consistent with the definition of "Junior Contributors", which does not include account age. This change is reflected in the task description.

ppelberg updated the task description. Oct 23 2019, 10:50 PM
Libcub added a subscriber: Libcub. Oct 29 2019, 4:57 AM

Several of the definitions use the clause "that results in a diff that is not reverted". It is important to remember that being reverted or not reverted is not a static attribute. What is not reverted today can be reverted tomorrow. Shouldn't the definitions include time in some fashion? Such as "that results in a diff that is not reverted within 30 days". Also, does it matter whether a revert itself was reverted--should that situation be considered reverted or not reverted?

Several of the definitions use the clause "that results in a diff that is not reverted". It is important to remember that being reverted or not reverted is not a static attribute. What is not reverted today can be reverted tomorrow. Shouldn't the definitions include time in some fashion? Such as "that results in a diff that is not reverted within 30 days".

This is a great point, @Libcub. Do you have an opinion about how long an edit should need to exist (read: not be reverted) for it to be considered "productive"? This is still an open question for us. See: T234046#5588284.
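
To illustrate what a time-bounded version of "not reverted" could look like, here is a hedged Python sketch. The length of the window is still an open question in this task, so `revert_window` is a parameter rather than a fixed value. The function name and the choice to treat reverts after the window as "not reverted" are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional

def is_productive(edit_time: datetime,
                  reverted_at: Optional[datetime],
                  revert_window: timedelta) -> bool:
    """Treat an edit as productive if it was never reverted, or if the
    revert happened only after `revert_window` had elapsed.

    Assumption: reverts of reverts are not modelled here.
    """
    if reverted_at is None:
        return True
    return reverted_at - edit_time > revert_window
```

With a 30-day window, for example, an edit reverted on day 3 would not count as productive, while an edit reverted on day 45 would.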

Also, does it matter whether a revert itself was reverted--should that situation be considered reverted or not reverted?

Hmm, how frequently have you seen/experienced this happening?

Jc86035 added a subscriber: Jc86035. Nov 5 2019, 6:39 PM

"Junior contributor" and "senior contributor" strike me as terms that will be confusing and/or have unintended implications (e.g. that users with 500+ edits are unambiguously regarded as having higher seniority), especially since this seems to be the first time they've been used in a MediaWiki context. While I think it does make sense to use the terms, it would probably be worth explicitly defining their meanings whenever they're used (or, alternately, replacing them with more common and/or less ambiguous terms like "experienced contributor").

DLynch added a subscriber: DLynch. Nov 6 2019, 7:01 PM

I'm interested in the ORES scores over reversion-rates as well, mostly because it lets us avoid all sorts of confusion about what counts/should-count as a revert, and side effects like the mentioned "what if we overwhelm the people who'd be doing the reversions with a flood of content".

(Pure "this edit was reverted in its entirety" means we miss out on "this edit was bad, but someone fixed it without reverting it", or "part of this edit was bad, so someone manually reverted its changes to one section".)

@DLynch, are you thinking we'd use ORES scores as the exclusive measure of "productive" contributions to Wikipedia article pages and talk pages? Or would we use ORES scores as a companion metric to revert rates? Something else?

For the purpose of this task, we were thinking through the frame of: "What is a reliable and relatively accessible metric we can use to evaluate contribution quality?"

With the below [1] in mind, we arrived at revert rate.

Although, in thinking about this again, maybe [in parallel] it would be worthwhile to consider ORES in our analyses, but not depend on it exclusively and in doing so – big assumption coming – inch towards what productive contributions in talk namespaces are.


The ORES edit quality model is likely a better predictor than non-reverted edits and would be interesting to explore for this project. There are a couple potential issues:
(1) Models have to be built individually for each wiki, so you won't have global coverage. It looks like, currently, only 26 Wikipedias have the damaging/good-faith models that really improve on looking at reverts (https://tools.wmflabs.org/ores-support-checklist/)
(2) Data access to the ORES scores would take more work/time (but is feasible especially if we are looking at a smaller list of edits such as with the mobile VE experiment). We can easily access reverts using the mediawiki_history dataset now across all projects.

"Junior contributor" and "senior contributor" strike me as terms that will be confusing and/or have unintended implications (e.g. that users with 500+ edits are unambiguously regarded as having higher seniority), especially since this seems to be the first time they've been used in a MediaWiki context. While I think it does make sense to use the terms, it would probably be worth explicitly defining their meanings whenever they're used (or, alternately, replacing them with more common and/or less ambiguous terms like "experienced contributor").

We appreciate you calling this out. To account for the valid concern you're raising, our plan has been to:

If there are other things you think we can do to make these terms clear, we'd be keen to hear.

Metric definitions have been posted on-wiki. See: https://www.mediawiki.org/w/index.php?title=Talk_pages_project%2FGlossary&type=revision&diff=3504986&oldid=3463609

cc @Jc86035

Leaving this ticket open for now, to finish the discussion started in T234046#5641799.

ppelberg updated the task description. Nov 11 2019, 8:12 PM
ppelberg updated the task description. Nov 14 2019, 12:45 AM

Task description update:

  • Clarifying definition of retention to include explicit mention of "cool down" period. This change has also been made to the Talk pages project/Glossary.
Alsee added a subscriber: Alsee. Nov 17 2019, 8:23 PM

I edited the wikipage item for "Junior Contributors" to add a criterion of "at least one article page edit". You may want to revise this to "at least one article page edit that has not been reverted".

The history of the Article Feedback Tool should make it clear that talk comments from people with zero article edits are not considered contributions. Chatter by non-editors has negative value on average.

I strongly suggest that the team add a separate metric to track talk-comments-with-zero-article-edits. I expect many comments like that would come from IPs. The team should track the percentage of IPs that comment on talk pages without editing articles.

See the research finding that Wikia's attempt to deploy an easier talk system resulted in decreased article contributions by new users. The community would consider that very bad. If the new interface were to negatively impact our inflow of new users successfully joining us as active article editors, I expect the community would seek to modify or disable the interface to protect the ongoing health and successes of the project. If the new interface were to increase commenting by non-editors, without a compelling increase in article-contributors, I expect the community would seek to modify or disable the interface due to the costs and disruption of non-editors.

Thank you for the interesting suggestions. I've been advising the analyst who is working on this project, and we will definitely discuss these points and give a fuller response here once we've considered them a bit.

However, I have reverted your edit to the documentation page. That page doesn't dictate our metrics; it just documents them, and since we haven't changed our plans in response to your comments (at least not yet), your edit made it inaccurate. I'm sure you would not want to give other volunteers concerned about the same issues the false sense that we have already addressed them!

ppelberg closed this task as Resolved. Nov 23 2019, 2:17 AM

+1 @Neil_P._Quinn_WMF, thank you for your comment, @Alsee.

I am resolving this task considering the initial metrics have been defined and posted on wiki. [1]

Although, I've created T238971 to be the place where, as Neil mentions, we can share a response to the suggestion mentioned in T234046#5669844 once we've considered it fully.


  1. https://www.mediawiki.org/wiki/Talk_pages_project/Glossary