Visually report damaging confidence
Closed, Resolved (Public)

Description

From https://www.mediawiki.org/wiki/Topic:Tb34n8tdmpv4vc38

The red in my watchlist screams "red alert! big problem! Lots of likelihood of it being terrible", whereas my experience so far has been more in a warmer colour (orange, or something), where the change is in need of attention but not screaming at me.

We should set 3 thresholds for color:

  • filter_rate_at_recall(min_recall=0.9): yellow (review for completeness)
  • filter_rate_at_recall(min_recall=0.75): orange (likely to be damaging)
  • recall_at_fpr(max_fpr=0.1): red (almost certainly damaging)

In the case of English Wikipedia's damaging model, this would set the thresholds to (20%, 46%, 94%).
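A minimal sketch of how a score could be mapped onto these three colors, assuming the English Wikipedia thresholds quoted above. The names `DAMAGING_LEVELS` and `damagingLevel` are hypothetical, and treating each threshold as inclusive is an assumption rather than something the task specifies.

```javascript
// Thresholds quoted in the task for English Wikipedia's damaging model.
// Ordered from most to least severe so that the first match wins.
var DAMAGING_LEVELS = [
	{ min: 0.94, color: 'red', label: 'almost certainly damaging' },
	{ min: 0.46, color: 'orange', label: 'likely to be damaging' },
	{ min: 0.20, color: 'yellow', label: 'review for completeness' }
];

// Hypothetical helper: returns the matching level, or null when the
// score is below every threshold (no highlight at all).
function damagingLevel( score ) {
	for ( var i = 0; i < DAMAGING_LEVELS.length; i++ ) {
		// Assumption: a score equal to the threshold counts as a match.
		if ( score >= DAMAGING_LEVELS[ i ].min ) {
			return DAMAGING_LEVELS[ i ];
		}
	}
	return null;
}
```

Listing the levels from most to least severe keeps the lookup to a single pass and makes it easy to swap in per-wiki thresholds later.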

It would be great if we also had some sort of tooltip that read the exact prediction probability like the ScoredRevisions tool. E.g. "85% damaging, 23% goodfaith"
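As a rough illustration of the tooltip text, the sketch below formats a set of model probabilities into the wording used in the example above. The function name `scoreTooltip` and the shape of its input object are assumptions; this is not how ScoredRevisions builds its tooltip.

```javascript
// Hypothetical helper: turns { model: probability } pairs into tooltip text.
// Probabilities are assumed to be numbers between 0 and 1.
function scoreTooltip( scores ) {
	return Object.keys( scores ).map( function ( model ) {
		// Round to a whole percentage for readability.
		return Math.round( scores[ model ] * 100 ) + '% ' + model;
	} ).join( ', ' );
}

var tooltip = scoreTooltip( { damaging: 0.85, goodfaith: 0.23 } );
```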

Result:

Halfak created this task. Sep 7 2016, 2:14 PM
Restricted Application added a subscriber: Aklapper. Sep 7 2016, 2:14 PM
Halfak updated the task description. Sep 7 2016, 2:16 PM

@Pginer-WMF is exploring some design concepts for representing confidence in T138935. In this case, he's looking to include a flag widget that would express the confidence. So the simple "r" would be replaced with something like: [Damaging ••○]
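A small sketch of how such a flag could be rendered as text, assuming confidence is expressed as a level out of a fixed maximum. The function name `confidenceFlag` and the level scale are hypothetical; the actual widget design lives in T138935.

```javascript
// Hypothetical helper: renders a label plus filled/empty dots,
// e.g. level 2 of 3 for a "likely" prediction.
function confidenceFlag( label, level, maxLevel ) {
	var dots = '';
	for ( var i = 1; i <= maxLevel; i++ ) {
		// Filled dot for each level reached, hollow dot otherwise.
		dots += i <= level ? '•' : '○';
	}
	return '[' + label + ' ' + dots + ']';
}

var flag = confidenceFlag( 'Damaging', 2, 3 );
```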

Sadads added a subscriber: Sadads. Sep 7 2016, 3:04 PM

Thanks for making this a phabricator item! Looking forward to the update on how this works!

He7d3r added a subscriber: He7d3r. Sep 7 2016, 6:28 PM
Halfak triaged this task as High priority. Sep 8 2016, 2:43 PM
Halfak moved this task from Backlog to New development on the Scoring-platform-team board.
Halfak moved this task from Backlog to Prioritized on the MediaWiki-extensions-ORES board.
Restricted Application added a project: User-Ladsgroup. Sep 14 2016, 4:57 PM

T143611 (Embed machine readable ores scores as data on pages where ORES scores things) is not deployed yet, but once it is, we can simply run some fun JavaScript to make recent changes more colorful. For example, I wrote this:

/**
 * Created by amir on 10/23/16.
 */
( function ( mw, $ ) {
	'use strict';
	// Minimum damaging score for each background color, in ascending order.
	var colors = {
		0.40: '#750787',
		0.50: '#004DFF',
		0.60: '#008026',
		0.70: '#FFED00',
		0.80: '#FF8C00',
		0.90: '#E40303'
	};
	// Scores keyed by revision ID, as exposed by T143611 (empty if absent).
	var oresData = mw.config.get( 'oresData' ) || {};
	$( 'li' ).each( function () {
		var href = $( this ).children( 'a' ).attr( 'href' );
		if ( !href ) {
			return;
		}
		// Extract the revision ID from the diff link.
		var res = /diff=(\d+)/i.exec( href );
		if ( res && res[ 1 ] in oresData ) {
			var score = oresData[ res[ 1 ] ].damaging;
			// Keys are visited in ascending order, so the highest
			// threshold below the score sets the final color.
			for ( var threshold in colors ) {
				if ( score > Number( threshold ) ) {
					$( this ).css( 'background-color', colors[ threshold ] );
				}
			}
		}
	} );
}( mediaWiki, jQuery ) );

Which made this:

(Rainbows!)
Now we should talk about how we can use this in the extension.

Note, we talked about this in the revscoring meeting and we determined that three thresholds should be surfaced through a config variable from https://noc.wikimedia.org/conf/highlight.php?file=InitialiseSettings.php

The new threshold will come from recall_at_fpr(max_fpr=0.1).
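For readers unfamiliar with the statistic, the sketch below shows one plausible reading of `recall_at_fpr(max_fpr=0.1)`: pick the lowest score threshold whose false-positive rate stays at or below the limit, which is also the threshold with the highest recall under that limit. This is an illustrative assumption, not revscoring's implementation; the function name `thresholdAtMaxFpr` and the sample format are hypothetical.

```javascript
// samples: array of { score: Number, damaging: Boolean } (labeled test data).
// Returns { threshold, recall, fpr } or null if no threshold qualifies.
function thresholdAtMaxFpr( samples, maxFpr ) {
	var positives = samples.filter( function ( s ) {
		return s.damaging;
	} ).length;
	var negatives = samples.length - positives;
	// Candidate thresholds: every observed score, lowest first.
	var candidates = samples.map( function ( s ) {
		return s.score;
	} ).sort( function ( a, b ) {
		return a - b;
	} );
	for ( var i = 0; i < candidates.length; i++ ) {
		var t = candidates[ i ], tp = 0, fp = 0;
		samples.forEach( function ( s ) {
			if ( s.score >= t ) {
				if ( s.damaging ) { tp++; } else { fp++; }
			}
		} );
		var fpr = negatives ? fp / negatives : 0;
		// Both FPR and recall can only fall as the threshold rises, so the
		// first qualifying threshold has the best recall.
		if ( fpr <= maxFpr ) {
			return { threshold: t, recall: positives ? tp / positives : 0, fpr: fpr };
		}
	}
	return null;
}
```

The quadratic scan is kept for clarity; a real implementation would more likely sort once and sweep cumulative counts.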

In the recent iteration of the exploration we are doing to integrate ORES filters and others into Recent Changes (T147632), a flexible highlighting mechanism is provided. Users can define what to highlight and the colors to use. The palette includes the three colors commonly used for warnings (red, orange and yellow), and the filters provided make it possible to target three filtering levels based on different precision/recall values for damaging edits. We also used the list bullet points to clarify the cases where more than one highlighting criterion applies to a given row.
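To make the idea of several highlighting criteria applying to one row concrete, here is a rough sketch. The filter names, the `anon` field, and the reuse of the 0.46 and 0.94 thresholds from the task description are all assumptions for illustration; the design in T147632 may model this differently.

```javascript
// Hypothetical user configuration: each entry pairs a filter with a color.
var highlights = [
	{ name: 'damaging-very-likely', color: 'red', matches: function ( change ) {
		return change.damaging >= 0.94;
	} },
	{ name: 'damaging-likely', color: 'orange', matches: function ( change ) {
		return change.damaging >= 0.46;
	} },
	{ name: 'anonymous', color: 'yellow', matches: function ( change ) {
		return change.anon === true;
	} }
];

// Returns every color whose filter matches the change, in configuration
// order, so a row can show one marker per matching criterion.
function matchingHighlights( change, config ) {
	return config.filter( function ( entry ) {
		return entry.matches( change );
	} ).map( function ( entry ) {
		return entry.color;
	} );
}
```

Returning all matches, instead of only the strongest one, leaves it to the presentation layer to decide how to show overlapping highlights.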

In the testing sessions so far, the system seems to work well, and users appreciate the metaphor of highlighting as they would in a paper-based document. More details will be shared as @dchen completes the study.

Currently we are not adding any highlight by default, but we can consider doing so if we find any to be generally relevant for all users of Recent Changes. In any case, filtering settings are expected to be reflected in the URL, so it is possible to direct users to the Recent Changes page with a specific set of filters and highlights from a context where those are expected.

Change 318774 had a related patch set uploaded (by Ladsgroup):
Expose ORES damaging thresholds in javascript

https://gerrit.wikimedia.org/r/318774

Re: the colors, it would be good to have some defaults or recommendations, for color schemes that are accessibility-friendly.
I made a quick attempt using the filters at http://colorbrewer2.org/#type=diverging&scheme=RdYlBu&n=6 (with "colorblind safe" checked, and with the transparency increased so that overlaid text stays highly legible) and produced:
#d73027
#fc8d59
#fee090
#e0f3f8
#91bfdb
#4575b4
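A minimal sketch of applying transparency to one of these hex colors so that text on top of it stays legible. The helper name `hexToRgba` and the alpha value of 0.3 are assumptions; the comment above does not say how much transparency was used.

```javascript
// Hypothetical helper: converts a 6-digit hex color to a CSS rgba() string.
function hexToRgba( hex, alpha ) {
	var match = /^#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec( hex );
	if ( !match ) {
		throw new Error( 'Expected a 6-digit hex color, got: ' + hex );
	}
	// Parse each two-digit channel explicitly as base 16.
	var rgb = match.slice( 1 ).map( function ( part ) {
		return parseInt( part, 16 );
	} );
	return 'rgba(' + rgb.join( ', ' ) + ', ' + alpha + ')';
}

// Assumed alpha; pick whatever keeps overlaid text readable.
var background = hexToRgba( '#d73027', 0.3 );
```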

Pau or Volker might have better suggestions on how to find a broader selection (that map tool only offers 6 colors if the "colorblind friendly" setting is checked).

Ladsgroup moved this task from Review to Active on the Scoring-platform-team (Current) board.

Change 318774 merged by jenkins-bot:
Expose ORES damaging thresholds in javascript

https://gerrit.wikimedia.org/r/318774

Change 320341 had a related patch set uploaded (by Ladsgroup):
Visually report damaging confidence

https://gerrit.wikimedia.org/r/320341

Ladsgroup moved this task from Active to Review on the Scoring-platform-team (Current) board.
Volker_E added a subscriber: Volker_E.
Ladsgroup moved this task from Review to Done on the Scoring-platform-team (Current) board.

Change 320341 merged by jenkins-bot:
Visually report damaging confidence

https://gerrit.wikimedia.org/r/320341

It now looks like this:

Ladsgroup updated the task description. Nov 26 2016, 7:53 PM
Halfak closed this task as Resolved. Nov 30 2016, 9:05 PM