
Implement functionality for RC page 'Contribution Quality' filters (ORES)
Closed, Resolved · Public

Description

The Contribution Quality filters are based on the ORES damaging test. By providing predictions about which edits are likely to "have problems" and which are likely to be "good," they enable reviewers to better target their work. Based on the explorations in T146333, we've settled on four Quality filter options (view in prototype).

In creating these four levels, we strove to balance users' desire for accuracy against breadth of coverage. Below, see the ORES damaging score range for each filter (in square brackets, subject to finalization in T149761). For final filter names and description texts, see T149385.

  • Very likely good [0%-55%]
  • May have problems [16%-100%]
  • Likely have problems [75%-100%]
  • Very likely have problems [94%-100%]
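For illustration, here is a minimal sketch (not the extension's actual code) of how these ranges could map an ORES damaging probability to the set of matching filter options. The function name is hypothetical; the filter keys match those used in the QA tables below, and the thresholds are the provisional ones listed above.

```python
# Provisional ORES damaging-score ranges for each Quality filter option
# (subject to finalization in T149761). Ranges overlap by design.
QUALITY_FILTER_RANGES = {
    "likelygood":    (0.00, 0.55),  # Very likely good
    "maybebad":      (0.16, 1.00),  # May have problems
    "likelybad":     (0.75, 1.00),  # Likely have problems
    "verylikelybad": (0.94, 1.00),  # Very likely have problems
}

def matching_quality_filters(damaging_probability):
    """Return the filter options whose range contains the given probability."""
    return [
        name
        for name, (low, high) in QUALITY_FILTER_RANGES.items()
        if low <= damaging_probability <= high
    ]

# Example: an edit with a damaging score of 0.50 falls in the overlap
# between "Very likely good" and "May have problems".
print(matching_quality_filters(0.50))  # ['likelygood', 'maybebad']
```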

Notes about the Quality filters

  • These filter ranges will, at least initially, be identical to those built into the ReviewStream feed.
  • Because the "Problem" filters overlap, they are subject to the behavior stated in T149391, under "No-Effect Display States", and T149452, under "Excluded Display States".

About the functionality of all new RC page filters generally

  • Like all filters in the enhanced RC page filter UI, these filters conform to a set of rules that are, in some ways, very different from the existing RC page filters. The existing filters are designed primarily to EXCLUDE selected properties. The new filters are intended to INCLUDE those properties; logically, the filters within each new group constitute a set of OR filters (each group of ORs being connected to the other groups by ANDs). So, the filters in this group follow these rules (see the sketch after this list):
    • To INCLUDE property A, users check the box for property A.
    • To EXCLUDE property A, the user must uncheck A and check its complements, properties B, C and D.
    • If NONE of A, B, C or D are checked, then ALL are included.
    • If ALL of A, B, C and D are checked, then the result is the same: ALL are included.
  • As per T146076, searches on the RC page are meant to be bookmarkable. Please make sure your search adds query strings to the URL.
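As a rough illustration of the inclusion rules above, here is a sketch of the OR-within-a-group / AND-between-groups logic. This is not the actual filter implementation; the function names and the example selection are hypothetical, and only the filter option names come from this task.

```python
def group_matches(selected, edit_properties, all_properties):
    """A group of checkboxes acts as a set of OR filters.

    If none (or all) of the group's options are selected, the group has no
    effect and every edit passes; otherwise an edit passes if it carries at
    least one of the selected properties.
    """
    if not selected or selected == set(all_properties):
        return True
    return bool(selected & edit_properties)

def edit_is_shown(edit_properties, groups):
    """Groups are combined with AND: an edit is shown only if every group matches."""
    return all(
        group_matches(selected, edit_properties, all_props)
        for selected, all_props in groups
    )

# Example: selecting "maybebad" and "likelybad" in the Quality group
# includes edits carrying either property (OR within the group).
quality_group = ({"maybebad", "likelybad"},
                 ["likelygood", "maybebad", "likelybad", "verylikelybad"])
print(edit_is_shown({"maybebad"}, [quality_group]))    # True
print(edit_is_shown({"likelygood"}, [quality_group]))  # False
```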

Related Objects

Event Timeline

Change 324507 had a related patch set uploaded (by Sbisson):
[WIP] 'damaging' filter

https://gerrit.wikimedia.org/r/324507

As I work on this, I'm wondering what should be done about the existing ORES review tool features. We stated that the new filters on Special:RC are the new ORES review tool and that it lives in the same beta feature. I would like confirmation that this is indeed the decision, and to know whether there are any exceptions to that rule (parts of the existing ORES review tool that we would keep, integrated or in parallel).

AFAICT, the ORES review tool contains the following components:

  • Displays a red "r" next to changes that are considered damaging
  • Legend entry for the red "r"
  • User preference defining the threshold at which an edit is considered damaging
  • The 'hidenondamaging' filter (also hides patrolled edits as it considers that they are probably not damaging)
  • Automatic highlighting of the damaging changes using colors associated with predefined thresholds
  • All those things on Special:RC, Special:Watchlist, Special:Contributions

Thanks for asking, Stephane. Re. the Watchlist and Special:Contributions, there should be no change at this time.

Re. RC page, my guess is that all of the items below should be turned off, since they would to some extent conflict with the new interface, but I want to hear from @Pginer-WMF and @Halfak.

  • Displays a red "r" next to changes that are considered damaging
  • Legend entry for the red "r"
  • User preference defining the threshold at which an edit is considered damaging
  • The 'hidenondamaging' filter (also hides patrolled edits as it considers that they are probably not damaging)
  • Automatic highlighting of the damaging changes using colors associated with predefined thresholds

It does occur to me that the red "r" could be retained if we thought it was useful, though I'd want to bring it in line with the new filter definitions. That means:

  • Make it so the "r" is applied to all edits that "Very likely have problems" (see the sketch after this list), and
  • Change the legend on this page so that it uses the same language.
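If the "r" were kept, a minimal sketch of the adjusted condition might look like the following. The helper name is hypothetical; the 94% threshold comes from the provisional "Very likely have problems" range in the description.

```python
VERY_LIKELY_BAD_THRESHOLD = 0.94  # provisional range start, see T149761

def needs_r_flag(damaging_probability):
    """Show the red "r" only for edits that "Very likely have problems"."""
    return damaging_probability >= VERY_LIKELY_BAD_THRESHOLD
```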

@Ladsgroup, see above.

My sense is that dropping the red "r" from both the list and the legend makes sense with the new highlighting. However, since there are no advanced filters for Special:Contributions or Special:Watchlist, that functionality should be left alone.

It seems that "hidenondamaging" is now redundant.

I also think we should keep the scope of the changes to the Recent Changes page only. Otherwise, users who were getting some support from the ORES beta feature in their watchlist will find it removed and replaced with nothing. Regarding the Recent Changes page, I'm also in favour of removing the "r" mark and the note about it in the legend.

So we have consensus. Functionality on Watchlist and Special:Contributions will remain as now.

Meanwhile, all the ORES functions mentioned above will be removed/turned off for Special:RecentChanges.

Please raise this on the on-wiki discussion too. https://www.mediawiki.org/wiki/Topic:Tflhjj5x1numzg67

If we don't get any substantial response there, then I think we can declare consensus.

Change 324507 merged by jenkins-bot:
'damaging' filter on Special:RC / Special:Watchlist

https://gerrit.wikimedia.org/r/324507

@SBisson - please review the following results. Any suggestions for testing or for adding additional cases are welcome.

Checked betalabs edits that are ORES-marked as problematic.

(1) The Barak Obama article displays consistency between the old ORES highlighting and the filter selection.

| Edit | ORES mark | Is the edit displayed when a filter is applied? |
| rev_id = 349205 | Pale yellow; oresc_probability: 0.500 | likelygood: Yes; maybebad: Yes; likelybad: No; verylikelybad: No |
| rev_id = 349206 | Pale yellow; oresc_probability: 0.600 | likelygood: No; maybebad: Yes; likelybad: No; verylikelybad: No |
| rev_id = 349208 | Medium yellow; oresc_probability: 0.800 | likelygood: No; maybebad: No; likelybad: Yes; verylikelybad: No |

(2) The new filters seem to be scoring new pages as potentially damaging, whereas ORES does not highlight them at all. Example - NewPage1481556292

| Edit | ORES mark | Is the edit displayed when a filter is applied? |
| rev_id = 349169 | not marked; oresc_probability: 0.960 | likelygood: No; maybebad: Yes; likelybad: Yes; verylikelybad: Yes |

(3) A small discrepancy for edits on the World article - two things are at play here: 1) mostly a deficiency of the old ORES highlighting - medium yellow for oresc_probability: 0.530; 2) I am not too sure how it looks from the user's point of view to see a certain edit simultaneously in two categories - 'likelygood' and 'maybebad'.

| Edit | ORES mark | Is the edit displayed when a filter is applied? |
| rev_id = 349235 | Medium yellow; oresc_probability: 0.530 | likelygood: Yes; maybebad: Yes; likelybad: No; verylikelybad: No |

(4) I did not find any obvious discrepancies in filtered results: not when combining filters, not when using them in a different order, and no edit appeared in both 'likelygood' and 'verylikelybad'.

  • After a chat with @SBisson, the following needs an additional look:

(2) The new filters seem to be scoring new pages as potentially damaging, whereas ORES does not highlight them.

  • generally, there is consistency between the old ORES highlighting and the new damaging filter options
  • overlapping-filter cases, as mentioned in (3), need to be reviewed/researched; this will be done in a different ticket
  • all test cases/results are documented, since similar testing will need to be done after the UI is in place
  • filter option selection is bookmarkable

QA recommendation: Resolve.

@Etonkovidova writes:

I am not too sure how it looks from the user's point of view to see a certain edit simultaneously in two categories - 'likelygood' and 'maybebad'.

You're presumably imagining a case where the user has highlighted both filters. I see your point; it will look odd. But I'm not too worried. First, I don't know how common it will be for a user to simultaneously search for good and bad edits, though it's certainly possible. More generally, a reviewer using highlighting will quickly learn that an edit can be highlighted by multiple filters. More commonly, the overlap will be more obvious, as in "likely" and "very likely." But my guess, and hope, is that someone getting the result you mention will infer that the edit in question is "iffy" for both categories. And that supposition would be right, since the intersection where this can happen is at the less certain ends of both the 'problem' and 'good' zones (roughly the 16%-55% band where the 'Very likely good' and 'May have problems' ranges overlap). So, in that sense, they're actually getting useful data (though it might not seem so at first). @Pginer-WMF, do you see a need to address this? I should note that for filtering, that case is already included among the "No-Effect Display States".

all test cases/results are documented, since similar testing will need to be done after the UI is in place.

If we need to test this all again when the interface is connected (and when the filters are looking at real data, I hope?), then should we close this ticket and create a new one specifically about testing when the interface is ready? Or just leave it open? What do you recommend, Elena?

@jmatazzoni
My concern was for 'iffy' edits that are simultaneously in the 'likelygood' and 'maybebad' filters, not for edits that are in the 'likelybad' and 'verylikelybad' filters. For newcomers who are interested in tracking down their edits, it might be quite discouraging and rather confusing. I agree that such cases might be rare.

Yes, the ticket may be closed. Testing the UI is different from the functional testing of separate filters, so testing the Contribution Quality filters will be just a part of it. When the UI is implemented, I will test the phab tasks such as T149391: Build user interface for Active Filter Display Area and T149452: Build user interface for the Dropdown Filter Panel, etc. Any issues found will be filed separately.

My concern was for 'iffy' edits that are simultaneously in the 'likelygood' and 'maybebad' filters

Yes, that's what I'm referring to as well. If an edit is in both of these categories, then it is correct to infer that the association is "iffy" for both. The system, though imperfectly, is telling the user just that.