Our goal is to communicate ORES good-faith and damaging predictions to users in a way that is simple to understand and corresponds with users' goals and mental models. The research envisioned will guide the creation of both a conceptual system that will be incorporated into various feeds and a visual system used on a new special page and the Recent Changes page.
Questions to be answered include:
- What conceptual scheme for simplifying ORES good-faith and damaging scores makes sense to users and corresponds with their goals?
- How many discrete levels can we usefully divide the spectrum of scores into?
- What language will help users understand these features and their purpose?
- What visual representations communicate these ideas most immediately to users?
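To make the discrete-levels question concrete, here is a minimal sketch of one possible scheme: mapping a raw ORES damaging probability onto a small set of named levels. The thresholds and labels below are illustrative placeholders for testing, not proposed values.

```python
# Hypothetical sketch: bucketing a raw ORES "damaging" probability (0.0-1.0)
# into a small number of named levels. Thresholds and labels are placeholders.
DAMAGING_LEVELS = [
    (0.30, "unlikely damaging"),
    (0.60, "maybe damaging"),
    (0.90, "likely damaging"),
    (1.01, "very likely damaging"),  # upper bound above 1.0 to catch score == 1.0
]

def damaging_level(score: float) -> str:
    """Return the label of the first bucket whose upper bound exceeds the score."""
    for upper_bound, label in DAMAGING_LEVELS:
        if score < upper_bound:
            return label
    return DAMAGING_LEVELS[-1][1]

print(damaging_level(0.12))  # -> unlikely damaging
print(damaging_level(0.75))  # -> likely damaging
```

User testing would then tell us whether four levels (or three, or five) and this kind of labeling actually match users' mental models.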
As we consider the appropriate number of discrete levels into which to divide ORES scores, and the points at which to set the thresholds between them, we will need to strike the right balance between (to use the machine-learning terms) precision and recall. That is, what system best balances users' desire, on the one hand, to find all possible relevant items against their desire, on the other, not to be distracted by false positives?
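The precision/recall trade-off above can be illustrated with a short sketch. The data here is entirely made up for demonstration; it simply shows how raising a score threshold tends to increase precision (fewer false positives) while lowering recall (more damaging edits missed).

```python
# Illustrative sketch with made-up data: each tuple is
# (hypothetical ORES damaging score, whether the edit was actually damaging).
edits = [
    (0.95, True), (0.85, True), (0.80, False), (0.70, True),
    (0.55, False), (0.45, True), (0.30, False), (0.10, False),
]

def precision_recall(threshold: float):
    """Precision and recall when flagging every edit scored at or above threshold."""
    flagged = [actual for score, actual in edits if score >= threshold]
    true_positives = sum(flagged)
    total_damaging = sum(actual for _, actual in edits)
    precision = true_positives / len(flagged) if flagged else 1.0
    recall = true_positives / total_damaging
    return precision, recall

for t in (0.4, 0.6, 0.9):
    p, r = precision_recall(t)
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy sample, a threshold of 0.4 catches every damaging edit but flags several good ones, while a threshold of 0.9 flags only truly damaging edits but misses most of them; the research question is where on that spectrum each user group's goals sit.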
It's expected that the systems we create will be used in a wide variety of ways and in disparate contexts and locations, including the Recent Changes page, a planned special page designed for new-user support, Watchlists, tools like Huggle, etc. For this reason, it seems desirable to test with a variety of users pursuing distinct goals, including, at a minimum, new-user support and vandalism fighting.
A note about testing methodology and the prototype we'll likely need: since this project examines the scoring system and how users understand it, we'll need to give users the opportunity to examine a dataset of real edits with real ORES scores, diffs, etc. Only in this way will we know whether people find what we propose useful.