The team working on Edit-Review-Improvements is hoping to support a process by which patrollers look for damaging edits by good-faith new editors.
Currently, they are hoping to find edits that are "likely" to be damaging (operationalized as recall_at_precision(min_precision=0.6) == 0.879) and "very likely" to be goodfaith (operationalized as recall_at_precision(min_precision=0.995) == 0.86), but in practice they aren't finding any edits that satisfy both criteria.
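For reference, a minimal sketch of what the recall_at_precision statistic measures: sweep over score thresholds and report the best recall achievable at a threshold whose precision stays at or above min_precision. This is an illustration only; the function body and toy data below are assumptions, not the actual model-evaluation code.

```python
def recall_at_precision(labels, scores, min_precision):
    """Best recall over all score thresholds whose precision >= min_precision.

    labels: 1 for a positive (e.g. damaging) edit, 0 otherwise.
    scores: model probabilities for the positive class, same order as labels.
    """
    total_positives = sum(labels)
    best_recall = 0.0
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y)
        fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
        if tp + fp == 0:
            continue
        precision = tp / (tp + fp)
        if precision >= min_precision:
            best_recall = max(best_recall, tp / total_positives)
    return best_recall

# Toy example (made-up labels/scores): one positive edit has such a low
# score that no threshold can recover it without precision dropping
# below 0.6, so recall tops out at 3 of 4 positives.
labels = [1, 1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.1]
print(recall_at_precision(labels, scores, min_precision=0.6))  # → 0.75
```

The practical implication: a statement like recall_at_precision(min_precision=0.6) == 0.879 fixes only one point on the precision/recall trade-off, so combining two such constraints (one per model) can leave an empty intersection of qualifying edits even when each model performs well on its own.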
This task is done when we have explored the implications of these operationalizations. Is it a failure of the prediction models that we can't find these edits? Or is it an improper operationalization?