The work in this task, together with T195796, makes up the third useful feature change that we could roll out to users. The work in this task is part of accomplishing these user stories:
* As a reviewer, I need to be able to filter by the four categories in the ORES `draftquality` model (vandalism, spam, attack, ok).
* As a reviewer, I need to be able to filter by the six categories in the ORES `wp10` model (Stub, Start, C-class, B-class, Good, Featured).
Specifically, the work is to build on T195796 by allowing users of the New Pages Feed to filter pages based on ORES scores:
* Filters will need to be added to the New Pages Feed to allow selection by these scores. The same filters will need to be added to the filter menu for both the NPP and AfC use cases. There is an ongoing conversation with the reviewing community about whether it would be better to let reviewers filter the New Pages Feed by the specific categories produced by the two models (as in [[ https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Articles_for_creation/AfC_Process_Improvement_May_2018#/media/File:AfC_menus_concept_A_2018-05-17.png | Concept A ]]) or to roll those categories up into less granular shortcuts (as in [[ https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Articles_for_creation/AfC_Process_Improvement_May_2018#/media/File:AfC_menus_concept_B_2018-05-17.png | Concept B ]]). A decision will be made during the week of May 28, and this Phabricator task will be modified accordingly. For now, we should plan to allow selection by the more granular model categories, as in Concept A. In that case, the user interface should refer to the `draftquality` model as "Predicted issues" and to `wp10` as "Predicted class". The values should be shown in the interface as two sets of checkboxes, in these orderings:
*** Predicted issues
**** None (note that this corresponds to the model value "OK")
**** Vandalism
**** Spam
**** Attack
*** Predicted class
**** Stub
**** Start
**** C-class
**** B-class
**** Good
**** Featured
** When adding these filters to the menu for NPP, all existing filter options in that menu should remain unchanged ("Show", "In namespace", and "That" -- see the wireframes below for details).
** When model categories are selected, the selected categories should be displayed in the list's header, where the header currently says things like "Showing: unreviewed, blocked users". We want to use @alexhollender's recommended design, which accounts for the various combinations of ANDs and ORs like this: "Active filters: State (Awaiting review), Quality (B-class, C-class), Classification (attack), Copyvio (below 50)". See the attached mockup.
** Although the ORES API by default returns a specific category (e.g. "spam") in addition to numeric scores for spam and the other categories, it simply selects whichever category has the highest score as the winner. The Scoring team recommends that we build in the ability to adjust the score cutoff for our chosen category, which would let us tune the usage of the models according to reviewer preferences. The Collaboration team gave themselves this ability to adjust cutoffs with the Recent Changes feed (note: this is about our ability to adjust cutoffs on the software side, not about giving reviewers the ability to set their own cutoffs). Here is what we have decided:
*** For the initial implementation, it will be sufficient to use the naive response from the ORES API: whichever class has the highest score (see the sketch after this list).
*** When we do want to use cutoffs, here, for future reference, is the API endpoint for interrogating the statistics of different cutoffs (in this example, showing all the thresholds for the "spam" class of `draftquality`): https://ores.wikimedia.org/v3/scores/enwiki/?models=draftquality&model_info=statistics.thresholds.spam
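As a rough illustration of both points, here is a minimal Python sketch (not part of the PageTriage implementation). It assumes the standard ORES v3 response shape, and the helper names `score_revision` and `spam_thresholds` are hypothetical:

```lang=python
import requests

ORES_ENWIKI = "https://ores.wikimedia.org/v3/scores/enwiki/"


def score_revision(rev_id):
    """Fetch draftquality and wp10 probabilities for one revision and
    apply the naive rule: the winner is whichever class has the highest
    score (this matches the "prediction" field ORES itself returns)."""
    resp = requests.get(ORES_ENWIKI, params={
        "models": "draftquality|wp10",
        "revids": rev_id,
    })
    resp.raise_for_status()
    scores = resp.json()["enwiki"]["scores"][str(rev_id)]
    result = {}
    for model in ("draftquality", "wp10"):
        probs = scores[model]["score"]["probability"]
        result[model] = max(probs, key=probs.get)
    return result


def spam_thresholds():
    """Fetch the threshold statistics for the "spam" class of
    `draftquality` (the endpoint linked above), for when we later tune
    a cutoff instead of taking the naive highest-score answer. The
    exact nesting of the response is an assumption based on the ORES
    v3 model_info convention."""
    resp = requests.get(ORES_ENWIKI, params={
        "models": "draftquality",
        "model_info": "statistics.thresholds.spam",
    })
    resp.raise_for_status()
    models = resp.json()["enwiki"]["models"]
    return models["draftquality"]["statistics"]["thresholds"]["spam"]
```

For example, calling `score_revision` with a draft's latest revision ID would return something like `{"draftquality": "OK", "wp10": "Stub"}`; the "OK" value is what the filter UI would surface as "None".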
Two other notes:
* [[ https://en.wikipedia.org/wiki/User_talk:SQL | User:SQL ]] maintains a page that runs the two ORES models over all drafts submitted each day. Perhaps there are some things we can learn from that implementation: https://en.wikipedia.org/wiki/User:SQL/AFC-Ores.
* It would be valuable to sanity-check our scores before integrating them into the software. As we work on this development, it would be good to be able to export lists of scored pages so that humans can look them over and confirm that the scores and cutoffs make sense (a minimal export sketch follows this list).
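As a sketch of what such an export could look like, the following reuses the hypothetical `score_revision` helper above and writes a CSV for human review; the source of the `(title, rev_id)` pairs is left open, since it depends on how we enumerate pages in the feed:

```lang=python
import csv


def export_for_review(pages, out_path="ores_sanity_check.csv"):
    """Write a CSV of pages and their predicted classes so reviewers
    can eyeball whether the scores (and, later, cutoffs) make sense.
    `pages` is an iterable of (title, rev_id) pairs."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "rev_id", "draftquality", "wp10"])
        for title, rev_id in pages:
            scores = score_revision(rev_id)  # sketch defined above
            writer.writerow(
                [title, rev_id, scores["draftquality"], scores["wp10"]])
```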
Note: the specifics listed above and the wireframes shown below may be changed by ongoing community conversation around the design, [[ https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Articles_for_creation/AfC_Process_Improvement_May_2018 | which can be found here ]].
Here are wireframes of what the feed would look like after this work for both the NPP and AfC cases, showing the work from T195545, T195924, and T195547, and now with the ORES models in the filter menu (note that these wireframes do not show many of the details that should remain unchanged, like the info listed with each page in the list):