This is the second cohesive, useful feature change that we could roll out to users. The work in this task roughly accomplishes these user stories:
* As a reviewer, I need to be able to filter by the four categories in the ORES `draftquality` model (vandalism, spam, attack, ok).
* As a reviewer, I need to be able to filter by the six categories in the ORES `wp10` model (Stub, Start, C-class, B-class, Good, Featured).
Specifically, the work is:
* **Generating scores**
** For all pages in the New Pages Feed, including the Article, Draft, and User namespaces, generate scores from both the `draftquality` and `wp10` models [[ https://www.mediawiki.org/wiki/ORES | described on the ORES page ]]. It is unlikely that we will use the scores from the User namespace for anything.
** The `draftquality` model returns four scores and the `wp10` model returns six. I recommend that we store all ten of those scores, and separately apply logic to determine which categories to display for each model, as described in the section below. This should give us the flexibility to change our logic later on.
** In terms of when to score pages, we have some flexibility. Ideally, we would score pages upon their first appearance in the New Pages Feed and rescore them on each successive edit for as long as they remain in the feed. Rescoring matters more for NPP work than for AfC work, since new articles tend to be edited soon after they are first created, whereas new drafts submitted to AfC do not. @Halfak described two options for rescoring on every edit, and it would be great if he could comment with his recommendations:
*** New Pages Feed would use the ORES API to query for new model scores on every edit.
*** The Scoring team could add the `draftquality` and `wp10` models to the set of models that are already being rescored on every edit, storing the scores in the MediaWiki database for uses like the New Pages Feed.
** If we decide for technical reasons that rescoring on every edit is not a good idea, here are some alternative business rules the team can discuss:
*** Rescore models once a day (or at some other interval) on pages that have changed during the previous interval.
*** Rescore models on a given page after a certain number of edits.
*** Rescore models on a given page after a certain number of bytes changed.
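As a rough sketch of what the first scoring option could look like, here is a minimal client for the public ORES v3 scores endpoint. The revision IDs, wiki name, and the exact response shape are illustrative assumptions based on the ORES documentation, not a committed design:

```python
# Sketch (not production code): building a batched ORES request for the
# draftquality and wp10 models, and pulling all per-category probabilities
# out of the response so that every score can be stored, per the
# recommendation above.
from urllib.parse import urlencode

ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def build_scores_url(wiki, rev_ids, models=("draftquality", "wp10")):
    """Build a single batched ORES scores request for the given revisions."""
    query = urlencode({
        "models": "|".join(models),
        "revids": "|".join(str(r) for r in rev_ids),
    })
    return "{}/{}/?{}".format(ORES_BASE, wiki, query)


def extract_predictions(response, wiki, rev_id):
    """Return each model's predicted category plus its full probability
    table for one revision, assuming the ORES v3 response layout."""
    scores = response[wiki]["scores"][str(rev_id)]
    return {
        model: {
            "prediction": payload["score"]["prediction"],
            "probabilities": payload["score"]["probability"],
        }
        for model, payload in scores.items()
    }
```

Storing the full `probabilities` dict (rather than only the winning category) is what keeps the display logic adjustable later, as noted above.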
* **Displaying scores**
** Filters will be added to the New Pages Feed to allow selection based on these scores. The same filters will need to be added to the filter menu for both the NPP and AfC use cases. There is an ongoing conversation with the reviewing community about whether it would be better to allow reviewers to filter the New Pages Feed using the specific categories produced by the two models (like in [[ https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Articles_for_creation/AfC_Process_Improvement_May_2018#/media/File:AfC_menus_concept_A_2018-05-17.png | Concept A ]]) or to roll those categories up into less granular shortcuts (like in [[ https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Articles_for_creation/AfC_Process_Improvement_May_2018#/media/File:AfC_menus_concept_B_2018-05-17.png | Concept B ]]). A decision will be made during the week of May 28, and this Phabricator task will be modified accordingly. For now, we should plan on allowing selections based on the more granular model categories, as in Concept A. In that case, the user interface should refer to the `draftquality` model as "Predicted issues" and `wp10` as "Predicted class". The values should be shown in the interface as two sets of checkboxes, in these orderings:
*** Predicted issues
**** None (note that this corresponds to the model value "OK")
**** Spam
**** Vandalism
**** Attack
*** Predicted class
**** Stub
**** Start
**** C-class
**** B-class
**** Good
**** Featured
** When adding these filters to the menu for NPP, all existing filter options in that menu should remain unchanged ("Show", "In namespace", and "That" -- see wireframes below for more clarity).
** When model categories are selected, the selected categories should be listed next to the word "Showing" in the list's header. It would be great if @alexhollender could weigh in here on the logic of how to list the selected categories, taking into account the various possibilities for ANDs and ORs. This will be determined during the week of May 28.
** Although the ORES API by default returns a specific category (e.g. "spam") in addition to numeric scores for spam and the other categories, it simply picks the category with the highest score as the winner. The Scoring team recommends that we build in the ability to adjust the score cutoff for our chosen category, which will allow us to tune the usage of the models according to reviewer preferences. The Collaboration team gave themselves this ability to adjust cutoffs with the Recent Changes feed (note: this is about our ability to adjust cutoffs on the software side, not about giving reviewers the ability to set their own cutoffs). More information about how best to do this will be added to this Phabricator task during the week of May 28.
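To make the cutoff idea concrete, here is one way adjustable thresholds could replace the model's default argmax behavior. The threshold values, the fallback label, and the function itself are illustrative assumptions for discussion, not a tuned recommendation:

```python
# Sketch of configurable score cutoffs: only accept a category if its
# probability clears its cutoff; otherwise fall back to a default label.
# Raising a category's cutoff makes the software more conservative about
# applying that label.
DEFAULT_THRESHOLD = 0.5  # assumed default, to be tuned with reviewers


def pick_category(probabilities, thresholds=None, fallback="OK"):
    """Return the highest-scoring category whose probability meets its
    configured cutoff, or the fallback label if none qualifies."""
    thresholds = thresholds or {}
    best, best_p = None, 0.0
    for category, p in probabilities.items():
        cutoff = thresholds.get(category, DEFAULT_THRESHOLD)
        if p >= cutoff and p > best_p:
            best, best_p = category, p
    return best if best is not None else fallback
```

With no `thresholds` configured this matches the API's default winner; per-category overrides are where the tuning described above would happen.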
Two other notes:
* [[ https://en.wikipedia.org/wiki/User_talk:SQL | User:SQL ]] made a page that runs the two ORES models on all submitted drafts each day. Perhaps there are some things we can learn from that user's implementation: https://en.wikipedia.org/wiki/User:SQL/AFC-Ores.
* It would be great if we could sanity-check our scores before integrating them into the software. During development, it would be good to be able to export lists of scored pages so that humans can look them over and make sure the scores and cutoffs make sense.
Note: the specifics listed above and the wireframe shown below may be changed by ongoing community conversation around the design, [[ https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Articles_for_creation/AfC_Process_Improvement_May_2018 | which can be found here ]].
Here are wireframes of what the feed would look like after this work, for both the NPP and AfC cases, showing the work from T195545 and T195547, now with the ORES models in the filter menu (note that these wireframes do not show many of the details that should remain unchanged, like the info listed with each page in the list):
{F18584568}
{F18584570}