Soon we'll switch to ORES-based topic filtering, which in itself is not too much of a change (just a slight difference in configuration format and in what search string to generate). But we'll probably want to keep the ability to use either search method instead of just replacing the morelike search code with the ORES one, because that will allow us to roll out gradually across wikis, keep morelike as a fallback, run an A/B test, etc. So we'll probably want some sort of "search strategy" abstraction in the backend.
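A rough sketch of what such a "search strategy" abstraction might look like, written in Python purely for illustration. The class names, the `pick_strategy` helper and the `articletopic:` keyword syntax are assumptions made for this example; only the general idea (one interface, two ways of generating the search string, a per-wiki switch) comes from the description above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Set


class SearchStrategy(ABC):
    """Turns a suggested-edit topic ID into a search string."""

    @abstractmethod
    def search_string(self, topic_id: str) -> str:
        ...


@dataclass
class MorelikeStrategy(SearchStrategy):
    # topic ID -> reference article titles (hypothetical config shape)
    reference_articles: Dict[str, List[str]]

    def search_string(self, topic_id: str) -> str:
        return "morelike:" + "|".join(self.reference_articles[topic_id])


@dataclass
class OresStrategy(SearchStrategy):
    # topic ID -> ORES topic names (hypothetical config shape)
    ores_topics: Dict[str, List[str]]

    def search_string(self, topic_id: str) -> str:
        # Keyword name and syntax are assumed, not confirmed.
        return "articletopic:" + "|".join(self.ores_topics[topic_id])


def pick_strategy(
    wiki: str,
    ores_wikis: Set[str],
    morelike: SearchStrategy,
    ores: SearchStrategy,
) -> SearchStrategy:
    """Use ORES only on wikis where it is enabled; otherwise fall back to morelike."""
    return ores if wiki in ores_wikis else morelike
```

With this shape, a gradual rollout is just a matter of growing `ores_wikis`, and an A/B test could swap `pick_strategy` for something that buckets users instead of wikis.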
|Resolved||• Rileych||T240517 [EPIC] Growth: Newcomer tasks 1.1.1 (ORES topics)|
|Resolved||• Tgr||T243477 Newcomer tasks: update topic task suggestion backend to handle multiple topic search methods|
@MMiller_WMF one relevant question is, will we end up with the same topic list we use currently? If we do, we can just reuse the same on-wiki configuration pages and add the ORES config as another field next to the morelike config. If it's going to be a different set of topics, we'll probably want to use a new configuration page, and then we'll need some changes to the configuration loading logic too.
@Tgr -- we will have different topic lists than the ones we are using for morelike. The ORES models are built with a different ontology that we think is better than the old morelike one. Two points to think about as you work on this:
- The scores from the new ontology will need to be combined or rolled up in certain ways that we are still determining. For instance, for an article to be "Science" in the UI, it has to have a high score for "Chemistry", "Physics", or "Biology".
- The ontology will likely evolve in the future. This won't be frequent, but we expect it to happen. Topics may get reorganized, added, or subtracted. Will we be able to handle that gracefully?
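To make the two points above concrete, here is a minimal sketch of an "any sub-topic scores high" roll-up, in Python for illustration only. The threshold value, the `ROLLUP` mapping shape and the `ui_topics` function are all assumptions; the real combination rules are still undecided.

```python
from typing import Dict, List

# UI topic -> ORES topics that roll up into it (example from the discussion).
ROLLUP: Dict[str, List[str]] = {
    "science": ["Chemistry", "Physics", "Biology"],
}

# Placeholder value; the real thresholds have not been determined.
THRESHOLD = 0.5


def ui_topics(
    scores: Dict[str, float],
    rollup: Dict[str, List[str]] = ROLLUP,
    threshold: float = THRESHOLD,
) -> List[str]:
    """Return the UI topics for which at least one ORES sub-topic scores high enough."""
    return sorted(
        ui_topic
        for ui_topic, ores_topics in rollup.items()
        # Missing scores count as 0.0, so ORES topics that were removed
        # or renamed in a later ontology simply never match.
        if any(scores.get(name, 0.0) >= threshold for name in ores_topics)
    )
```

Keeping the mapping as plain data means that reorganizing, adding or removing topics only changes `ROLLUP`, not the matching logic.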
I created T244192: Newcomer tasks: ORES ontology mapping and score thresholds to sort out exactly how we will roll up and use the ontology. To what extent are you blocked on the decisions in that task?
The straightforward approach would be to define the roll-up in the per-wiki JSON config page, so all of those pages would have to be edited when such a change happens. Does that still fit into "gracefully"? Or do we want a single cross-wiki location for defining which ORES topics combine into a given suggested edit topic?
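For comparison, a sketch of how the two options could coexist: each per-wiki JSON page may define the roll-up itself, and topics that omit it fall back to a single shared mapping. The JSON field names (`label`, `ores_topics`), the `SHARED_ROLLUP` constant and the `load_topics` function are hypothetical, since the config format has not been settled.

```python
import json
from typing import Dict, List, Optional

# Hypothetical cross-wiki default: UI topic -> ORES topics.
SHARED_ROLLUP: Dict[str, List[str]] = {
    "science": ["Chemistry", "Physics", "Biology"],
}


def load_topics(
    config_json: str,
    shared_rollup: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, List[str]]:
    """Parse a per-wiki config page and resolve each topic's ORES roll-up."""
    shared = shared_rollup or {}
    topics: Dict[str, List[str]] = {}
    for topic_id, entry in json.loads(config_json).items():
        # A per-wiki definition wins; otherwise use the shared one.
        ores_topics = entry.get("ores_topics", shared.get(topic_id))
        if not ores_topics:
            raise ValueError(f"topic {topic_id!r} has no ORES topics defined")
        topics[topic_id] = list(ores_topics)
    return topics
```

With a fallback like this, an ontology change would only require editing the shared mapping, plus whichever wikis chose to override it.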
To what extent are you blocked on the decisions in that task?
The question in T244192#5858481 affects configuration file format and search code a little bit, but it's a relatively trivial change that can be done separately. So not really blocked.
In hindsight the code changes here were probably not strictly necessary; they did make the code nicer though.
Will still need another patch for the configuration loading changes once we have the config format figured out.
Moving back into development for the ORES strategy (it's probably blocked on T240559: Expose ORES drafttopic data in ElasticSearch via a custom CirrusSearch keyword and T243359: Define configuration for ORES articletopic search, but we don't have a blocked column).
The commit summary of the last patch says "Sorting the topics (including making use of the 'groups' field) is left to another patch." which I then forgot about... Filed now as T246061: Newcomer tasks: Sort topics alphabetically.