This task expands the work done in T240558 ("Update ORES articletopic data score in ElasticSearch when an article gets edited"), which ended up being specific to a Growth experiment. The goal is to also include drafttopic predictions for pages in the Draft namespace.
Background:
ORES supports two topic models.
- articletopic - Trained and tested against full articles. Designed to be scored against the most recent version of a full article.
- drafttopic - Trained and tested against initial versions of articles. Designed to be scored against articles that are still early in their development (AKA drafts).
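To show how the two models split the work, a caller could pick the model from a page's namespace. This is a minimal sketch, not existing ORES or CirrusSearch code. The function name `choose_topic_model` is hypothetical. The Draft namespace ID of 118 is the one English Wikipedia uses, and other wikis may differ.

```python
from typing import Optional

# Namespace IDs as used on English Wikipedia; other wikis may assign
# a different ID to Draft, or have no Draft namespace at all.
MAIN_NAMESPACE = 0
DRAFT_NAMESPACE = 118


def choose_topic_model(namespace: int, draft_namespace: int = DRAFT_NAMESPACE) -> Optional[str]:
    """Return the ORES topic model to score a page with, or None if neither applies.

    Hypothetical helper for illustration only.
    """
    if namespace == draft_namespace:
        # Drafts are early-stage articles, which is what drafttopic was trained on.
        return "drafttopic"
    if namespace == MAIN_NAMESPACE:
        # Mainspace articles are scored on their latest revision with articletopic.
        return "articletopic"
    # Talk pages, user pages, etc. get no topic predictions in this sketch.
    return None


print(choose_topic_model(118))  # drafttopic
print(choose_topic_model(0))    # articletopic
```

Passing `draft_namespace` explicitly lets the same rule cover other wikis whose Draft namespace uses a different ID.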
I think we'll want to enable the drafttopic model for all pages in the Draft namespace on English Wikipedia and any other wikis that have such a namespace.
- We'll need to run new threshold optimizations for the drafttopic model to choose good thresholds, because its thresholds differ slightly from the articletopic model's. @Halfak has a script for that.
- We'll need to gather predictions from HDFS. ORES already produces drafttopic predictions for changes to pages in the draft namespace and the first edit to pages in the article namespace.
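Here is one way to picture per-topic threshold optimization. This is a hedged sketch of a single common criterion: maximize recall while keeping precision at or above a target. It is not @Halfak's script, and the real optimization criteria may differ. The function name and inputs are hypothetical. It assumes a labeled set of drafttopic scores for one topic.

```python
from typing import Optional, Sequence


def max_recall_at_precision(
    scores: Sequence[float],
    labels: Sequence[bool],
    min_precision: float,
) -> Optional[float]:
    """Pick the threshold with the highest recall whose precision is >= min_precision.

    scores: predicted probabilities for one topic (e.g. from drafttopic).
    labels: whether each example actually belongs to that topic.
    Returns None if no threshold reaches the target precision.
    Hypothetical helper; O(n^2) for clarity, not speed.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        return None
    best_threshold: Optional[float] = None
    best_recall = -1.0
    # Each distinct score is a candidate cut-off: predict the topic when score >= t.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        precision = tp / (tp + fp)  # tp + fp >= 1 because t is one of the scores
        recall = tp / total_positives
        if precision >= min_precision and recall > best_recall:
            best_threshold, best_recall = t, recall
    return best_threshold


scores = [0.9, 0.8, 0.7, 0.6, 0.3]
labels = [True, True, False, True, False]
print(max_recall_at_precision(scores, labels, 0.75))  # 0.6
print(max_recall_at_precision(scores, labels, 0.9))   # 0.8
```

Running this once per topic, on drafttopic scores for labeled drafts, would give a separate set of thresholds. Those thresholds would not necessarily match the articletopic ones.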