
New Pages Feed: score draftquality on most recent revision
Closed, Resolved · Public

Description

In T195796, @Ladsgroup helped us understand how the draftquality model is being scored on pages. The current business rule is that draftquality is only scored on the first revision of a page:

@Ladsgroup: What do you think about storing the draftquality score for the most recent revision rather than the first revision? This would help on a number of fronts.

I thought about it, and it's not hard to treat it like the wp10 model, but we would end up with lots of useless scores in the database for articles that were created ages ago but were recently edited (like the "Barack Obama" article), which doesn't make sense. I can't find an inexpensive way to prevent that right now, but ideas are more than welcome. FWIW, some analysis showed the scores are not very different between the first and the latest version.

For the New Pages Feed improvements work, we've decided that we prefer for the model to be scored on the latest revision of the page. That's because when pages are new, they can be improved quickly, either by the original author or a community collaborator. We want those latest changes to be reflected in model scores so that reviewers continue to prioritize review on drafts that need attention soonest.

Therefore, the user story for this task is:

  • As a reviewer, I need the New Pages Feed to reflect the latest draftquality score for each page in the feed.

That said, we don't want to be scoring and storing scores for many irrelevant pages. The engineer who takes this task will first have to figure out an approach that will work. @Catrope recommended a hybrid approach in which we use the ORES API to score new revisions of pages that are in the PageTriage queue.
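As a rough illustration of the "use the ORES API" half of that hybrid approach, here is a minimal Python sketch that builds a batch scoring URL and reads a prediction out of a response. The endpoint path and the nested response layout reflect my understanding of the ORES v3 scores API and should be treated as assumptions; the function names are hypothetical, and no network request is made.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

# Assumed public ORES endpoint; adjust if your deployment differs.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def build_score_url(wiki: str, rev_ids: Sequence[int], models: Sequence[str]) -> str:
    """Build a batch scoring URL for the given revisions and models."""
    query = urlencode({
        "models": "|".join(models),
        "revids": "|".join(str(r) for r in rev_ids),
    })
    return f"{ORES_BASE}/{wiki}/?{query}"


def extract_prediction(response: dict, wiki: str, rev_id: int, model: str) -> Optional[str]:
    """Return the predicted label, or None if the score is missing.

    Assumes the layout response[wiki]["scores"][rev_id][model]["score"].
    """
    score = (
        response.get(wiki, {})
        .get("scores", {})
        .get(str(rev_id), {})
        .get(model, {})
        .get("score")
    )
    return None if score is None else score.get("prediction")
```

Only revisions of pages that are currently in the PageTriage queue would be passed to `build_score_url`, which is what keeps scores for long-established articles out of the database.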

Event Timeline

MMiller_WMF added a subscriber: SBisson.

@SBisson -- FYI that this task is now created and in the "To Do" column of the sprint board.

No, this patch is to ease filtering in the PageTriage extension.

For this task we talked about having the PageTriage extension submit new FetchScoreJobs to score and save draftquality when an article it is monitoring gets a new revision.

This solution would retain the scores for all revisions, not just first and last like we hoped. Configuring cleanParent: true would clean the first revision as well. I don't see a good way to identify the first revision of a page other than storing it in pagetriage_page.
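To make the retention trade-off concrete, here is a small in-memory Python sketch. `ScoreStore` and its `clean_parent` flag are hypothetical stand-ins for the score table and the cleanParent setting; this is not the ORES extension's actual code. It assumes that cleaning the parent means deleting the parent revision's score for the same model whenever a child revision is saved.

```python
class ScoreStore:
    """In-memory stand-in for a revision score table (hypothetical)."""

    def __init__(self, clean_parent: bool):
        self.clean_parent = clean_parent
        self.scores = {}  # (rev_id, model) -> score

    def save(self, rev_id, parent_id, model, score):
        """Store a score and optionally drop the parent revision's score."""
        self.scores[(rev_id, model)] = score
        if self.clean_parent and parent_id is not None:
            self.scores.pop((parent_id, model), None)

    def revisions_with(self, model):
        """Return the sorted revision IDs that still have a score for `model`."""
        return sorted(r for (r, m) in self.scores if m == model)


def score_history(clean_parent: bool):
    """Score three consecutive revisions of one page: 100 -> 101 -> 102."""
    store = ScoreStore(clean_parent)
    store.save(100, None, "draftquality", "OK")   # first revision, no parent
    store.save(101, 100, "draftquality", "spam")
    store.save(102, 101, "draftquality", "OK")
    return store.revisions_with("draftquality")
```

Under these assumptions, `score_history(False)` keeps scores for all three revisions, while `score_history(True)` leaves only revision 102, because the first revision's score is removed as soon as its child is scored.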

Change 446892 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[mediawiki/extensions/PageTriage@master] Score latest revision on 'draftquality'

https://gerrit.wikimedia.org/r/446892

Change 449274 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[mediawiki/extensions/ORES@master] [POC] Introducing a hook to decide what to score

https://gerrit.wikimedia.org/r/449274

Change 449275 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[mediawiki/extensions/PageTriage@master] [POC] Let ORES score 'draftquality' for new revs of pages in queue

https://gerrit.wikimedia.org/r/449275

Change 446892 abandoned by Sbisson:
Score latest revision on 'draftquality'

https://gerrit.wikimedia.org/r/446892

Change 449437 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[operations/mediawiki-config@master] CleanupParent for draftquality model when PageTriage is used

https://gerrit.wikimedia.org/r/449437

Change 449437 merged by jenkins-bot:
[operations/mediawiki-config@master] CleanupParent for draftquality model when PageTriage is used

https://gerrit.wikimedia.org/r/449437

Mentioned in SAL (#wikimedia-operations) [2018-08-01T11:54:17Z] <zfilipin@deploy1001> Synchronized wmf-config/: SWAT: [[gerrit:449437|CleanupParent for draftquality model when PageTriage is used (T199357)]] (duration: 00m 56s)

Change 449274 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Introducing ORESCheckModels hook

https://gerrit.wikimedia.org/r/449274

Change 449275 merged by jenkins-bot:
[mediawiki/extensions/PageTriage@master] Score 'draftquality' and 'wp10' for new revs of pages in queue

https://gerrit.wikimedia.org/r/449275
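The two merged patches fit together roughly as follows: the scoring side exposes a hook, and the queue side uses it to request extra models for pages it is monitoring. The Python sketch below mimics that pattern only. `register`, `models_to_score`, and `TRIAGE_QUEUE` are hypothetical names, and the real ORESCheckModels hook is a PHP hook whose signature is not reproduced here.

```python
_handlers = []  # callables of the form handler(page_id, models)


def register(handler):
    """Register a handler that may switch models on or off for a page."""
    _handlers.append(handler)
    return handler


def models_to_score(page_id, default_models):
    """Return the names of enabled models after all handlers have run."""
    models = dict(default_models)  # copy so the defaults are not mutated
    for handler in _handlers:
        handler(page_id, models)
    return [name for name, enabled in models.items() if enabled]


# Hypothetical set of page IDs currently in the review queue.
TRIAGE_QUEUE = {42}


@register
def page_triage_handler(page_id, models):
    """Enable draftquality and wp10 only for pages in the queue."""
    if page_id in TRIAGE_QUEUE:
        models["draftquality"] = True
        models["wp10"] = True
```

With defaults of `{"damaging": True, "draftquality": False}`, page 42 would be scored on damaging, draftquality, and wp10, while a page outside the queue would be scored on damaging only.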

@MMiller_WMF I verified that only the latest revision's scores for Predicted issues (draftquality) and Predicted class (wp10) will be stored.
Two minor notices:

(1) damaging and goodfaith results are stored for all revisions. Since undoing a revision creates a new entry in the revision table and triggers a new ORES evaluation, there is probably no point in keeping ORES scores for every revision.

(2) The 'undo' action does not restore the previous revision's scores. It creates a new revision, which is evaluated again. The same simple mechanism is in place: any new revision in the revision table gets new scores in the ores table. I did not notice significant discrepancies in scores for undone revisions, but it is something to keep in mind if we get complaints about undone revisions being scored differently.

--dry-run for extensions/ORES/maintenance/BackfillPageTriageQueue.php looks fine.

Change 452850 had a related patch set uploaded (by Catrope; owner: Sbisson):
[mediawiki/extensions/PageTriage@wmf/1.32.0-wmf.16] Score 'draftquality' and 'wp10' for new revs of pages in queue

https://gerrit.wikimedia.org/r/452850

Change 452850 merged by jenkins-bot:
[mediawiki/extensions/PageTriage@wmf/1.32.0-wmf.16] Score 'draftquality' and 'wp10' for new revs of pages in queue

https://gerrit.wikimedia.org/r/452850

Mentioned in SAL (#wikimedia-operations) [2018-08-15T00:12:14Z] <catrope@deploy1001> Synchronized php-1.32.0-wmf.16/extensions/PageTriage/: SWAT: PageTriage fixes (T199357, T201812, T201560, T201373, T201253) (duration: 00m 51s)

I just checked this out by repeatedly changing page content and watching scores change in Test Wiki. I believe that it is scoring the latest revision, so that's good! This ticket is done.