New Pages Feed: backfill ORES scores for unscored drafts (3.3)
Closed, Resolved (Public)

Description

@SBisson recently merged code that causes ORES scores to be produced and stored for Draft pages in the MediaWiki databases. Before that change, ORES was only scoring mainspace pages.

This task is about backfilling scores for Drafts created before the change. Before we release this feature, all draft pages should have draftquality and wp10 scores in the New Pages Feed, regardless of their state in the AfC process (excluding redirects). That is about 40,000 drafts.

Details

Related Gerrit Patches:
mediawiki/extensions/ORES : master : Maintenance script to backfill scores in PageTriage queue
mediawiki/extensions/PageTriage : master : Maintenance script to backfill ORES scores

Event Timeline

SBisson added a subscriber: Halfak. Jul 9 2018, 7:12 PM

@Halfak Any issue with us backfilling the draftquality and wp10 scores for those 40,000 drafts?

We would probably do that in a maintenance script, in batches, and with throttling if needed so as not to overload the API.

Halfak added a comment. Jul 9 2018, 7:17 PM

Shouldn't take very long to do that. I usually recommend sending requests for scores in batches of 50 with two parallel threads.
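
As a rough illustration of the suggestion above (batches of 50, two parallel threads), here is a minimal Python sketch. It is not the maintenance script from the Gerrit patches. The names `chunked`, `backfill`, and `fake_fetch_scores` are hypothetical, and the stub stands in for whatever call actually requests scores from ORES.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Iterator, List

BATCH_SIZE = 50   # batch size suggested in the comment above
MAX_WORKERS = 2   # two parallel threads, as suggested


def chunked(ids: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive lists of at most `size` IDs."""
    batch: List[int] = []
    for rev_id in ids:
        batch.append(rev_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def backfill(
    rev_ids: Iterable[int],
    fetch_scores: Callable[[List[int]], Dict[int, dict]],
    batch_size: int = BATCH_SIZE,
    workers: int = MAX_WORKERS,
) -> Dict[int, dict]:
    """Request scores batch by batch, with at most `workers` batches in flight."""
    results: Dict[int, dict] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns batch results in submission order.
        for batch_result in pool.map(fetch_scores, chunked(rev_ids, batch_size)):
            results.update(batch_result)
    return results


def fake_fetch_scores(batch: List[int]) -> Dict[int, dict]:
    """Hypothetical stand-in for a real scoring request; returns placeholder values."""
    return {rev_id: {"draftquality": "placeholder", "wp10": "placeholder"} for rev_id in batch}


if __name__ == "__main__":
    scores = backfill(range(1, 121), fake_fetch_scores)
    print(len(scores))
```

Capping the pool at two workers bounds the number of concurrent requests regardless of how many IDs are queued; a sleep between batches could be added inside the fetch function if further throttling were needed.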

MMiller_WMF moved this task from Inbox to Q1 2018-19 on the Growth-Team board. Jul 10 2018, 5:37 PM

Change 447082 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[mediawiki/extensions/PageTriage@master] Maintenance script to backfill ORES scores

https://gerrit.wikimedia.org/r/447082

Change 449475 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[mediawiki/extensions/ORES@master] Maintenance script to backfill scores in PageTriage queue

https://gerrit.wikimedia.org/r/449475

Change 447082 abandoned by Sbisson:
Maintenance script to backfill ORES scores

https://gerrit.wikimedia.org/r/447082

Change 449475 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Maintenance script to backfill scores in PageTriage queue

https://gerrit.wikimedia.org/r/449475

Niharika removed a subscriber: Niharika. Aug 8 2018, 5:38 PM
Etonkovidova closed this task as Resolved. Aug 17 2018, 5:15 PM
Etonkovidova added a subscriber: Etonkovidova.

Checked in testwiki.

@SBisson -- just double-checking something. When we roll out ORES to English Wikipedia, will we need to run a script to backfill scores in production? Or is that already done once for the main table, and doesn't need to be done again?

I don't think the script was ever executed in production. When we run it, it will backfill for all pages in the PageTriage queue (Article, User, Draft) and for all models (articlequality (formerly known as wp10), draftquality).
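
To make "all pages in the queue, all models" concrete, here is a small Python sketch of how the missing (page, model) pairs could be worked out before requesting scores. This is only an assumption about the selection logic, not a description of the actual script; `missing_scores` and the in-memory `existing` mapping are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple

# Model names as mentioned in the comment above.
MODELS: Tuple[str, ...] = ("articlequality", "draftquality")


def missing_scores(
    queue_page_ids: Iterable[int],
    existing: Dict[int, Set[str]],
    models: Sequence[str] = MODELS,
) -> List[Tuple[int, str]]:
    """Return (page_id, model) pairs that have no stored score yet.

    `existing` maps a page ID to the set of models already scored for it;
    pages absent from the mapping are treated as having no scores at all.
    """
    return [
        (page_id, model)
        for page_id in queue_page_ids
        for model in models
        if model not in existing.get(page_id, set())
    ]


if __name__ == "__main__":
    already_scored = {1: {"articlequality", "draftquality"}, 2: {"draftquality"}}
    print(missing_scores([1, 2, 3], already_scored))
```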

We should run it shortly after we've enabled AfC on enwiki so that we have plenty of time to a) react if something goes wrong and b) validate the ORES integration using the ores=1 URL parameter.

@SBisson -- got it. So we'll make this part of our planning for rolling out ORES. I made this separate task so we remember to run the script in production: T203286

Please let us know via the #wikimedia-ai channel when you run this so that we can monitor the load it puts on ORES. I don't expect a hiccup, especially since I expect many of these scores to be pre-cached, but it's good to be present and observing a big job like this. :)

If you want to do it during regular PDT business hours just hop into the IRC channel, ping @Halfak, @awight, or @Ladsgroup (Amir1) and we'll keep an eye on grafana. No need to schedule in advance.