Spike: How long does it take for a newly created article or draft to be scored by Eranbot/Turnitin
Closed, Resolved · Public

Description

People who review from the front of the queue want to know how long it takes for a newly created article or draft to be scored by Eranbot/Turnitin. If it takes a long time (several minutes), they would like us to consider delaying insertion into the queue or adding a notice like "Awaiting copyvio assessment".

Event Timeline

kaldari created this task. Aug 27 2018, 5:42 PM
Restricted Application added a subscriber: Aklapper. Aug 27 2018, 5:42 PM
kaldari triaged this task as High priority. Aug 27 2018, 5:43 PM

@MMiller_WMF - Here's the spike. Feel free to pull it into the current sprint if needed.

MMiller_WMF moved this task from Inbox to Upcoming Work on the Growth-Team board. Aug 27 2018, 8:45 PM
MMiller_WMF edited projects, added Growth-Team (Current Sprint); removed Growth-Team.

Maybe @Niharika or others from comm tech would know how Eranbot runs for CopyPatrol (a cron job pulling from the RC API X times a day, always on and listening to the RC feed, or some other way?).

It listens to the IRC feed (constantly).

@eranroz -- we filed this task because we know that some users of the New Pages Feed review the newest articles in the feed. If it takes us several minutes to score new articles with copyvio, then those reviewers will be operating without the copyvio information. In that case, we would want to change the UI to indicate when a page has not been checked. But if the score is available within seconds, then such a UI change would not be needed. Do you happen to know the answer here?

I think we should have a UI indicating that a page hasn't been checked yet.

The time to process is somewhat unpredictable:

  • From the bot side: we currently have a queue of 10 diffs and send them for processing in small batches. The queue size depends on other diffs. We can reduce this limit.
  • The bot currently blocks uploads of other diffs until we get a response for the previous batch.
  • From the iThenticate side: it sometimes takes seconds, and under heavy load it may take minutes.
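
To make the effect of the points above concrete, here is a minimal sketch of how batching plus a blocking upload can turn a slow response into a longer wait for later diffs. It is not Eranbot's actual code: the function name, the batch size, and the service times are all hypothetical, and the model assumes only what is described above (diffs are sent in small batches, and the next batch is not uploaded until the previous response arrives).

```python
from collections import deque


def scoring_delays(arrivals, batch_size, service_time):
    """Return, per diff, the seconds between its arrival and its score.

    arrivals:     sorted arrival times in seconds (hypothetical input)
    batch_size:   diffs sent per batch (assumed value, not Eranbot's)
    service_time: seconds the scoring service takes per batch (assumed)
    """
    pending = deque(arrivals)
    delays = []
    free_at = 0.0  # time at which the previous batch's response arrives
    while pending:
        # The next upload is blocked until the previous response is back,
        # and cannot start before the first waiting diff has arrived.
        start = max(free_at, pending[0])
        batch = []
        while pending and len(batch) < batch_size and pending[0] <= start:
            batch.append(pending.popleft())
        done = start + service_time
        delays.extend(done - t for t in batch)
        free_at = done
    return delays


# Four diffs arrive together and are sent two at a time.
fast = scoring_delays([0, 0, 0, 0], batch_size=2, service_time=5)
slow = scoring_delays([0, 0, 0, 0], batch_size=2, service_time=60)
print(fast)
print(slow)
```

In this model the second batch always waits for the first, so when the service is slow the later diffs wait roughly twice as long as the earlier ones. That is consistent with scores usually arriving in seconds but occasionally taking minutes.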
MMiller_WMF closed this task as Resolved. Sep 4 2018, 9:01 PM
MMiller_WMF added a subscriber: Catrope.

Thanks @eranroz. That helps. I think we can now assume that CopyPatrol information will usually be available in the New Pages Feed within seconds, but will sometimes take minutes.

@Catrope and I discussed the implications here. Although it would be ideal to indicate in the feed when a page has not yet been scanned, doing so would require making our data architecture substantially more complex, beyond the time we have available to work on this project now. Therefore, we are not going to add that capability; instead, we'll find out from reviewers whether it seems necessary further down the line.