ERROR: Scoring failed for edit Foo: Timed out
Closed, ResolvedPublic

Description

About once in every 20 edits processed with Huggle, the logs show entries like these:

lör aug 29 16:29:52 2015 DEBUG[6]: Processing api request https://en.wikipedia.org/w/api.php?action=query&list=users&usprop=blockinfo%7Cgroups%7Ceditcount%7Cregistration&ususers=Wikikaylov&rawcontinue=1&format=xml
lör aug 29 16:29:52 2015 ERROR: Scoring failed for edit Connecticut Transit: Timed out
lör aug 29 16:29:52 2015 DEBUG[3]: Deleting old edit to page Connecticut Transit
lör aug 29 16:29:52 2015 DEBUG[6]: Finished request https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvlimit=1&rvprop=timestamp%7Cuser%7Ccomment%7Ccontent&titles=User%20talk%3AWikikaylov&rawcontinue=1&format=xml
lör aug 29 16:29:52 2015 DEBUG[6]: Finished request https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=AMEinfo.com&rvdir=newer&rvlimit=1&rvprop=ids%7Cuser%7Ctimestamp&rawcontinue=1&format=xml
lör aug 29 16:29:52 2015 DEBUG[6]: Finished request https://en.wikipedia.org/w/api.php?action=query&list=users&usprop=blockinfo%7Cgroups%7Ceditcount%7Cregistration&ususers=Wikikaylov&rawcontinue=1&format=xml
lör aug 29 16:29:52 2015 DEBUG[6]: Finished request https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=ids%7Cuser%7Ctimestamp%7Ccomment&rvlimit=1&rvstartid=678457087&rvdiffto=prev&titles=AMEinfo.com&rawcontinue=1&format=xml
lör aug 29 16:29:54 2015 ERROR: Scoring failed for edit Great Britain: Timed out
lör aug 29 16:29:54 2015 DEBUG[2]: Processing webserver request http://ores.wmflabs.org/scores/enwiki/reverted/678457092/
lör aug 29 16:29:54 2015 DEBUG[6]: Processing api request https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvlimit=1&rvprop=timestamp%7Cuser%7Ccomment%7Ccontent&titles=User%20talk%3A208.87.234.201&rawcontinue=1&format=xml
lör aug 29 16:29:54 2015 DEBUG[6]: Processing api request https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=ids%7Cuser%7Ctimestamp%7Ccomment&rvlimit=1&rvstartid=678457092&rvdiffto=prev&titles=Template%3ARetrieved%2Fdoc&rawcontinue=1&format=xml
lör aug 29 16:29:54 2015 DEBUG[6]: Processing api request https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Template%3ARetrieved%2Fdoc&rvdir=newer&rvlimit=1&rvprop=ids%7Cuser%7Ctimestamp&rawcontinue=1&format=xml
lör aug 29 16:29:55 2015 DEBUG[6]: Finished request https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=ids%7Cuser%7Ctimestamp%7Ccomment&rvlimit=1&rvstartid=678457092&rvdiffto=prev&titles=Template%3ARetrieved%2Fdoc&rawcontinue=1&format=xml
lör aug 29 16:29:55 2015 DEBUG[6]: Finished request https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Template%3ARetrieved%2Fdoc&rvdir=newer&rvlimit=1&rvprop=ids%7Cuser%7Ctimestamp&rawcontinue=1&format=xml
lör aug 29 16:29:55 2015 DEBUG[6]: Finished request https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvlimit=1&rvprop=timestamp%7Cuser%7Ccomment%7Ccontent&titles=User%20talk%3A208.87.234.201&rawcontinue=1&format=xml

The log contains a lot of "ERROR: Scoring failed for edit Foo: Timed out" entries.

Event Timeline

Josve05a raised the priority of this task from to Needs Triage.
Josve05a updated the task description.
Josve05a subscribed.

Yes, this is caused by the short timeout for revscoring queries, combined with what seem to be performance issues on the webserver (I guess, but I am not sure). Revscoring queries block edits from getting into the queue, so the timeout is set to only about 5 seconds. You can increase it in your configuration file, but if the webserver were down you would get a severe delay before edits reach the queue.

Petrb added a subscriber: Halfak.

@Halfak can you check on server if there are some error logs related to this?

You can also resolve this issue by disabling the revision scoring extension entirely in preferences: right-click it, choose disable, and restart Huggle. That will disable all ORES scores in Huggle, though, and those give you hints about what is vandalism and what is not.

Some edits take longer than 5 seconds to score, but they are relatively rare. If you get a timeout from ORES, I'd recommend re-requesting the score. The ORES internal timeout is 30 seconds.
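
The re-request advice above could look roughly like the following. This is only a sketch in Python, not Huggle's actual code (Huggle itself is not written in Python): `score_with_retry`, its parameters, and the `fetch` callable are hypothetical names, and `fetch` is assumed to raise `TimeoutError` when a score request times out.

```python
import time
from typing import Callable, Optional


def score_with_retry(
    fetch: Callable[[int], dict],
    rev_id: int,
    attempts: int = 3,
    backoff: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch(rev_id) and re-request the score when it times out.

    `fetch` is a hypothetical callable that performs one score request and
    raises TimeoutError on a timeout. Returns None if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return fetch(rev_id)
        except TimeoutError:
            # Wait a little longer before each new attempt, but do not
            # sleep after the final failure.
            if attempt + 1 < attempts:
                sleep(backoff * (attempt + 1))
    return None
```

Whether a retry is worth the extra delay depends on how long the client can afford to hold an edit back from the queue.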

To be precise, the default timeout is 7 seconds, so it is longer than 5.

I am just wondering, why does ORES do this scoring every time? When 10 users request the score of the same edit, it performs 10 CPU-expensive computations just to do the same thing 10 times.

Why don't you store the results in a cache?

No, it doesn't duplicate computation, and yes, we use a cache.

Petrb claimed this task.

The solution is to increase the timeout in the Huggle configuration file, which is at HUGGLE_HOME/Configuration/huggle3.xml

HUGGLE_HOME can be found in the logs; it's one of the first messages you get there. On Mac I see:

Home: /Users/petanb/Library/Application Support/Wikimedia/Huggle

You can do that by adding a line to the configuration file. Somewhere near

<extern extension="Scoring Helper" name="server" value="http://ores.wmflabs.org/scores/"/>

append a new line with

<extern extension="Scoring Helper" name="timeout" value="30"/>

That changes the timeout to 30 seconds, which is the ORES maximum. It will probably result in much slower edit processing, though, so I am not sure this is desirable for every user.
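
If you would rather not edit the file by hand, the change can be scripted. The following is a minimal Python sketch under stated assumptions: it treats the configuration as plain text, assumes the "server" line appears on a single line exactly as shown in this task, and the helper name `add_timeout` is made up for illustration. It says nothing about the rest of the layout of huggle3.xml.

```python
SERVER_MARK = 'extension="Scoring Helper" name="server"'
TIMEOUT_MARK = 'extension="Scoring Helper" name="timeout"'


def add_timeout(config_text: str, seconds: int = 30) -> str:
    """Insert a Scoring Helper timeout line right after the server line.

    Assumption: the server entry sits on one line, as quoted in this task.
    If a timeout entry already exists, the text is returned unchanged.
    If no server line is found, nothing is inserted.
    """
    if TIMEOUT_MARK in config_text:
        return config_text
    out = []
    for line in config_text.splitlines():
        out.append(line)
        if SERVER_MARK in line:
            # Reuse the indentation of the server line for the new entry.
            indent = line[: len(line) - len(line.lstrip())]
            out.append(
                f'{indent}<extern extension="Scoring Helper" '
                f'name="timeout" value="{seconds}"/>'
            )
    result = "\n".join(out)
    return result + "\n" if config_text.endswith("\n") else result
```

Read the file, pass its contents through `add_timeout`, and write the result back while Huggle is not running; keep a backup copy of the original file first.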

You should expect to get most scores very quickly. You might get better performance if you send parallel requests (which is OK for < 50ish parallel requests).

What's your current data gathering pattern? One at a time & sequential? Batch a large N?

By T108305#1579938, it seems to request a single score per request.

It processes a single edit at a time; if it didn't, the performance of Huggle would be significantly lower. It's also used on projects with low EPM (edits per minute), not only on English Wikipedia, so waiting for more edits wouldn't make much sense there. People don't like waiting for batches of edits; they like a constant flow of edits.

Of course. I wouldn't suggest sending a batch unless you have a batch to score. If you are scoring edits as they come in, I think that makes sense. I'd recommend requesting the scores in parallel, though. E.g. if two edits come in faster than the first score can be returned, don't block. Async, or a proper thread, might be in order.
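
To illustrate the non-blocking idea, here is a rough Python sketch using a thread pool. It is not how Huggle is implemented; `score_in_parallel`, `score_url`, and the `fetch` callable are hypothetical names. The URL layout is copied from the webserver request in the log above, and the worker count is an arbitrary choice kept well below the roughly 50 parallel requests mentioned earlier.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

# Server value taken from the configuration line quoted in this task.
ORES_BASE = "http://ores.wmflabs.org/scores/"


def score_url(wiki: str, model: str, rev_id: int) -> str:
    """Build a score URL in the form seen in the log, e.g. enwiki/reverted/<id>/."""
    return f"{ORES_BASE}{wiki}/{model}/{rev_id}/"


def score_in_parallel(
    fetch: Callable[[int], dict],
    rev_ids: Iterable[int],
    max_workers: int = 8,
) -> Dict[int, object]:
    """Request scores for several revisions without waiting on each in turn.

    `fetch` is a hypothetical callable that performs one score request.
    A failed request is stored as the exception object, so one slow or
    failing score does not hold up the others.
    """
    results: Dict[int, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {rev_id: pool.submit(fetch, rev_id) for rev_id in rev_ids}
        for rev_id, future in futures.items():
            try:
                results[rev_id] = future.result()
            except Exception as exc:
                results[rev_id] = exc
    return results
```

In a live client the edits arrive one by one, so each new edit would be submitted to a long-lived pool as it comes in, and the edit would enter the queue when its future completes or times out.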