|Resolved||CommunityTechBot||T193559 Copyvio detection tool cannot use Google search engine|
|Resolved||kaldari||T194541 Investigation: Why is there a Google Proxy API usage spike every 5 days?|
Maybe you're thinking of CopyPatrol? :) When in "search mode", Copyvios does use Google: https://tools.wmflabs.org/copyvios/?lang=en&project=wikipedia&title=Hanksy&oldid=&action=search&use_engine=1&use_links=1&turnitin=0
This tool is maintained by Earwig. I'm not sure how much we can do?
So about once a week, apparently. News to me, but sounds about right...
This is unfortunate, but there’s nothing we can do about it, afaik. 10k queries is at most 1250 articles checked a day, less than one a minute on average. It doesn’t allow for a very high tool usage rate.
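For reference, the arithmetic behind those figures can be checked with a short sketch. The 8 queries per article is only what the two numbers above imply (10,000 / 1,250); it is an assumption here, not a documented setting of the tool.

```python
# Figures quoted in the comment above.
DAILY_QUOTA = 10_000            # Google API queries allowed per day
MAX_ARTICLES_PER_DAY = 1_250    # upper bound on articles checked per day

MINUTES_PER_DAY = 24 * 60

# Implied cost of one article check (assumption derived from the two figures).
queries_per_article = DAILY_QUOTA / MAX_ARTICLES_PER_DAY

# Average sustainable rate if checks were spread evenly over the day.
articles_per_minute = MAX_ARTICLES_PER_DAY / MINUTES_PER_DAY

print(f"{queries_per_article:.0f} queries per article")
print(f"{articles_per_minute:.2f} articles per minute")
```

So the quota works out to a bit under one article check per minute, which matches the "less than one a minute" estimate.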
Wish Google gave a better error message. Suppose we can add our own.
I'm going to lower this to High priority because we're definitely not going to have this addressed by tomorrow (Pacific Time), and by then you'll be able to use the search feature again. I am told we are going to look into increasing the quota.
Unfortunately, we are already at the maximum allowed quota for Google API queries, so there's no easy solution. I've mentioned this to Dan Foy to see if we can get Google to help us work around it somehow.
If this helps, it seems like there was a spike between approximately 11:30 pm and 1:30 am, during which ~1.5 requests were made per minute.
Maybe it would help to do some sort of throttling in Earwig's tool, for example not allowing more than one request per minute from an IP, or something to that effect.
I don't have access to request IPs on Toolforge. Other methods of tracking are creepy/error-prone (or maybe even disallowed?), and I don't want logging in to be required, so it's difficult.
That said, we can certainly try more intelligent throttling if we bake the 10,000 request limit into the tool: for example, we can have a stronger throttle the more requests happen a day, etc, to make sure the quota is spread out. However, it probably won't be very fair, and it's still going to result in unresponsiveness at the user's end.
I have no expectations of Google being particularly generous here, but if they decide to raise it, that would be an excellent solution.
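A rough sketch of what "a stronger throttle the more requests happen a day" could look like, for discussion only. Everything here is hypothetical: `throttle_delay`, its parameters, and the 60-second cap are made up for illustration and are not taken from the tool's code. It assumes the tool can count how many queries it has made since the quota last reset.

```python
from typing import Optional

DAILY_QUOTA = 10_000  # the request limit mentioned above


def throttle_delay(used_today: int,
                   seconds_left_in_day: float,
                   max_delay: float = 60.0) -> Optional[float]:
    """Return how long to wait before the next query, in seconds.

    Returns None when the quota is already exhausted.
    """
    remaining = DAILY_QUOTA - used_today
    if remaining <= 0:
        return None

    # Spacing that would make the remaining quota last until the reset.
    even_spacing = seconds_left_in_day / remaining

    # Scale by the fraction of quota already used: no delay early in the
    # day, approaching the full even spacing as the quota runs down.
    used_fraction = used_today / DAILY_QUOTA

    # Cap the wait so a single user is never blocked for too long
    # (hypothetical cap; it trades fairness for responsiveness).
    return min(max_delay, even_spacing * used_fraction)
```

With these assumptions, a fresh day gives no delay, a half-used quota at midday gives a wait of a few seconds, and a nearly exhausted quota hits the cap. As noted above, this spreads the quota out but does not make it fair between users, and the cap means the quota can still run out.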
@Earwig_alt Why not require logins? As it stands right now, a bot could exhaust that limit in a few hours. That would be a significant loss, given how extensively this tool is used by our community members. CopyPatrol requires login for similar reasons, and we've not had any complaints about that.