
Earwig's Copyvio Detector down due to "The supplied API key is not configured for use from this IP address." from google-api-proxy
Open, Needs Triage · Public

Description

Earwig's Copyvio Detector has not been working for the last few days. When an article name is pasted into the "Page title" field, the tool returns: "An error occurred while using the search engine (Google Error: HTTP Error 403: Forbidden). Note: there is a daily limit on the number of search queries the tool is allowed to make. You may repeat the check without using the search engine." See here [https://tools.wmflabs.org/copyvios/?lang=en&project=wikipedia&title=Draft%3ASvetlana+Whitener&oldid=&action=search&use_engine=1&use_links=1&turnitin=0]. I'm not sure whether the problem is on the Wikimedia side or the search engine's side. Kindly assist, and thank you.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jan 27 2020, 4:29 AM
Legoktm added a subscriber: Earwig.

Prior discussion is in this thread on my talk page.

The tool is getting the following error when it tries to contact Google:

{
  "error": {
    "code": 403,
    "message": "The supplied API key is not configured for use from this IP address.",
    "errors": [
      {
        "message": "The supplied API key is not configured for use from this IP address.",
        "domain": "global",
        "reason": "forbidden"
      }
    ],
    "status": "PERMISSION_DENIED"
  }
}

The tool uses the google-api-proxy project. According to that project's documentation, the proxy is supposed to have a whitelisted static IP of 185.15.56.4; however:

$ dig +noall +answer google-api-proxy.wmflabs.org
google-api-proxy.wmflabs.org. 21571 IN	A	185.15.56.49

...so it seems the IP has changed (do we know why/how?) and either the original static IP needs to be restored or the whitelist needs to be updated in the Google developer console.

Aklapper renamed this task from Earwig's Copyvio Detector down to Earwig's Copyvio Detector down due to "The supplied API key is not configured for use from this IP address." from google-api-proxy. · Jan 27 2020, 8:37 PM
kaldari added subscribers: Bstorm, bd808. · Edited Jan 28 2020, 12:08 PM

The error is definitely coming from the IP address restriction. However, I tried adding both 185.15.56.49 and the entire 185.15.56.0/24 subnet to the whitelist in the Google developer console, and neither fixed the problem. @bd808 @Bstorm - Any idea what might be happening here? Did anything change related to the Cloud VPS external IP addresses around January 23rd? For reference, here's the Google API proxy documentation.

I'm going to temporarily remove the IP address restriction. The IPs listed in the whitelist were:
208.80.155.245
208.80.155.189
88.147.99.47
58.96.117.10
185.15.56.0/24
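As a sanity check on the whitelist arithmetic, Python's standard `ipaddress` module can report which of the entries above cover a given egress address. This is a minimal sketch: the helper name `covering_entries` is hypothetical, and it only checks CIDR membership, not how Google actually evaluates the restriction.

```python
import ipaddress

# Whitelist entries as listed in the comment above.
WHITELIST = [
    "208.80.155.245",
    "208.80.155.189",
    "88.147.99.47",
    "58.96.117.10",
    "185.15.56.0/24",
]

def covering_entries(ip, entries=WHITELIST):
    """Return the whitelist entries (single IPs or CIDR ranges) that contain ip."""
    addr = ipaddress.ip_address(ip)
    # ip_network() treats a bare address as a single-host /32 network.
    return [e for e in entries if addr in ipaddress.ip_network(e)]

# Documented static IP, the address dig reports, and the Horizon floating IP.
for candidate in ("185.15.56.4", "185.15.56.49", "185.15.56.54"):
    print(candidate, covering_entries(candidate))
```

All three candidate addresses fall inside 185.15.56.0/24, which suggests the failure was not a simple mismatch between the whitelist and the proxy's address range.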

MusikAnimal added a subscriber: MusikAnimal. · Edited Jan 29 2020, 7:16 AM

We upgraded the API proxy to Debian Buster, which did involve an IP change (T236557), but that was a month ago. Is copyvios not using the proxy's DNS name, https://google-api-proxy.wmflabs.org/? Shouldn't the VPS floating IP be all we need to enter in the Google API Console?

While this bug is about the IP change (which now appears to be fixed), I'll note that we were, and still are, regularly hitting our quota. I can't see historical data, but I recall that the last time this happened it was partly due to some disruptive bots. It might be worth reviewing the access logs for non-human traffic.
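A first pass over the access logs could simply count requests per user agent, since automated clients often identify themselves there. This is a minimal sketch that assumes the logs use nginx's default "combined" format; the function name `top_user_agents` is hypothetical, and lines that do not match the pattern are skipped.

```python
import re
from collections import Counter

# nginx "combined" format: addr - user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def top_user_agents(lines, n=5):
    """Count requests per user agent and return the n most common (ua, count) pairs."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m:  # silently skip lines in any other format
            counts[m.group("ua")] += 1
    return counts.most_common(n)
```

A user agent that dominates the counts is a candidate for closer inspection, though user agents can be spoofed, so this is only a starting point.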

@MusikAnimal - This particular problem is definitely due to API access restriction, not quota issues. Do you know if anything changed about how the proxy communicates with the outside world? I tried adding 185.15.56.49 to the whitelist (the current public IP address of the proxy), but that didn't seem to work.

As a mostly outsider who's been watching this, let me toss out a couple of random thoughts.

  1. Is it possible that we're tunnelling through some sort of ipv4-embedded-in-ipv6 transport layer, and the access control lists aren't recognizing this correctly?
  2. Looking at https://wikitech.wikimedia.org/wiki/Nova_Resource:Google-api-proxy, I see we've also got https://googlevision-api-proxy.wmflabs.org/. Is that having the same problem? Might be worth comparing the configs.
bd808 added a comment. · Jan 29 2020, 7:31 PM

Testing against non-Google endpoints shows that google-api-proxy-03.google-api-proxy.eqiad.wmflabs uses its assigned floating IPv4 address, 185.15.56.54, when connecting to external servers.

Testing Google's customsearch/v1 API from google-api-proxy-03.google-api-proxy.eqiad.wmflabs itself works for me, using the API key and search engine ID from the copyvios tool's configuration file:

$ curl -sv 'https://google-api-proxy.wmflabs.org/customsearch/v1?key=<API KEY>&cx=004717810137847674280:_bty7t0mis4&q=NOFX'
[...snip...]
< HTTP/2 200
< server: nginx/1.13.6
< date: Wed, 29 Jan 2020 19:25:53 GMT
< content-type: application/json; charset=UTF-8
[...snip...]
{
  "kind": "customsearch#search",
  "url": {
    "type": "application/json",
    "template": "https://www.googleapis.com/customsearch/v1?q={searchTerms}&num={count?}&start={startIndex?}&lr={language?}&safe={safe?}&cx={cx?}&sort={sort?}&filter={filter?}&gl={gl?}&cr={cr?}&googlehost={googleHost?}&c2coff={disableCnTwTranslation?}&hq={hq?}&hl={hl?}&siteSearch={siteSearch?}&siteSearchFilter={siteSearchFilter?}&exactTerms={exactTerms?}&excludeTerms={excludeTerms?}&linkSite={linkSite?}&orTerms={orTerms?}&relatedSite={relatedSite?}&dateRestrict={dateRestrict?}&lowRange={lowRange?}&highRange={highRange?}&searchType={searchType}&fileType={fileType?}&rights={rights?}&imgSize={imgSize?}&imgType={imgType?}&imgColorType={imgColorType?}&imgDominantColor={imgDominantColor?}&alt=json"
  },
  "queries": {
    "request": [

The same curl test is working for me from instances within Toolforge as well.
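For reference, the same request can be assembled with Python's standard library. This is a minimal sketch: `build_search_url` and `search` are hypothetical helpers, the API key is a placeholder, and only the `key`, `cx`, and `q` parameters from the curl test are included.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Proxy endpoint used in the curl test above.
PROXY_BASE = "https://google-api-proxy.wmflabs.org/customsearch/v1"

def build_search_url(api_key, cx, query, base=PROXY_BASE):
    """Build a Custom Search request URL routed through google-api-proxy."""
    # urlencode() percent-encodes values, e.g. the ':' in the cx id becomes %3A.
    return base + "?" + urlencode({"key": api_key, "cx": cx, "q": query})

def search(api_key, cx, query, timeout=10):
    """Send the request and decode the JSON body; urlopen raises HTTPError on a 403."""
    with urlopen(build_search_url(api_key, cx, query), timeout=timeout) as resp:
        return json.load(resp)
```

When `search` raises `urllib.error.HTTPError`, the error body (readable via the exception's `read()` method) contains the JSON error shown at the top of this task, which is what distinguishes an IP restriction failure from other 403s.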

bd808 added a comment. · Jan 29 2020, 7:35 PM
> Is it possible that we're tunnelling through some sort of ipv4-embedded-in-ipv6 transport layer, and the access control lists aren't recognizing this correctly?

Cloud VPS currently provides neither public IPv6 addresses nor any 6-in-4 tunnel mechanism for external IPv6 connectivity, so this is unlikely to be the issue. It is, however, a reasonable guess at something that could go wrong with IP-restricted services in general.

The Cloud Vision API essentially has the same IPs whitelisted as T243736#5836841, and it's working fine, so I'm not sure why the same whitelist isn't working for the Custom Search API. Perhaps we should try creating a new API key? We'll need Earwig to be ready to update copyvios as soon as we do.

I'll reiterate that before the IP change that is the subject of this task, we were often hitting our quota. I mention this because users were reporting this problem before the IP changed on January 23. To the end user, both issues (quota/IP change) appeared the same since copyvios shows the same error message.

> To the end user, both issues (quota/IP change) appeared the same since copyvios shows the same error message.

Yeah, this is my fault for not properly parsing the error from Google. When I have some free time I'll fix this and add proper logging so we can track actual quota usage.
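One way to separate the two failure modes is to read the `reason` field of Google's JSON error body rather than reporting a bare HTTP 403. This is a minimal sketch: `classify_google_error` is a hypothetical name, and the set of quota-related reason codes is an assumption based on codes Google APIs commonly return, so it should be checked against real Custom Search responses. The `forbidden` reason is the one seen in this task.

```python
import json

# Assumed quota-related reason codes; verify against actual API responses.
QUOTA_REASONS = {"dailyLimitExceeded", "rateLimitExceeded",
                 "userRateLimitExceeded", "quotaExceeded"}

def classify_google_error(body):
    """Map an error response body to 'quota', 'forbidden', or 'other'."""
    try:
        error = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        # Not a JSON object, e.g. an HTML error page from an intermediate proxy.
        return "other"
    reasons = {e.get("reason") for e in error.get("errors", [])}
    if reasons & QUOTA_REASONS:
        return "quota"
    if "forbidden" in reasons:
        return "forbidden"
    return "other"
```

With this split, the tool could show a quota message only for `quota`, and log `forbidden` (IP or key restriction) and `other` (for example, an nginx HTML 403 from the proxy itself) separately.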

> Perhaps we should try creating a new API key? We'll need Earwig to be ready to update copyvios as soon as we do.

If we can only have one key at a time and want to minimize downtime, between 02:00 and 05:00 UTC would be best for me (or we could wait until the weekend).

The API key has been refreshed and copyvios has been updated accordingly.

I re-attempted the IP restriction and it still didn't work. I'm not sure what to do... Am I correct that all traffic on Google's end should be coming from the VPS floating IP? Just as before, I tried adding 185.15.56.49 (as dig reports, T243736#5833984), 185.15.56.54 (the floating IP we have configured in Horizon), and the whole 185.15.56.0/22 subnet. All requests fail with "The supplied API key is not configured for use from this IP address". If we are at a loss, my suggestion would be to get in contact with Google. Perhaps the issue is on their end. The IP restrictions stopped working suddenly on January 23, and only for custom search. The other API consumers do have IP restrictions and are working fine.

The tool won't load, and it's been down for several hours. Can someone attempt to restart it, please? Thanks.

Working again - thanks.

I just used it successfully as part of the Dr Blofeld CCI - thanks

We have been getting a 504 Gateway Time-out for the last hour or so. Is there anyone around who can try to restart this tool? Thank you.

Guessing that someone did something because I've used it in the last couple hours

Scratch that, I apparently misread the timestamps. I did get a 504 earlier today, then it worked later, but it doesn't seem to be working now.

It appears to be working again now. Thanks, I will post again if there's more issues.

I have added Copyvios to Community Tech's uptime monitor, so maintainers will get emailed if it goes down. This does not cover errors with the Google API, which is the subject of this task. @Earwig If you don't want the uptime emails let me know :) If you are okay with them, make sure alert@uptimerobot.com is on your contact list. Gmail in particular seems to mistake it for spam.

I'm still trying to debug why the IP restrictions on the API consumer aren't working.

@MusikAnimal: it seems we're experiencing 403s now from the google-api-proxy that I assume are coming from our end rather than Google's:

<html>\r\n<head><title>403 Forbidden</title></head>\r\n<body bgcolor="white">\r\n<center><h1>403 Forbidden</h1></center>\r\n<hr><center>nginx/1.14.2</center>\r\n</body>\r\n</html>\r\n

Any ideas? We can see if it resolves itself (I'm not sure how long it's been going on for).

In T243736#6059614, Earwig wrote:

> MusikAnimal: it seems we're experiencing 403s now from the google-api-proxy that I assume are coming from our end rather than Google's:
>
> <html>\r\n<head><title>403 Forbidden</title></head>\r\n<body bgcolor="white">\r\n<center><h1>403 Forbidden</h1></center>\r\n<hr><center>nginx/1.14.2</center>\r\n</body>\r\n</html>\r\n
>
> Any ideas? We can see if it resolves itself (I'm not sure how long it's been going on for).

Ticket at T250312: HTTP 403 Error when using google-api-proxy on VPS, I'm debugging now.

T250312 is resolved. Sorry, we didn't realize the API proxy relied on X-Forwarded-For (XFF) headers, which were removed earlier today.
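To illustrate how removing X-Forwarded-For can turn into an nginx 403: if a proxy's allow rule checks a client address taken from XFF, requests without the header fall back to the immediate peer address, which may not be on the allowlist. This is a hypothetical sketch, not the actual google-api-proxy configuration; the allowlist `172.16.0.0/21` and the peer address `192.0.2.10` are made-up values for illustration.

```python
import ipaddress

# Hypothetical allowlist of client networks permitted to use the proxy.
ALLOWED = [ipaddress.ip_network("172.16.0.0/21")]

def client_ip(headers, peer_ip):
    """Leftmost X-Forwarded-For entry if present, else the TCP peer address.

    headers is a plain dict here, so the lookup is case-sensitive.
    """
    xff = headers.get("X-Forwarded-For", "")
    if xff:
        return ipaddress.ip_address(xff.split(",")[0].strip())
    return ipaddress.ip_address(peer_ip)

def status_for(headers, peer_ip):
    """Return 200 if the derived client address is allowed, else 403."""
    ip = client_ip(headers, peer_ip)
    return 200 if any(ip in net for net in ALLOWED) else 403
```

Here a request carrying `X-Forwarded-For: 172.16.1.5` from peer `192.0.2.10` is allowed, while the same request without the header is rejected. Trusting the leftmost XFF entry is only safe when every hop that can set the header is itself trusted.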