Page MenuHomePhabricator

ORES service erroring, in a way that throws exceptions in Extension:ORES
Closed, ResolvedPublic

Description

https://ores.wikimedia.org/v3/scores/wikidatawiki/?models=damaging%7Cgoodfaith&revids=584

{
  "error": {
      "code": "internal server error",
          "message": "Traceback (most recent call last):
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py\", line 376, in _make_request
    httplib_response = conn.getresponse(buffering=True)
TypeError: getresponse() got an unexpected keyword argument 'buffering'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py\", line 378, in _make_request
    httplib_response = conn.getresponse()
  File \"/usr/lib/python3.4/http/client.py\", line 1172, in getresponse
    response.begin()
  File \"/usr/lib/python3.4/http/client.py\", line 351, in begin
    version, status, reason = self._read_status()
  File \"/usr/lib/python3.4/http/client.py\", line 313, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), \"iso-8859-1\")
  File \"/usr/lib/python3.4/socket.py\", line 371, in readinto
    return self._sock.recv_into(b)
  File \"/usr/lib/python3.4/ssl.py\", line 745, in recv_into
    return self.read(nbytes, buffer)
  File \"/usr/lib/python3.4/ssl.py\", line 617, in read
    v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/requests/adapters.py\", line 376, in send
    timeout=timeout
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py\", line 609, in urlopen
    _stacktrace=sys.exc_info()[2])
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/requests/packages/urllib3/util/retry.py\", line 247, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/requests/packages/urllib3/packages/six.py\", line 310, in reraise
    raise value
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py\", line 559, in urlopen
    body=body, headers=headers)
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py\", line 380, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py\", line 308, in _raise_timeout
    raise ReadTimeoutError(self, url, \"Read timed out. (read timeout=%s)\" % timeout_value)
requests.packages.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='wikidata.org', port=443): Read timed out. (read timeout=5.0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/mwapi/session.py\", line 101, in _request
    auth=auth)
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/requests/sessions.py\", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/requests/sessions.py\", line 576, in send
    r = adapter.send(request, **kwargs)
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/requests/adapters.py\", line 449, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='wikidata.org', port=443): Read timed out. (read timeout=5.0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File \"./ores/wsgi/routes/v3/scores.py\", line 24, in process_score_request
    score_response = scoring_system.score(score_request)
  File \"./ores/scoring_systems/scoring_system.py\", line 42, in score
    response = self._score(request)
  File \"./ores/scoring_systems/celery_queue.py\", line 192, in _score
    return super()._score(*args, **kwargs)
  File \"./ores/scoring_systems/scoring_system.py\", line 84, in _score
    request, missing_model_set_revs)
  File \"./ores/scoring_systems/scoring_system.py\", line 131, in _extract_root_caches
    model_set, rev_ids, injection_caches=request.injection_caches)
  File \"./ores/scoring_context.py\", line 175, in extract_root_dependency_caches
    for rev_id, (error, _) in zip(rev_ids, error_root_vals):
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/extractors/api/extractor.py\", line 123, in _extract_many
    rev_docs = self.get_rev_doc_map(revids_to_lookup, rvprop=rvprop)
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/extractors/api/extractor.py\", line 230, in get_rev_doc_map
    return {rd['revid']: rd for rd in rev_docs}
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/extractors/api/extractor.py\", line 230, in <dictcomp>
    return {rd['revid']: rd for rd in rev_docs}
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/extractors/api/extractor.py\", line 240, in query_revisions_by_revids
    revids=batch_ids, **params)
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/mwapi/session.py\", line 309, in get
    continuation=continuation)
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/mwapi/session.py\", line 171, in request
    files=files)
  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/mwapi/session.py\", line 103, in _request
    raise TimeoutError(str(e)) from e
mwapi.errors.TimeoutError: HTTPSConnectionPool(host='wikidata.org', port=443): Read timed out. (read timeout=5.0)
"
            }
            }

Details

Related Gerrit Patches:
operations/mediawiki-config : masterUBN! disbale ores for wikidata
mediawiki/services/ores/deploy : STABLE_REVSCORING_1Blindly choose a timeout of 15 seconds
mediawiki/services/ores/deploy : masterBlindly choose a timeout of 15 seconds

Event Timeline

awight created this task.Oct 26 2017, 7:01 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptOct 26 2017, 7:01 PM
awight triaged this task as Unbreak Now! priority.Oct 26 2017, 7:01 PM
Restricted Application added subscribers: Liuxinyu970226, Jay8g, TerraCodes. · View Herald TranscriptOct 26 2017, 7:01 PM

Seems that the root cause is:

requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='wikidata.org', port=443): Read timed out. (read timeout=5.0)

Ideally we would be more robust to this.

greg added a subscriber: greg.Oct 26 2017, 7:20 PM

Change 386688 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/services/ores/deploy@master] Blindly choose a timeout of 15 seconds

https://gerrit.wikimedia.org/r/386688

Change 386688 merged by Ladsgroup:
[mediawiki/services/ores/deploy@master] Blindly choose a timeout of 15 seconds

https://gerrit.wikimedia.org/r/386688

Change 386691 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/services/ores/deploy@STABLE_REVSCORING_1] Blindly choose a timeout of 15 seconds

https://gerrit.wikimedia.org/r/386691

Change 386691 merged by Awight:
[mediawiki/services/ores/deploy@STABLE_REVSCORING_1] Blindly choose a timeout of 15 seconds

https://gerrit.wikimedia.org/r/386691

Mentioned in SAL (#wikimedia-operations) [2017-10-26T19:33:09Z] <awight@tin> Started deploy [ores/deploy@0adae70]: Increase extractor wikidata API timeout to 15s, T179107

Mentioned in SAL (#wikimedia-operations) [2017-10-26T19:33:20Z] <awight@tin> Finished deploy [ores/deploy@0adae70]: Increase extractor wikidata API timeout to 15s, T179107 (duration: 00m 10s)

Mentioned in SAL (#wikimedia-operations) [2017-10-26T19:34:00Z] <awight@tin> Started deploy [ores/deploy@0adae70]: Increase extractor wikidata API timeout to 15s, T179107

Mentioned in SAL (#wikimedia-operations) [2017-10-26T19:41:25Z] <awight@tin> Finished deploy [ores/deploy@0adae70]: Increase extractor wikidata API timeout to 15s, T179107 (duration: 07m 25s)

Change 386698 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/mediawiki-config@master] UBN! disbale ores for wikidata

https://gerrit.wikimedia.org/r/386698

Change 386698 merged by jenkins-bot:
[operations/mediawiki-config@master] UBN! disbale ores for wikidata

https://gerrit.wikimedia.org/r/386698

Mentioned in SAL (#wikimedia-operations) [2017-10-26T20:00:54Z] <ladsgroup@tin> Synchronized wmf-config/InitialiseSettings.php: UBN! disbale ores for wikidata (T179107) (duration: 00m 50s)

After a quick discussion on IRC, the consensus is that Extension:ORES should continue to fatal when it receives fatals from the ORES service. We've disabled Extension:ORES for wikidata until the slow API responses can be diagnosed.

Mentioned in SAL (#wikimedia-operations) [2017-10-26T20:02:56Z] <ladsgroup@tin> Synchronized wmf-config/InitialiseSettings.php: UBN! disbale ores for wikidata (T179107) (duration: 00m 50s)

One thing we can do in the future is to throw a more specific exception, which passes through the ORES service error. The service error needs to be improved as well, the root cause is buried in some misleading stuff about a "buffering" argument.

The URL still sends timeout error.

awight added a subscriber: BBlack.Oct 27 2017, 3:40 PM

I hear that Wikidata API performance has recovered, yet I can still get a timeout some of time, by using the uncached URL:
https://ores.wikimedia.org/v3/scores/wikidatawiki/?models=damaging%7Cgoodfaith&revids=584&features=

The number of timeouts varies, so I'm not entirely comfortable with the theory that the API has recovered.

Nonetheless, the working theory is that we'll reenable Extension:ORES for wikidata on Monday, before wmf.5 is redeployed to that wiki, so that we have a realistic system state. Please give @BBlack and IRC#wikimedia-operations some warning before pushing the config change.

Mentioned in SAL (#wikimedia-operations) [2017-10-30T14:07:43Z] <ladsgroup@tin> Synchronized wmf-config/InitialiseSettings.php: Revert "UBN! disbale ores for wikidata" (T179107) (duration: 00m 49s)

Mentioned in SAL (#wikimedia-operations) [2017-10-30T14:09:14Z] <ladsgroup@tin> Synchronized wmf-config/InitialiseSettings.php: Revert "UBN! disbale ores for wikidata" (T179107) (duration: 00m 50s)

greg added a comment.Nov 13 2017, 11:04 PM

status?

Status? What is next here? A week without updates doesn't suggest UBN! status...

Thanks for the ping--I think the ORES piece of this has calmed down, we were exacerbating T179156 and now that the cause has been alleviated, there's nothing for us to do. Closing.

awight closed this task as Resolved.Nov 13 2017, 11:07 PM