
Read timeout from enwiki when requesting non-existent revision
Open, Medium, Public

Description

Requesting the URL

https://ores.wikimedia.org/v3/scores/enwiki/?models=articlequality&revids=64196208800%7C123456&features

responds with an internal error saying that the server got a timeout from the enwiki API:

{
  "error": {
    "code": "internal server error",
    "message": "Traceback (most recent call last):\n  File \"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/urllib3/connectionpool.py\", line 384, in _make_request\n    six.raise_from(e, None)\n  File \"<string>\", line 2, in raise_from\n  File \"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/urllib3/connectionpool.py\", line 380, in _make_request\n    httplib_response = conn.getresponse()\n  File \"/usr/lib/python3.5/http/client.py\", line 1198, in getresponse\n    response.begin()\n  File \"/usr/lib/python3.5/http/client.py\", line 297, in begin\n    version, status, reason = self._read_status()\n  File \"/usr/lib/python3.5/http/client.py\", line 258, in _read_status\n    line = str(self.fp.readline(_MAXLINE + 1), \"iso-8859-1\")\n  File \"/usr/lib/python3.5/socket.py\", line 576, in readinto\n    return self._sock.recv_into(b)\n  File \"/usr/lib/python3.5/ssl.py\", line 937, in recv_into\n    return self.read(nbytes, buffer)\n  File \"/usr/lib/python3.5/ssl.py\", line 799, in read\n    return self._sslobj.read(len, buffer)\n  File \"/usr/lib/python3.5/ssl.py\", line 583, in read\n    v = self._sslobj.read(len, buffer)\nsocket.timeout: The read operation timed out\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/requests/adapters.py\", line 445, in send\n    timeout=timeout\n  File \"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/urllib3/connectionpool.py\", line 638, in urlopen\n    _stacktrace=sys.exc_info()[2])\n  File \"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/urllib3/util/retry.py\", line 367, in increment\n    raise six.reraise(type(error), error, _stacktrace)\n  File \"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/urllib3/packages/six.py\", line 686, in reraise\n    raise value\n  File 
\"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/urllib3/connectionpool.py\", line 600, in urlopen\n    chunked=chunked)\n  File \"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/urllib3/connectionpool.py\", line 386, in _make_request\n    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)\n  File \"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/urllib3/connectionpool.py\", line 306, in _raise_timeout\n    raise ReadTimeoutError(self, url, \"Read timed out. (read timeout=%s)\" % timeout_value)\nurllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='en.wikipedia.org', port=443): Read timed out. (read timeout=5.0)\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/mwapi/session.py\", line 101, in _request\n    auth=auth)\n  File \"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/requests/sessions.py\", line 512, in request\n    resp = self.send(prep, **send_kwargs)\n  File \"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/requests/sessions.py\", line 622, in send\n    r = adapter.send(request, **kwargs)\n  File \"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/requests/adapters.py\", line 526, in send\n    raise ReadTimeout(e, request=request)\nrequests.exceptions.ReadTimeout: HTTPSConnectionPool(host='en.wikipedia.org', port=443): Read timed out. 
(read timeout=5.0)\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n  File \"./ores/wsgi/routes/v3/util.py\", line 101, in process_score_request\n    score_response = scoring_system.score(score_request)\n  File \"./ores/scoring_systems/scoring_system.py\", line 56, in score\n    response = self._score(request)\n  File \"./ores/scoring_systems/celery_queue.py\", line 196, in _score\n    return super()._score(*args, **kwargs)\n  File \"./ores/scoring_systems/scoring_system.py\", line 101, in _score\n    request, missing_model_set_revs)\n  File \"./ores/scoring_systems/scoring_system.py\", line 148, in _extract_root_caches\n    model_set, rev_ids, injection_caches=request.injection_caches)\n  File \"./ores/scoring_context.py\", line 173, in extract_root_dependency_caches\n    for rev_id, (error, _) in zip(rev_ids, error_root_vals):\n  File \"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/revscoring/extractors/api/extractor.py\", line 123, in _extract_many\n    rev_docs = self.get_rev_doc_map(revids_to_lookup, rvprop=rvprop)\n  File \"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/revscoring/extractors/api/extractor.py\", line 230, in get_rev_doc_map\n    return {rd['revid']: rd for rd in rev_docs}\n  File \"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/revscoring/extractors/api/extractor.py\", line 230, in <dictcomp>\n    return {rd['revid']: rd for rd in rev_docs}\n  File \"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/revscoring/extractors/api/extractor.py\", line 241, in query_revisions_by_revids\n    **params)\n  File \"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/mwapi/session.py\", line 309, in get\n    continuation=continuation)\n  File \"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/mwapi/session.py\", line 171, in request\n    files=files)\n  File 
\"/srv/deployment/ores/deploy/venv/lib/python3.5/site-packages/mwapi/session.py\", line 103, in _request\n    raise TimeoutError(str(e)) from e\nmwapi.errors.TimeoutError: HTTPSConnectionPool(host='en.wikipedia.org', port=443): Read timed out. (read timeout=5.0)\n"
  }
}
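For triage, the request URL and the relevant part of the error payload can be handled programmatically. The following is a minimal sketch, not ORES client code: the helper names `build_scores_url` and `final_exception_line` are hypothetical, and it assumes the payload has the `error.message` shape shown above. It rebuilds the URL from the report and pulls out the last line of the nested traceback, which names the exception that actually surfaced.

```python
from urllib.parse import urlencode

ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def build_scores_url(wiki, model, rev_ids):
    """Build a v3 scores URL; rev IDs are joined with '|' (encoded as %7C)."""
    query = urlencode({
        "models": model,
        "revids": "|".join(str(rev_id) for rev_id in rev_ids),
    })
    # 'features' appears as a bare flag in the reported URL, so it is
    # appended without a value rather than passed through urlencode.
    return "{}/{}/?{}&features".format(ORES_BASE, wiki, query)


def final_exception_line(payload):
    """Return the last non-empty line of the traceback in an error payload."""
    message = payload["error"]["message"]
    lines = [line for line in message.splitlines() if line.strip()]
    return lines[-1] if lines else ""
```

Applied to the response above, `final_exception_line` would return the `mwapi.errors.TimeoutError` line, which is easier to scan than the full chained traceback.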

Random observation: should we be using a discovery address rather than en.wikipedia.org?

Event Timeline

This is pretty funny because for one revision it works as expected: https://ores.wikimedia.org/v3/scores/enwiki/?models=articlequality&revids=64196208800&features

> Random observation: should we be using a discovery address rather than en.wikipedia.org?

We don't have enwiki.discovery.wmnet. We have LVS nodes in front of the MW nodes, so we would need to hit the LVS addresses with enwiki specified in the Host header. That's a lot of work :(((
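To illustrate what "hit LVS with the wiki in the Host header" would mean at the HTTP level, here is a rough sketch using only the standard library. Everything specific in it is an assumption: `appservers.example.internal` is a placeholder rather than a real Wikimedia hostname, `build_internal_request` is a hypothetical helper, and the request is only prepared, never sent. It does not show how mwapi or revscoring would need to change.

```python
from urllib.request import Request

# Hypothetical internal load-balancer address; placeholder, not a real hostname.
LVS_API_URL = "https://appservers.example.internal/w/api.php"


def build_internal_request(wiki_host, params_query):
    """Prepare (but do not send) an API request routed via the LVS address."""
    url = "{}?{}".format(LVS_API_URL, params_query)
    # The connection goes to the load balancer, while the Host header
    # tells the MediaWiki app servers which wiki should answer.
    headers = {"Host": wiki_host, "User-Agent": "ores-sketch/0.1"}
    return Request(url, headers=headers)
```

With TLS, certificate verification and SNI would presumably also have to line up with the internal address, which may be part of why this is more than a one-line change.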

Ladsgroup raised the priority of this task from High to Needs Triage.
Ladsgroup moved this task from Unsorted to Maintenance/cleanup on the Machine-Learning-Team board.
Ladsgroup triaged this task as Medium priority. Dec 5 2018, 2:31 PM