
ORES 500s when model_info lookup fails due to a key error
Closed, Resolved (Public)

Description

https://ores.wikimedia.org/scores/frwiki/goodfaith/?model_info=test_stats

returns:

{
  "error": {
    "code": "internal server error",
    "message": "Traceback (most recent call last):\n  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scoring/model_info.py\", line 154, in try_key\n    return d[int(key)]\nValueError: invalid literal for int() with base 10: 'test_stats'\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"./ores/wsgi/routes/v1/scores.py\", line 29, in process_score_request\n    scoring_system, score_request, model)\n  File \"./ores/wsgi/routes/v1/util.py\", line 43, in format_some_model_info\n    model_name, request.model_info)\n  File \"./ores/scoring_context.py\", line 39, in format_model_info\n    return model_info.format(paths, formatting=\"json\")\n  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scoring/model_info.py\", line 106, in format\n    return self.format_json(path_tree, **kwargs)\n  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scoring/model_info.py\", line 127, in format_json\n    key_val = try_key(key, self)\n  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scoring/model_info.py\", line 156, in try_key\n    raise e\n  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scoring/model_info.py\", line 148, in try_key\n    return d[key]\n  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scoring/model_info.py\", line 24, in __getitem__\n    return self._data[key]\nKeyError: 'test_stats'\n"
  }
}

It should probably return a 400 error.
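
For context, the traceback suggests a lookup that first tries the key as given, then retries it as an integer, and finally re-raises the original KeyError. The following is a minimal sketch reconstructed from the traceback alone; it is not the actual revscoring source, and the sample `model_info` dict is made up for illustration.

```python
def try_key(key, d):
    """Look up `key` in `d`, falling back to an integer form of the key."""
    try:
        return d[key]
    except KeyError as e:
        try:
            # Path components arrive as strings, so "0" may mean index 0.
            return d[int(key)]
        except ValueError:
            # The key is not numeric either: surface the original KeyError.
            raise e


# Hypothetical data standing in for a model's info tree.
model_info = {"type": "GradientBoosting", 0: "first"}

try:
    try_key("test_stats", model_info)
except KeyError as err:
    outcome = "KeyError: {!r}".format(err.args[0])
```

Because nothing further up the call stack catches the KeyError, it propagates to the generic error handler and is reported as an internal server error.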

Event Timeline

Halfak triaged this task as High priority. Nov 3 2017, 8:49 PM

With these two changes, you'll instead get a 400 response. E.g.:

http://localhost:8080/v3/scores/testwiki/?models=revid&model_info=nope

{
  "error": {
    "code": "bad request",
    "message": "Model information could not be retrieved for 'nope'"
  }
}
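
One way to get this behavior is to catch the KeyError where the model information is formatted and turn it into a "bad request" error document. This is only a rough sketch and not the actual patch: `format_model_info` here is a hypothetical stand-in for the real formatting call, `model_info_response` is an invented helper, and the web framework is left out so the example stays self-contained.

```python
import json


def format_model_info(model_info, path):
    """Hypothetical stand-in for the real model_info formatting call."""
    return {path: model_info[path]}


def model_info_response(model_info, path):
    """Return a (status_code, json_body) pair for a model_info request."""
    try:
        return 200, json.dumps(format_model_info(model_info, path))
    except KeyError:
        # An unknown field is the client's mistake, so answer 400, not 500.
        error = {
            "error": {
                "code": "bad request",
                "message": "Model information could not be retrieved "
                           "for {!r}".format(path),
            }
        }
        return 400, json.dumps(error)


# Made-up model info; "nope" is not one of its fields.
status, body = model_info_response({"version": "0.3.0"}, "nope")
```

Here `status` is 400 and `body` carries the same error structure as the response shown above, so a mistyped `model_info` value no longer counts toward the 5xx rate.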

A big spike in 500s happened today due to the following:

https://ores.wikimedia.org/scores/frwiki/damaging/?model_info=test_stats&format=json

{
  "error": {
    "code": "internal server error",
    "message": "Traceback (most recent call last):\n  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scoring/model_info.py\", line 154, in try_key\n    return d[int(key)]\nValueError: invalid literal for int() with base 10: 'test_stats'\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"./ores/wsgi/routes/v1/scores.py\", line 29, in process_score_request\n    scoring_system, score_request, model)\n  File \"./ores/wsgi/routes/v1/util.py\", line 43, in format_some_model_info\n    model_name, request.model_info)\n  File \"./ores/scoring_context.py\", line 39, in format_model_info\n    return model_info.format(paths, formatting=\"json\")\n  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scoring/model_info.py\", line 106, in format\n    return self.format_json(path_tree, **kwargs)\n  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scoring/model_info.py\", line 127, in format_json\n    key_val = try_key(key, self)\n  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scoring/model_info.py\", line 156, in try_key\n    raise e\n  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scoring/model_info.py\", line 148, in try_key\n    return d[key]\n  File \"/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scoring/model_info.py\", line 24, in __getitem__\n    return self._data[key]\nKeyError: 'test_stats'\n"
  }
}

We're still waiting on getting this code deployed. I'm not clear on what the hold-up is, as I've been OOO for a while. @awight?

@Halfak No more hold-ups; rather, I think the urgency decreased once the frwiki config was fixed.

It would be great to have this code deployed soon, so we'll avoid false 5xx drills in the ops chan :)

Halfak claimed this task.