
ORES stability
Closed, Resolved, Public

Description

ORES workers are going down periodically.

  1. https://github.com/wiki-ai/ores/pull/78 -- Performance improvements (single-request special case).
  2. https://github.com/wiki-ai/ores/pull/85 -- Registers known errors, which makes the logs easier to read. Also fixes a missing timeout in a Celery async get() call.
  3. https://github.com/mediawiki-utilities/python-mwapi/pull/16 -- Adds a timeout parameter to API queries.
  4. @yuvipanda configured Redis to send TCP keepalives.
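The fixes above share one idea: a worker should never block indefinitely on something that may never answer. The Python sketch below illustrates that idea using only the standard library; it is not ORES code. `bounded_task_wait` uses a concurrent.futures future as a stand-in for Celery's `AsyncResult.get(timeout=...)`, `bounded_api_query` uses `urllib.request.urlopen(..., timeout=...)` as a stand-in for the timeout parameter added to python-mwapi, and `keepalive_socket` enables `SO_KEEPALIVE` on a TCP socket. The function names, the slow local test server and all timeout values are hypothetical. The exact Redis change is not recorded in this task; Redis's `tcp-keepalive` setting in redis.conf is one way to get the same effect on the server side.

```python
import concurrent.futures
import http.server
import socket
import threading
import time
import urllib.error
import urllib.request


def bounded_task_wait(seconds_of_work, timeout):
    """Wait for a background task, but never longer than `timeout` seconds.

    Hypothetical stand-in for Celery's AsyncResult.get(timeout=...): without
    the timeout, a lost or stuck task would block the caller indefinitely.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(time.sleep, seconds_of_work)
    try:
        future.result(timeout=timeout)
        return "done"
    except concurrent.futures.TimeoutError:
        return "timed out"
    finally:
        # Return immediately instead of waiting for the worker thread.
        pool.shutdown(wait=False)


class SlowHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical slow API endpoint: it answers only after 1 second."""

    def do_GET(self):
        time.sleep(1.0)
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"{}")
        except OSError:
            pass  # the client already gave up and closed the connection

    def log_message(self, *args):
        pass  # keep the demo output quiet


def bounded_api_query(url, timeout):
    """Query an HTTP API with a socket timeout; return None if it expires."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (socket.timeout, urllib.error.URLError):
        return None


def keepalive_socket():
    """Create a TCP socket with SO_KEEPALIVE enabled.

    Keepalive probes let the OS notice a peer that vanished silently
    (e.g. a connection dropped by a firewall) instead of waiting forever.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


if __name__ == "__main__":
    print(bounded_task_wait(seconds_of_work=0.5, timeout=0.1))  # timed out
    print(bounded_task_wait(seconds_of_work=0.0, timeout=1.0))  # done

    # Local server on a free port, serving requests in daemon threads.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    print(bounded_api_query(url, timeout=0.2))  # None
    server.shutdown()
    server.server_close()

    with keepalive_socket() as s:
        print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # True
```

Without the timeouts, the first two calls would block until the slow task or server finished, or forever if it never did; with them, the caller gets a fast, explicit failure that it can log and recover from.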

Event Timeline

Halfak set the priority of this task to Needs Triage.
Halfak updated the task description.
Halfak moved this task to Backlog on the Machine-Learning-Team (Active Tasks) board.
Halfak subscribed.
Halfak set Security to None.
Halfak added a subscriber: yuvipanda.