limit the impact of heavy/large graphite queries
Closed, Resolved · Public

Description

As discovered in a recent Graphite outage, heavy/large queries can occupy all uwsgi workers, resulting in 502s. We should look for ways to limit the impact of such queries, ideally with timeouts at the graphite-web level.
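To make the idea of a timeout at the graphite-web level concrete, here is a minimal sketch of a WSGI wrapper that aborts any request running longer than a fixed budget. Everything in it is an assumption for illustration: the names `with_timeout` and `RequestTimeout` are hypothetical, the 504 status is an arbitrary choice, and it only works on Unix with single-threaded workers, because signal timers fire in the main thread only. It is not what graphite-web or uwsgi actually does.

```python
import signal


class RequestTimeout(Exception):
    """Raised inside the worker when a request exceeds its time budget."""


def with_timeout(app, seconds):
    """Wrap a WSGI app so that each request is aborted after `seconds`.

    Hypothetical helper: assumes a Unix, single-threaded worker process.
    """

    def _on_alarm(signum, frame):
        raise RequestTimeout()

    def wrapped(environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Defer the real start_response until we know the request
            # finished in time (the legacy write() callable is not supported).
            captured["status"] = status
            captured["headers"] = headers

        previous = signal.signal(signal.SIGALRM, _on_alarm)
        signal.setitimer(signal.ITIMER_REAL, seconds)
        try:
            # Materialise the body so a slow render is covered by the timer.
            body = list(app(environ, capture))
            status, headers = captured["status"], captured["headers"]
        except RequestTimeout:
            status = "504 Gateway Timeout"
            headers = [("Content-Type", "text/plain")]
            body = [b"query exceeded time limit\n"]
        finally:
            # Always disarm the timer and restore the previous handler.
            signal.setitimer(signal.ITIMER_REAL, 0)
            signal.signal(signal.SIGALRM, previous)

        start_response(status, headers)
        return body

    return wrapped
```

A wrapper like this frees the worker as soon as the budget is exceeded, at the cost of buffering the whole response in memory. It also cannot interrupt code that blocks signals or is stuck inside a C extension, which is one reason to enforce limits in the application server as well.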

Event Timeline

fgiunchedi raised the priority of this task from to Normal.
fgiunchedi updated the task description.
fgiunchedi added a subscriber: fgiunchedi.
Restricted Application added a subscriber: Aklapper. · Oct 27 2015, 3:35 PM

See also T155872 (graphite1003 short of available RAM) for a case where heavy queries did not affect uwsgi but instead caused carbon-cache to use a lot of memory.

Updated the Graphite troubleshooting page on Wikitech with instructions on how to identify such queries.

Change 494620 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] add uwsgi worker timeouts + max RSS for graphite

https://gerrit.wikimedia.org/r/494620

Change 494620 merged by CDanis:
[operations/puppet@production] graphite: uwsgi workers: set timeouts + max RSS

https://gerrit.wikimedia.org/r/494620
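For readers unfamiliar with uwsgi, the kind of limits the patch title describes can be sketched roughly as follows. This is an assumption about the shape of such a configuration, not the contents of the merged change: `harakiri`, `reload-on-rss` and `evil-reload-on-rss` are standard uwsgi options, but the helper name `render_uwsgi_limits` is hypothetical and the values are placeholders. See the Gerrit change above for what was actually deployed.

```python
import configparser
import io


def render_uwsgi_limits(harakiri_s, reload_on_rss_mb, evil_reload_on_rss_mb):
    """Return a [uwsgi] ini fragment with per-worker time and memory caps."""
    cfg = configparser.ConfigParser()
    cfg["uwsgi"] = {
        # Kill a worker whose request runs longer than this many seconds.
        "harakiri": str(harakiri_s),
        # Recycle a worker once its RSS exceeds this many megabytes.
        "reload-on-rss": str(reload_on_rss_mb),
        # Have the master forcibly reload a worker above this RSS (MB).
        "evil-reload-on-rss": str(evil_reload_on_rss_mb),
    }
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


# Placeholder values, not taken from the merged patch.
print(render_uwsgi_limits(60, 1024, 2048))
```

The time limit bounds how long a single heavy query can hold a worker, while the RSS limits bound how much memory a worker can accumulate while rendering large results.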

CDanis closed this task as Resolved. · Mar 7 2019, 4:35 PM
CDanis claimed this task.
CDanis added a subscriber: CDanis.

Just saw the new timeout work -- query returned a 500 status after ~60 seconds. Boldly going to call this resolved; of course reopen if there's still more to be done.