As discovered in a recent Graphite outage, heavy/large queries can occupy all uWSGI workers, resulting in 502s. We should find a way to limit the impact of such queries, ideally with timeouts at the graphite-web level.
Description
Details
Project | Branch | Lines +/- | Subject |
---|---|---|---|
operations/puppet | production | +32 -13 | graphite: uwsgi workers: set timeouts + max RSS |
Related Objects
Event Timeline
See also T155872: graphite1003 short of available RAM, a case where heavy queries did not impact uwsgi but instead caused carbon-cache to use a lot of memory.
Change 494620 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] add uwsgi worker timeouts + max RSS for graphite
Change 494620 merged by CDanis:
[operations/puppet@production] graphite: uwsgi workers: set timeouts + max RSS
Just saw the new timeout work -- query returned a 500 status after ~60 seconds. Boldly going to call this resolved; of course reopen if there's still more to be done.
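For anyone reading later, here is a rough sketch of what per-worker limits of this kind can look like in a uWSGI ini file. It is not the content of change 494620. The option names (`harakiri`, `harakiri-verbose`, `reload-on-rss`, `evil-reload-on-rss`) are standard uWSGI options; the 60-second value is only inferred from the timeout observed above, and the RSS figures are purely illustrative assumptions.

```ini
[uwsgi]
; Kill any worker whose request runs longer than this many seconds.
; 60 is assumed from the ~60 s timeout seen above, not taken from the patch.
harakiri = 60
; Log extra detail when a worker is killed for exceeding the timeout.
harakiri-verbose = true
; Gracefully recycle a worker once its RSS exceeds this many MB,
; after it finishes the current request (value is hypothetical).
reload-on-rss = 1024
; Forcibly kill a worker above this RSS in MB, even mid-request
; (value is hypothetical).
evil-reload-on-rss = 2048
```

With limits like these, one heavy query costs a single worker for at most the timeout period rather than holding it indefinitely, so the remaining workers stay available for other requests.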