Since around 14 Dec (possibly earlier), many Graphite queries that used to work fine have been timing out.
I first reported this on IRC in the wikimedia-sre channel on 15 Dec:
<Krinkle> hm.. have we added restrictions to Graphite recently in terms of timeouts?
<Krinkle> https://grafana.wikimedia.org/d/000000430/resourceloader-modules-overview?orgId=1
<Krinkle> I can't seem to load the latency graphs here, showing an error each time due to "time out after 6.0 seconds"
<Krinkle> not sure why that's taking 6s though
<Krinkle> tried removing the transforms and reducing from 3d to 12h, but no dice
<Krinkle> query: 'MediaWiki.resourceloader_build.*.p99'
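One way to check whether the slowness is in Graphite itself rather than in Grafana is to request the same target directly from Graphite's render API (`/render` with `target`, `from`, `until` and `format=json`). This is a minimal sketch. It assumes a hypothetical base URL `https://graphite.example.org` standing in for whichever Graphite host backs the Grafana data source. The helper names `render_url` and `time_render` are illustrative only.

```python
import time
import urllib.request
from urllib.parse import urlencode

# Hypothetical host: replace with the Graphite instance behind the Grafana data source.
GRAPHITE_BASE = "https://graphite.example.org"


def render_url(target: str, from_: str, until: str = "now", base: str = GRAPHITE_BASE) -> str:
    """Build a Graphite render API URL that returns raw datapoints as JSON."""
    params = urlencode({"target": target, "from": from_, "until": until, "format": "json"})
    return f"{base}/render?{params}"


def time_render(target: str, from_: str, timeout: float = 30.0) -> float:
    """Fetch one target and return the wall-clock seconds it took.

    Raises urllib.error.HTTPError if Graphite answers with an error response.
    """
    start = time.monotonic()
    with urllib.request.urlopen(render_url(target, from_), timeout=timeout) as resp:
        resp.read()
    return time.monotonic() - start


if __name__ == "__main__":
    # Same target as on IRC, without transforms, at shrinking time ranges.
    for window in ("-3d", "-12h", "-1h"):
        print(render_url("MediaWiki.resourceloader_build.*.p99", window))
```

Calling `time_render` for each window against the real host would give a rough per-range latency. Per the report above, even the 12-hour range did not load.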
I'm filing this on Phabricator as well, since it appears to be a persistent issue.
For example, the "High avg build rate" panel fails to load at both of these ranges:
- (30 days) https://grafana-rw.wikimedia.org/d/000000430/resourceloader-modules-overview?viewPanel=34&orgId=1&from=now-30d&to=now
- (just 2 days) https://grafana.wikimedia.org/d/000000430/resourceloader-modules-overview?viewPanel=34&orgId=1&from=now-2d&to=now
Graphite encountered an unexpected error while handling your request.

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/graphite/worker_pool/pool.py", line 113, in pool_exec
    job = queue.get(True, wait_time)
  File "/usr/lib/python3.9/queue.py", line 179, in get
    raise Empty
_queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/graphite/storage.py", line 105, in wait_jobs
    for job in self.pool_exec(jobs, timeout):
  File "/usr/lib/python3/dist-packages/graphite/worker_pool/pool.py", line 115, in pool_exec
    raise PoolTimeoutError("Timed out after %fs" % (time.time() - start))
graphite.worker_pool.pool.PoolTimeoutError: Timed out after 6.000199s

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/usr/lib/python3/dist-packages/django/core/handlers/base.py", line 115, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/usr/lib/python3/dist-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python3/dist-packages/graphite/errors.py", line 101, in new_f
    return f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/graphite/render/views.py", line 129, in renderView
    data.extend(evaluateTarget(requestContext, targets))
  File "/usr/lib/python3/dist-packages/graphite/render/evaluator.py", line 19, in evaluateTarget
    prefetchData(requestContext, pathExpressions)
  File "/usr/lib/python3/dist-packages/graphite/render/datalib.py", line 292, in prefetchData
    for result in STORE.fetch(pathExpressions, startTime, endTime, now, requestContext):
  File "/usr/lib/python3/dist-packages/graphite/storage.py", line 187, in fetch
    results = self.wait_jobs(jobs, settings.FETCH_TIMEOUT,
  File "/usr/lib/python3/dist-packages/graphite/storage.py", line 122, in wait_jobs
    raise Exception(message)
Exception: Timed out after 6.000234s for fetch for ['MediaWiki.resourceloader_build.*.sample_rate']
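The traceback hints at how the 6-second limit is applied. `STORE.fetch` hands a batch of fetch jobs to `wait_jobs(jobs, settings.FETCH_TIMEOUT, ...)`, and `pool_exec` waits on a queue with `queue.get(True, wait_time)` until it raises `PoolTimeoutError`. The following is a simplified model of that pattern, not graphite-web's actual code. It assumes the timeout is a single deadline shared by all jobs, with `wait_time` being the time remaining. `fake_fetch` is a hypothetical stand-in for fetching one series matched by a wildcard such as `MediaWiki.resourceloader_build.*.sample_rate`.

```python
import queue
import time
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterator, List


class PoolTimeoutError(Exception):
    """Raised when the jobs do not all finish before the shared deadline."""


def pool_exec(executor: Executor, jobs: List[Callable[[], object]], timeout: float) -> Iterator[object]:
    """Yield job results as they complete, under ONE deadline shared by all jobs.

    Each queue.get() only waits for the time left until the deadline, so many
    individually quick jobs can still exhaust the budget when the pool is small.
    Simplification: a job that raises never reports back and ends in a timeout.
    """
    start = time.monotonic()
    deadline = start + timeout
    finished: "queue.Queue[object]" = queue.Queue()
    for job in jobs:
        executor.submit(lambda j=job: finished.put(j()))
    for _ in jobs:
        wait_time = max(0.0, deadline - time.monotonic())
        try:
            yield finished.get(True, wait_time)
        except queue.Empty:
            raise PoolTimeoutError("Timed out after %fs" % (time.monotonic() - start))


def fake_fetch(value: int, seconds: float = 0.2) -> Callable[[], int]:
    """Hypothetical stand-in for fetching one series matched by a wildcard."""
    def job() -> int:
        time.sleep(seconds)
        return value
    return job


if __name__ == "__main__":
    jobs = [fake_fetch(i) for i in range(6)]  # 6 series, 0.2 s each
    with ThreadPoolExecutor(max_workers=2) as pool:  # about 0.6 s in total
        try:
            print(sorted(pool_exec(pool, jobs, timeout=0.3)))
        except PoolTimeoutError as exc:
            print(exc)  # no single job is slow, yet the shared budget runs out
```

If this model holds, a wildcard that matches more series, or a busier worker pool, could push a previously fine query over `FETCH_TIMEOUT` even though no individual series is slow to read.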