This is what the Prometheus node gets when it curls the metrics endpoint:
root@prometheus1005:~# curl http://cloudcephmon1001:9283/metrics
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"></meta>
<title>503 Service Unavailable</title>
<style type="text/css">
#powered_by {
margin-top: 20px;
border-top: 2px solid black;
font-style: italic;
}
#traceback {
color: red;
}
</style>
</head>
<body>
<h2>503 Service Unavailable</h2>
<p>Gathering data took 164.92 seconds, metrics are stale for 149.92 seconds, returning "service unavailable".</p>
<pre id="traceback">Traceback (most recent call last):
File "/lib/python3/dist-packages/cherrypy/_cprequest.py", line 670, in respond
response.body = self.handler()
File "/lib/python3/dist-packages/cherrypy/lib/encoding.py", line 220, in __call__
self.body = self.oldhandler(*args, **kwargs)
File "/lib/python3/dist-packages/cherrypy/_cpdispatch.py", line 60, in __call__
return self.callable(*self.args, **self.kwargs)
File "/usr/share/ceph/mgr/prometheus/module.py", line 1206, in metrics
return self._metrics(_global_instance)
File "/usr/share/ceph/mgr/prometheus/module.py", line 1245, in _metrics
raise cherrypy.HTTPError(503, msg)
cherrypy._cperror.HTTPError: (503, 'Gathering data took 164.92 seconds, metrics are stale for 149.92 seconds, returning "service unavailable".')
</pre>
<div id="powered_by">
<span>
Powered by <a href="http://www.cherrypy.org">CherryPy 8.9.1</a>
</span>
</div>
</body>
</html>

Might just be the extra load, but we should look into it.
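
To follow up on this, it may help to pull the two timings out of the 503 body so they can be tracked over time. The following is only a rough sketch: the function name `parse_stale_message` and the returned keys are made up for illustration, and the regular expression is based solely on the message text in the paste above. The "implied interval" is simply the difference between the two numbers (15 seconds here); reading that as the mgr module's configured scrape interval is an assumption, not something confirmed by the output.

```python
import re

# Pattern built from the message text seen in the 503 body above.
# This is an assumption about the format; other Ceph versions may word it differently.
STALE_RE = re.compile(
    r"Gathering data took (?P<gather>\d+(?:\.\d+)?) seconds, "
    r"metrics are stale for (?P<stale>\d+(?:\.\d+)?) seconds"
)


def parse_stale_message(body):
    """Return the timings found in a 503 body, or None if the message is absent.

    Hypothetical helper for illustration only.
    """
    match = STALE_RE.search(body)
    if match is None:
        return None
    gather = float(match.group("gather"))
    stale = float(match.group("stale"))
    return {
        "gather_seconds": gather,
        "stale_seconds": stale,
        # Difference between the two values; assumed (not confirmed) to
        # correspond to the exporter's scrape interval.
        "implied_interval": round(gather - stale, 2),
    }


if __name__ == "__main__":
    sample = (
        "<p>Gathering data took 164.92 seconds, metrics are stale for "
        '149.92 seconds, returning "service unavailable".</p>'
    )
    print(parse_stale_message(sample))
```

Feeding it the saved output of the curl above should show whether the gathering time drops once the extra load is gone.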