It would be good to have monitoring dashboards for our beta cluster maps instance(s) similar to the ones in production.
Copying over related discussion from T172090:
In T172090#4494874, @Jhernandez wrote: Should we connect the test servers to a dashboard like the production one where we could see stats for tile generation rates, etc? (excuse me if it is already somewhere, couldn't find it). Seems like it could be a good idea to check on the servers before rolling to production when we make changes.
In T172090#4510031, @Jhernandez wrote: There is some info here, not very friendly tho: https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org
Go to https://grafana.wikimedia.org/ and log in (click the Wikimedia logo at the top left and log in with your LDAP credentials).
Once logged in, click the Home button at the top left.
You will see a "New dashboard" button.
There you get a GUI where you can add panels with different types of graphs and new rows to the dashboard. Whenever I've looked into it I've explored more than followed any tutorial.
Once done, remember to save your dashboard.
Prefix the name with "Maps" so that it is grouped with the other Maps dashboards in the big list.
One thing I've also done before is look at how an existing dashboard is made. For example, if you go to the Maps performance one, you can poke the panels and edit them to see how they are configured. To do that, click on the title of the panel and a submenu appears, then click "Edit".
You will then see the graph fullscreen, with the configuration and the queries at the bottom. Feel free to poke around and change things; there is autocomplete and it is quite approachable.
NOTE: Don't save your changes if you are just poking around!
Another option is to clone a dashboard, which could be useful here, since what we want is to mirror the production dashboard for the test environment. To do that, click on the settings icon and then "Save as".
Then you can make the appropriate changes after cloning it.
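If the "appropriate changes" are mostly a matter of pointing each panel's queries at the test environment, they could also be applied to the dashboard's exported JSON rather than panel by panel in the GUI. Below is a rough sketch of that idea, not a tested procedure. It assumes Graphite-style panels whose queries are stored under `targets[].target`, and the metric prefixes `maps.prod.` and `maps.beta.` are made up for illustration; the real prefixes would need to be read from the existing dashboard.

```python
import copy


def clone_for_beta(dashboard, old_prefix, new_prefix, title):
    """Return a copy of a dashboard dict with its Graphite targets re-pointed.

    `dashboard` is the exported dashboard JSON loaded into a dict.
    `old_prefix` and `new_prefix` are hypothetical metric prefixes.
    """
    clone = copy.deepcopy(dashboard)
    clone["title"] = title
    # Drop identifiers so an import creates a new dashboard instead of
    # overwriting the original (assumption about how the import behaves).
    clone.pop("id", None)
    clone.pop("uid", None)

    # Older dashboards nest panels under "rows"; newer ones list them
    # at the top level. Collect both.
    panels = list(clone.get("panels", []))
    for row in clone.get("rows", []):
        panels.extend(row.get("panels", []))

    for panel in panels:
        for target in panel.get("targets", []):
            if "target" in target:
                target["target"] = target["target"].replace(old_prefix, new_prefix)
    return clone


# Tiny hand-written example, not a real export.
example = {
    "id": 1,
    "title": "Maps performance",
    "rows": [
        {"panels": [{"title": "Tile rate",
                     "targets": [{"target": "maps.prod.tiles.rate"}]}]}
    ],
}

beta = clone_for_beta(example, "maps.prod.", "maps.beta.", "Maps performance (beta)")
print(beta["title"])
print(beta["rows"][0]["panels"][0]["targets"][0]["target"])
```

The original dict is left untouched, so the production dashboard's export can be kept around for comparison.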
Regarding the data sources, we'll need to check that the deployment is configured to log to the appropriate statsd/graphite (more general info here; there seems to be a beta cluster graphite instance, so maybe we should use that). I'm not sure exactly how we proceed here; it depends on how things are logged and configured.
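One way to check that metrics actually reach the intended statsd/graphite instance would be to send a throwaway counter and look for it in Grafana. The sketch below uses the plain statsd line format (`name:value|c` over UDP). The metric name is made up, the host is left as a parameter because I don't know which statsd endpoint the beta cluster uses, and port 8125 is only the usual statsd default.

```python
import socket


def format_counter(name, value=1):
    """Build a statsd counter line, e.g. 'some.metric:1|c'."""
    return f"{name}:{value}|c"


def send_metric(line, host, port=8125):
    """Send one statsd line over UDP (fire and forget, no reply expected)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical metric name, for illustration only.
line = format_counter("maps.beta.test.ping", 3)
print(line)

# To actually send it, pass the real statsd host for the beta cluster:
# send_metric(line, host="<beta statsd host>")
```

Since UDP gives no acknowledgement, the only confirmation is the metric showing up in the graphite instance behind the dashboard.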
cc/ @Gehel for feedback on how to go about doing this appropriately







