While performance_schema has not yet been rolled out to all servers, now is the moment to start making use of its metrics.
The two final steps of the process are:
- Implement collectors
- Generate graphs
The first depends on the second. There are some doubts about how to implement this:
- It could be added to the current system (db1011, the tendril db with Toku), but I want to move away from it: while tendril and its database should stay, the graphing backend (MySQL) does not handle large amounts of data well, and it currently uses a Google API for graphing, which makes it slow and not private. Also, Toku and tendril crash about once a week due to the extreme load.
- We could add the metrics to Graphite, but I am unsure whether the current backend can handle the new load (+150 hosts, 5-minute resolution for 1 day, 1-hour resolution for 7 days, ~100 metrics with more coming, 400-500 GB compressed). Maybe it is time to convert db1011 into a dedicated Graphite host?
- Graphite may be replaced in the future? I am OK with this being used as a test, as the collector daemons have yet to be written.
- Maybe MySQL can continue to be used, but only implemented as a frontend (data source) for Grafana, skipping Graphite?
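For reference, the retention mentioned above (5-minute resolution for 1 day, 1-hour resolution for 7 days) could be expressed in Carbon's storage-schemas.conf syntax; the section name and metric path pattern here are only illustrative, not decided:

```ini
[mysql]
# Match the hypothetical per-host MySQL metric namespace
pattern = ^servers\..*\.mysql\.
# 5-minute points kept for 1 day, then 1-hour points kept for 7 days
retentions = 5m:1d,1h:7d
```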
There are too many open questions that we should answer, test, build a proof of concept for, etc.
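As a sketch of the collector step, assuming the Graphite option: on MySQL 5.7+ the status counters live in performance_schema.global_status, and Carbon accepts batches over its plaintext port. The metric whitelist, path prefix, and graphite.example.org host below are placeholder assumptions, not decisions:

```python
import socket

# Hypothetical whitelist; the real collector would track ~100 counters.
WANTED = {"Threads_connected", "Questions", "Innodb_rows_read"}

# Status counters as exposed by performance_schema in MySQL 5.7+;
# the collector daemon would run this through any MySQL client library.
STATUS_QUERY = (
    "SELECT VARIABLE_NAME, VARIABLE_VALUE "
    "FROM performance_schema.global_status"
)

def to_graphite_lines(rows, host, ts):
    """Format (name, value) rows as Carbon plaintext-protocol lines."""
    lines = []
    for name, value in rows:
        if name not in WANTED:
            continue  # skip counters we do not graph
        lines.append("servers.%s.mysql.%s %s %d" % (host, name.lower(), value, ts))
    return lines

def send_to_graphite(lines, carbon_host="graphite.example.org", port=2003):
    """Push one batch over Carbon's default plaintext port (2003)."""
    payload = ("\n".join(lines) + "\n").encode()
    with socket.create_connection((carbon_host, port), timeout=5) as s:
        s.sendall(payload)
```

A daemon would then just loop: run STATUS_QUERY against each host, format the rows, and send one batch every 5 minutes.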
