Paste P6953

RFC: Stop using puppet for mariadb dynamic configuration

Authored by jcrespo on Apr 6 2018, 9:09 AM.
While reading https://wikitech.wikimedia.org/wiki/Puppet_coding, it became clear to me that Puppet is not the right tool to manage the state of MySQL/MariaDB servers. What we do now is partial management, using practices that do not fit, and should not be handled by, a static configuration management system like Puppet. Other changes are fully manual, slow and error-prone.
Things we currently do with Puppet that are painful and/or dangerous:
* Master/slave management
* Dynamic number of instances per server
* Grants and users
* Dynamic configuration changes (without a restart)
* Dynamic monitoring
Things we cannot do right now but should be able to do in a more automated way:
* Replication topology changes
* Autoprovisioning of data
Things that are fine to keep in Puppet because they are static configuration:
* Package setup
* Static configuration
* Static monitoring
Another thing Puppet lacks for handling state is a "push" model of changes, instead of its current "pull" model (in which servers ask for new configuration at intervals of several minutes). Changes should be transactional: applied almost immediately, but rolled back if any instance fails to apply them.
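The push-and-rollback behaviour described above can be sketched as follows. This is a minimal sketch, not a proposed implementation: `Instance`, `get_global`, `set_global` and `push_change` are hypothetical names, the instances are simulated in memory (a real tool might issue `SET GLOBAL` statements over a database connection instead), and the "transaction" is a best-effort compensation that restores previous values, not a true distributed commit.

```python
class Instance:
    """Hypothetical in-memory stand-in for one MariaDB instance's global variables."""

    def __init__(self, name, variables, broken=False):
        self.name = name
        self.variables = dict(variables)
        self.broken = broken  # simulate an instance that rejects changes

    def get_global(self, var):
        return self.variables[var]

    def set_global(self, var, value):
        # A real tool might run "SET GLOBAL var = value" here instead.
        if self.broken:
            raise RuntimeError(f"{self.name}: failed to set {var}")
        self.variables[var] = value


def push_change(instances, var, value):
    """Push var=value to all instances immediately, all-or-nothing.

    On the first failure, restore the previous value on every instance
    already changed (most recent first) and return False.
    """
    applied = []  # (instance, previous value) for each successful change
    for inst in instances:
        previous = inst.get_global(var)
        try:
            inst.set_global(var, value)
        except RuntimeError:
            for done, old in reversed(applied):
                done.set_global(var, old)
            return False
        applied.append((inst, previous))
    return True
```

For example, pushing `max_connections = 1000` to a healthy `db1` followed by a failing `db2` returns `False` and leaves `db1` back at its previous value of 500, so no instance ends up with a partially applied change.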
As can be seen, the proposal is not to stop using Puppet, but to stop using it to handle dynamic state. To manage that state, the proposal would be:
* Set up a source of truth that is not a single point of failure (SPOF)
* Set up monitoring based on that source of truth, and gather the actual state of the servers alongside it
* Make changes (topology, provisioning, etc.) based on logic that combines the configuration with the observed state
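Taken together, these steps amount to a reconcile loop: compare the desired state from the source of truth with the observed state of the servers, and derive the changes needed. A minimal sketch, assuming a hypothetical data model in which each host maps to the master it replicates from (`None` for a top-level master); `plan_changes` and the action names are illustrative only, and the function only plans changes, leaving execution to a separate step.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical data model: host name -> master it replicates from (None for a
# top-level master). "desired" comes from the source of truth, "observed"
# from polling the servers.
Topology = Dict[str, Optional[str]]
Action = Tuple[str, str, Optional[str]]


def plan_changes(desired: Topology, observed: Topology) -> List[Action]:
    """Return the actions needed to move the observed topology toward the desired one."""
    actions: List[Action] = []
    for host in sorted(desired):
        if host not in observed:
            # Expected host is not running or not reachable: candidate for provisioning.
            actions.append(("provision", host, desired[host]))
        elif observed[host] != desired[host]:
            # Host replicates from the wrong master: a topology change is needed.
            actions.append(("repoint", host, desired[host]))
    for host in sorted(set(observed) - set(desired)):
        # Host exists but the source of truth does not know about it: only flag it.
        actions.append(("alert", host, observed[host]))
    return actions
```

For example, with `desired = {"db1": None, "db2": "db1", "db3": "db1"}` and `observed = {"db1": None, "db2": "db3", "db4": "db1"}`, the plan is `[("repoint", "db2", "db1"), ("provision", "db3", "db1"), ("alert", "db4", "db1")]`. Keeping planning separate from execution allows a dry run or human review before any topology change, and lets the comparison logic be tested without touching real servers.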
How to do that remains to be seen, but something like a distributed orchestration system would probably be ideal (e.g. a combination of tendril and wmfmariadbpy).
Note: This is not related to application-level configuration management (the ongoing etcd-ization of MediaWiki); it applies only to the storage backends.