Currently we have no good way to depool a labsdb host for normal maintenance or in case of failure.
The model as it stands:
* Every production database replica exists on each labsdb server.
* We have a really rough and dirty array in puppet that manages service hostnames pointing certain DBs at a certain physical host (e.g. "enwiki.labsdb", "wikidatawiki.labsdb", ...); see the sketch after this list.
* User tables can be created on any given physical host, have no life expectancy beyond that host (i.e. no replication or backup), and have no reasonable uptime guarantees given they are tied to a single fallible physical host.
* When a labsdb physical host has an unplanned issue it is always an outage event for users with this setup, and when a labsdb physical host has a planned maintenance window it is always an outage for any user tables on that host.
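To make the current mapping concrete, here is a rough sketch of what such a puppet array amounts to. The variable name, the third service name, and the host assignments are hypothetical, not the actual production values:

```
# Hypothetical sketch of the hostname-to-host mapping described above.
# Each service name is pinned to exactly one physical host, so depooling
# that host breaks every name pointing at it.
$labsdb_service_hosts = {
    'enwiki.labsdb'       => 'labsdb1001.eqiad.wmnet',
    'wikidatawiki.labsdb' => 'labsdb1001.eqiad.wmnet',
    'commonswiki.labsdb'  => 'labsdb1002.eqiad.wmnet',
}
```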
The model we would like to consider:
* All DB replicas would not necessarily have to exist on every server, but probably would for node parity.
* We would have a proxy or intermediary process pooling/depooling backend replicas for maintenance or in case of failure. We use [[http://www.haproxy.org/|haproxy]] for this in production, and it would be really advisable IMO to keep this consistent with production (a minimal config sketch follows this list).
* Service hostnames for DB's would point to the proxy, which would ensure service integrity as much as possible.
* Making changes for better availability of user tables is currently undecided. Should they remain the same? They are problematic in any situation with an abstracted labsdb replication service, as persistence of user connections to the same backend that stores both their tables and the replica DBs is problematic. We do not have a mechanism that solves this problem. In modern setups we would either shard user databases across multiple physical hosts (which removes the ability to perform SQL JOINs with production replica tables; see the example query after this list), or accept the transient nature of user tables on replica servers.
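For illustration, a minimal sketch of the kind of haproxy configuration this implies, assuming hypothetical backend host names (labsdb1001-1003) and plain TCP health checks; a real config would need tuning (MySQL-aware checks, timeouts, connection limits, etc.):

```
# Minimal sketch only; host names and port are assumptions, not actual
# production values.
listen labsdb-replicas
    bind *:3306
    mode tcp
    balance roundrobin
    # "check" enables basic health checks; a failed backend is depooled
    # automatically.
    server labsdb1001 labsdb1001.eqiad.wmnet:3306 check
    server labsdb1002 labsdb1002.eqiad.wmnet:3306 check
    # For planned maintenance a backend can be depooled by commenting it
    # out and reloading, or (in recent haproxy versions) at runtime via
    # the admin socket, e.g.:
    #   set server labsdb-replicas/labsdb1003 state maint
    server labsdb1003 labsdb1003.eqiad.wmnet:3306 check
```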
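As a concrete example of what sharding would give up, this is the kind of cross-database JOIN tools rely on today; it only works while the user database (here a hypothetical u1234__tool) lives on the same server as the replica views it joins against:

```
-- Hypothetical user database and table names, for illustration only.
SELECT p.page_title
FROM enwiki_p.page AS p
JOIN u1234__tool.tracked_pages AS t
  ON t.page_id = p.page_id;
```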