We have partially streamlined releasing a new service to production through a set of abstractions for puppet, monitoring, and coding infrastructure, but we still rely on a fairly static configuration of our production infrastructure.
While this is mostly acceptable for the MediaWiki application layer, it's starting to show its limitations for services.
Ideally, given most microservices don't use a lot of resources, we need to ensure that:
* they run constantly with a given number of working instances per service
* they're reasonably resilient to hardware failures
* hardware usage is efficient enough
* single services are properly isolated from each other
* it's easy to deploy a new service, and doing so requires minimal ops intervention once the service is set up
* it's easy for developers to test their service reliably, with a guarantee that the production environment is extremely similar to what they can reproduce both locally and in labs/beta
* there is a clear, defined way to refer to other services from your own service in this environment
We think a potentially interesting way of achieving this is to use [[ http://kubernetes.io | Kubernetes ]], a cluster coordination solution developed by Google that uses containers and dynamic configurations. Kubernetes is currently being used in toollabs as a modern, nice replacement for the rusty gridengine, and we're quite happy with it.
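To make the first two requirements concrete (a fixed number of working instances per service, and resilience to hardware failures), here is a minimal sketch of the desired-state reconciliation idea that cluster coordinators of this kind are built around. It is a toy model in plain Python, not Kubernetes code: the `Cluster` class, the `reconcile` and `fail_node` functions, and the node and service names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model: maps each node name to the list of service instances it runs."""
    nodes: dict = field(default_factory=dict)

    def running(self, service):
        """Count the instances of a service across all nodes."""
        return sum(instances.count(service) for instances in self.nodes.values())

    def least_loaded_node(self):
        """Pick the node running the fewest instances (ties broken by name)."""
        if not self.nodes:
            raise RuntimeError("no nodes available to schedule on")
        return min(self.nodes, key=lambda n: (len(self.nodes[n]), n))


def reconcile(cluster, desired):
    """Start or stop instances until every service matches its desired count."""
    for service, want in desired.items():
        while cluster.running(service) < want:
            cluster.nodes[cluster.least_loaded_node()].append(service)
        while cluster.running(service) > want:
            # Stop an instance on the busiest node that runs this service.
            candidates = [n for n in cluster.nodes if service in cluster.nodes[n]]
            busiest = max(candidates, key=lambda n: (len(cluster.nodes[n]), n))
            cluster.nodes[busiest].remove(service)


def fail_node(cluster, name):
    """Simulate a hardware failure: the node and its instances disappear."""
    cluster.nodes.pop(name, None)


if __name__ == "__main__":
    # Hypothetical node and service names.
    cluster = Cluster({"node1": [], "node2": [], "node3": []})
    desired = {"service-a": 2, "service-b": 3}

    reconcile(cluster, desired)
    fail_node(cluster, "node1")
    reconcile(cluster, desired)  # lost instances are rescheduled elsewhere
    print(cluster.nodes)
```

The point of the sketch is that operators declare how many instances they want, and a control loop keeps the running state converging on that number, rather than instances being pinned to specific hosts in a static configuration.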
There are a ton of things we have to figure out before we can think of deploying this to production, from managing container security to monitoring/alerting to permissions. This session is meant to be an open discussion about the experience ops is having with Kubernetes in toollabs, what else would be needed to use it in production, and how we plan to extend its usage in both the short and the long term.