Automate the provisioning and management of MediaWiki clusters
Closed, Invalid | Public

Description

To test multi-datacenter MediaWiki work, we need two production-like MediaWiki clusters, each with its own master / slave databases, application servers, job runner, and memcached / redis servers. These servers will only have to handle a light workload, so they can be relatively modest virtual machines.

The effort required to configure a production-like MediaWiki instance is enormous. We have done it three times in three years (first Ashburn, then Beta Cluster, then Dallas), and each time it has involved a lot of repetitive, manual work. We have to do it again now, and if the multi-datacenter project succeeds in making the business case for additional data centers attractive to our users and the board, we will be doing it again in the future.

My fear is that if we don't find a way to automate more, we will end up being inundated with work that is unpleasant, menial, repetitive, and error-prone, and we will be both inefficient and unhappy as a result.

Because the next clusters we provision will be for testing rather than production, I think it would be OK to take a chance with an immature automation framework, provided we are satisfied that it is heading in the right direction, and expect it to be ready for some production use within a one-year timeframe.

Event Timeline

ori raised the priority of this task to Medium.
ori updated the task description.
ori added projects: Sustainability, SRE.
ori added subscribers: ori, GWicke, Joe.

This would be awesome! I don't want to sound pessimistic, but wouldn't this need a mountain of work in ops/puppet? In particular, all those "if this is labs, include X" statements and the like? Not to mention the IP, LVS and friends assignments, which are currently all done by hand?

Kubernetes is way too young and is missing several features (and all container orchestrators are kind of crap still at doing stateful services like dbs), so I'll just edit that out to prevent confusion :)

yuvipanda set Security to None.

(I do believe that it can do some of these in 6 months to 1 year, but it is way too early to be having that conversation specifically about Kubernetes, IMO)

jijiki subscribed.

I feel like this task is not relevant anymore, or if it is, it needs to be rewritten to reflect our current needs and infra. Closing :)