Automate the provisioning and management of MediaWiki clusters
Closed, Invalid
Public

Assigned To

None

Authored By

	ori
	Nov 17 2015, 8:11 AM

Description

For testing multi-datacenter MediaWiki work, there is a need for two production-like MediaWiki clusters, each with its own master / slave databases, application servers, job runner, and memcached / redis servers. The workload these servers will be expected to handle is slight, so they can be relatively modest virtual machines.
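As a rough illustration of the topology described above, here is a minimal sketch of a declarative cluster description that an automation tool could consume. Everything in it is an assumption made for illustration: the `VMSpec` and `ClusterSpec` types, the `make_test_cluster` helper, the role names, the host counts, the VM sizes, and the `dc1` / `dc2` cluster names are hypothetical and do not come from any existing tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMSpec:
    """One role in a cluster. The default sizes are hypothetical 'modest VM' values."""
    role: str
    count: int = 1
    vcpus: int = 2
    memory_gb: int = 4


@dataclass(frozen=True)
class ClusterSpec:
    """A production-like MediaWiki cluster described as data."""
    name: str
    roles: tuple[VMSpec, ...]

    def hostnames(self) -> list[str]:
        # Derive host names from the spec so that nothing is assigned by hand.
        return [
            f"{self.name}-{vm.role}-{i:02d}"
            for vm in self.roles
            for i in range(1, vm.count + 1)
        ]


def make_test_cluster(name: str) -> ClusterSpec:
    # Roles follow the task description; the counts are assumptions.
    return ClusterSpec(
        name=name,
        roles=(
            VMSpec("db-master"),
            VMSpec("db-slave"),
            VMSpec("appserver", count=2),
            VMSpec("jobrunner"),
            VMSpec("memcached"),
            VMSpec("redis"),
        ),
    )


# Two independent clusters, as needed for multi-datacenter testing.
clusters = [make_test_cluster("dc1"), make_test_cluster("dc2")]

if __name__ == "__main__":
    for cluster in clusters:
        for host in cluster.hostnames():
            print(host)
```

Describing both clusters with the same function is the point of the sketch: a second (or fifth) cluster becomes one more call rather than another round of manual setup.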

Configuring a production-like MediaWiki instance takes an enormous effort. We have done it three times in three years (first Ashburn, then Beta Cluster, then Dallas), and each time it has involved a lot of repetitive, manual work. We have to do it again now, and if the multi-datacenter project succeeds in making the business case for additional data centers attractive to our users and the board, we will be doing it again in the future.

My fear is that if we don't find a way to automate more, we will end up being inundated with work that is unpleasant, menial, repetitive, and error-prone, and we will be both inefficient and unhappy as a result.

Because the next clusters we provision will be for testing rather than production, I think it would be OK to take a chance with an immature automation framework, provided we are satisfied that it is heading in the right direction, and expect it to be ready for some production use within a one-year timeframe.

Event Timeline

ori created this task. Nov 17 2015, 8:11 AM

ori raised the priority of this task to Medium.

ori updated the task description.

ori added projects: Sustainability, SRE.

ori added subscribers: ori, GWicke, Joe.

Restricted Application added a subscriber: Aklapper. Nov 17 2015, 8:11 AM

yuvipanda subscribed. Nov 17 2015, 8:13 AM

mobrovac awarded a token. Nov 17 2015, 8:17 AM

This would be awesome! I don't want to sound pessimistic, but wouldn't this need a mountain of work in ops/puppet? In particular, all those "if this is labs, include X" statements and the like? Not to mention the IP, LVS and friends assignments, which are currently all done by hand?

Kubernetes is way too young and is missing several features (and all container orchestrators are still kind of crap at running stateful services like DBs), so I'll just edit that out to prevent confusion :)

(I do believe that it can do some of these in 6 months to 1 year, but it is way too early to be having that conversation specifically about Kubernetes, IMO)

Legoktm subscribed. Nov 18 2015, 5:40 PM

daniel awarded a token. Nov 18 2015, 5:40 PM

Gilles subscribed. Nov 19 2015, 8:14 PM

Krinkle subscribed. Nov 19 2015, 8:35 PM

greg subscribed. Jan 25 2016, 8:11 PM

Phabricator_maintenance removed a subscriber: yuvipanda. Jun 7 2017, 6:46 PM

Phabricator_maintenance moved this task from Backlog to Acknowledged on the SRE board. Jan 26 2019, 8:24 PM

jbond edited projects, added serviceops; removed SRE. Nov 4 2022, 1:58 PM

I feel like this task is not relevant anymore, or if it is, it needs to be rewritten to reflect our current needs and infra. Closing :)
