
Move Parsoid config into ops/puppet
Closed, Declined · Public


Currently, Parsoid's deployment config is read directly from its deploy repo. However, there is an effort to standardise and unify puppet classes and configs in ops/puppet (T86633). To that end, it would be great to have Parsoid's configs in there too.

As a starting point, there is already a patch for getting rid of class duplication between labs and prod. The next step would be to move localsettings.js into puppet.

Event Timeline

mobrovac created this task. Mar 13 2015, 4:11 PM
mobrovac raised the priority of this task to Needs Triage.
mobrovac updated the task description.
Restricted Application added a subscriber: Aklapper. Mar 13 2015, 4:11 PM
faidon added a subscriber: faidon. Mar 13 2015, 6:00 PM

That doesn't sound great — puppet's access cannot be opened up to all deployers, which means there's going to be an implicit dependency between a Parsoid deployer and a root, to change even basic settings.

For MediaWiki, we use mediawiki-config, which is rarely updated by roots (platform/RelEng deployers update it instead). I find this ideal and I think something similar could work here as well.

That was my concern as well. I also mentioned that I had hashar move the betalabs config out of puppet into the parsoid deploy repo so we could tweak it without an ops dependency. That said, I was/am open to the change, if ops considers it okay, since the configs have been fairly stable for a while now.

But, given your response, maybe we should leave the current config as-is in the parsoid deploy repo?

I know Marco had a concern which was: "concretely, that'd also help with RB, as for now we have to hard-code parsoid's port". @mobrovac, thoughts?

I think we soon need to figure out a sane way to let developers use a (the?) config management system. We have a need to share config templates between different installations & need to keep configs consistent with centrally managed data: both core tasks of a config management system.

With service-runner and the packaging work around that we actually have a chance to separate privileged code (init scripts / systemd units) from pure config management. This leaves config settings and templates, which are less sensitive, and could perhaps be handled with a submodule that developers have +2 rights on & is automatically updated in puppet master.

There could also be other options coming out of the deployment system work. It would definitely be great to find something that makes it easier to test config template changes in a test environment without jumping through the number of hoops that are currently required.

Well, ops/root dependencies aside, puppet is a very poor tool for this for multiple reasons (e.g. it would be impossible to deploy a MediaWiki config across the fleet in a few seconds, at least not unless we scale up puppetmasters to a large fleet of machines of their own). In any case, anything that touches the puppetmaster (esp. in an "automatically updated" way) is completely unacceptable for this use case for security reasons, sorry. Not even ops/puppet Gerrit merges happen automatically there.

My impression is that the mediawiki-config system has been working great, so we should at least try replicating it in the services world. Keeping the config in the parsoid deploy repo sounds perfectly fine to me (and very similar to mediawiki-config too), but maybe I don't have a good understanding of what the need is.

Which problems are you trying to solve with this?

The problem at hand is, in a wider context, service orchestration. Services depend on one another, e.g. RESTBase on Parsoid or Citoid on Zotero, and thus have to be able to access each other's configs, at least the host/port parameters.
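To make the dependency concrete, here is a minimal sketch of the idea: a shared registry that dependent services read instead of hardcoding peer addresses. The registry structure, service names, and hostnames below are all invented for illustration, not Parsoid's or RESTBase's actual config format.

```javascript
// Hypothetical shared service registry — the kind of centrally managed
// data both RESTBase and Citoid would need to read, instead of each
// hardcoding the host/port of the service they depend on.
const services = {
  parsoid: { host: 'parsoid.svc.example.wmnet', port: 8000 },
  zotero:  { host: 'zotero.svc.example.wmnet',  port: 1969 }
};

// Resolve a dependency's base URI from the shared registry.
function serviceUri(name) {
  const svc = services[name];
  if (!svc) {
    throw new Error('Unknown service: ' + name);
  }
  return 'http://' + svc.host + ':' + svc.port;
}

// RESTBase's config would then reference serviceUri('parsoid')
// rather than a literal host:port string.
console.log(serviceUri('parsoid'));
```

With something like this, changing Parsoid's port becomes a one-line edit in one place, rather than a hunt through every dependent service's config.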

I agree that having something like mw-config might be more suited than ops/puppet, at least until we come up with a proper service discovery mechanism. We'd still need a way of getting the appropriate config files onto the machines. Should we go ahead and give Ansible a try, as discussed in the 'future of deployments' meeting? Personally, I think we should try it out for deploying the services themselves.

IMHO, putting the configs in the individual deploy repos should not even be discussed; this would make our lives pretty hard in the long run.

mark added a subscriber: mark. Mar 17 2015, 11:39 AM

Well, service discovery is something that we are (probably :)) going to tackle this coming quarter, likely with a Zookeeper/etcd (etc.) solution. I don't think such a system would handle some of the more complicated config option needs, though; it's hard to imagine e.g. mediawiki-config, with its hundreds of config options and numerous if/else branches, being sanely serialized into a purely key/value store such as etcd. It'd also be much harder to edit & code review, as well as to provide access for anyone to edit (such as our broader community).
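The serialization concern above can be sketched as follows. Plain hierarchical data flattens into etcd-style key/value pairs mechanically, but if/else branches in a config script have no representation in a pure k/v store. The key layout here is invented for illustration, not an actual etcd schema.

```javascript
// Flatten a nested config object into etcd-style key/value pairs.
// Plain data maps cleanly; conditional logic of the kind found in
// mediawiki-config would have to be evaluated *before* flattening,
// losing the branching structure entirely.
function flatten(obj, prefix) {
  prefix = prefix || '';
  const out = {};
  for (const key of Object.keys(obj)) {
    const path = prefix + '/' + key;
    if (obj[key] !== null && typeof obj[key] === 'object') {
      Object.assign(out, flatten(obj[key], path));
    } else {
      out[path] = String(obj[key]);
    }
  }
  return out;
}

const config = { parsoid: { port: 8000, logLevel: 'info' } };
console.log(flatten(config));
// { '/parsoid/port': '8000', '/parsoid/logLevel': 'info' }
```

This works in one direction only: once a branch like `if (wiki === 'enwiki') { ... }` has been evaluated into flat keys, the original logic cannot be recovered or reviewed from the store.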

For the existing service dependencies you mentioned specifically, we're also using very specific LB IPs which are stable for now, so I don't think this is actually a problem in the short term, right? Are there any issues you're experiencing right now, or is this task something to keep in mind for the future?

As for the rest, I admit I'm a bit confused: is this a discussion on where to store (& revision control) the services configs or how to deploy them to the machines themselves?

Why are individual deploy repos unsuitable, is it because of the lack of sharing config options between services? Could a unified services-config containing .js files (a la mediawiki-config) be a solution here?

GWicke added a comment (edited). Mar 21 2015, 8:31 PM

@faidon, having configs in deploy repos means ultimately hardcoding config files. Those don't scale very well to many nodes and cluster setups. For example, we often break configs in beta labs because some instance was re-created & changed IPs. Setting up a second copy of beta labs is not really feasible right now as it still involves a lot of hardcoded configs. It's also quite hard to run automatic integration tests with throw-away service instances if your configuration is not automated / templated.

I agree that Puppet is really not ideal for most of these use cases, but it's the only config management system we have right now.
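The templating argument above can be sketched briefly. A config rendered from centrally managed data survives an instance being re-created with a new IP, because only the data changes; a hardcoded file in a deploy repo has to be edited by hand each time. The template syntax, variable names, and IP below are illustrative, not an actual Parsoid config.

```javascript
// Render a config template against centrally managed instance data.
// When a beta labs instance is rebuilt and changes IP, only the data
// record is updated; the template (and the deploy repo) stay untouched.
function render(template, vars) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) => {
    if (!(name in vars)) {
      throw new Error('Undefined template variable: ' + name);
    }
    return vars[name];
  });
}

// A hypothetical templated line of a Parsoid-style settings file.
const template =
  "parsoidConfig.setMwApi({ uri: 'http://{{apiHost}}:{{apiPort}}/w/api.php' });";

// Central data for one environment, e.g. beta labs.
console.log(render(template, { apiHost: '10.68.16.145', apiPort: '80' }));
```

The same template rendered against a different data set produces the production config, which is also what makes throw-away test instances cheap: generate the config, run the integration tests, discard the instance.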

faidon closed this task as Declined. Mar 23 2015, 3:48 AM
faidon claimed this task.

I've suggested alternatives, but these don't seem to have been addressed or commented on?

I also don't see the Beta people pushing for this (if anything, hashar has been moving in the opposite direction). @ssastry also seemed not to be thrilled by this, and thought it was ops' push, which it clearly wasn't.

Puppet is unsuitable for this and actually a step backwards from the status quo. This isn't going to happen.