Page MenuHomePhabricator

Figure out a security model for etcd
Closed, ResolvedPublic

Description

As of now, etcd supports HTTPS natively, and also supports client-side SSL certificates. In the most recent versions, however, etcd allows also to define ACLs. The only issue with ACLs is that they're not supported e.g. in confd as far as I can see. We should probably figure out a viable way to configure ACLs and leave a read-only unauthenticated section that can be reached by confd.

So, my proposal would be:

  • Upgrade the cluster to 2.2
  • Add authentication to python-etcd, conftool, update the cluster
  • Define a root user with a password; save it.
  • Add a rw user for conftool and group
  • Create a locks group, add conftool to it
  • Limit writes on /conftool to the conftool group, allowing reads from everyone
  • Limit writes on /_locks to the locks group
  • Do the same for eventlogging

Event Timeline

Joe raised the priority of this task from to Low.
Joe updated the task description. (Show Details)
Joe added projects: Traffic, acl*sre-team.
Joe added subscribers: faidon, Joe, MoritzMuehlenhoff and 2 others.

As ACLs are coming in etcd 2.1, for now I'd just use SSL connections. The other option we have is to put nginx in front of etcd.

In T97972#1298574, @Joe wrote:

As ACLs are coming in etcd 2.1, for now I'd just use SSL connections. The other option we have is to put nginx in front of etcd.

BTW, 2.1.2 landed in Debian unstable yesterday:
https://packages.qa.debian.org/e/etcd/news/20150902T032059Z.html

So I think we should bring this back up, for kubernetes in toollabs is going to be using etcd, and it should probably not be accessible to anyone outside of the kubernetes master...

yuvipanda raised the priority of this task from Low to Medium.Nov 16 2015, 8:16 AM

*bump* This is blocking general access to the k8s api.

So, as far as a general puppet interface for this would go I envision something like:

  • we create a provider for the "user" and "group" puppet resources that work with etcd, if feasible; if not we can create its own resource
  • we create a resource "etcd::acl" that effectively creates and checks ACLs on etcd

or we could just do as we do for mysql and save a list of etcdctl arguments we need to re-create our ACLs from

I'd like the former approach, but that requires significantly more work.

Joe raised the priority of this task from Medium to High.Nov 24 2015, 9:13 AM

Change 255155 had a related patch set uploaded (by Giuseppe Lavagetto):
etcd: auth puppettization [WiP]

https://gerrit.wikimedia.org/r/255155

Change 255155 merged by Giuseppe Lavagetto:
etcd: auth puppetization

https://gerrit.wikimedia.org/r/255155

I think we likely want to revisit this.

  • Right now the guest user has access to /eventlogging which I don't think we actually use anywhere?
  • It seems nice if we were in the business of making role accounts for each intended usage, instead of just relying on a single root account for all RW access. For instance, it could be nice to separate out a role for different conftool objects -- I can easily imagine wanting separate permissions for mwconfig vs other objects.

I think we likely want to revisit this.

  • Right now the guest user has access to /eventlogging which I don't think we actually use anywhere?
  • It seems nice if we were in the business of making role accounts for each intended usage, instead of just relying on a single root account for all RW access. For instance, it could be nice to separate out a role for different conftool objects -- I can easily imagine wanting separate permissions for mwconfig vs other objects.

I think we need a root account for admin reasons and to be able to run conftool-sync easily. We could avoid making it the default account for root though.

IIRC we already have an account specialized for accessing only mwconfig, we could expand on the concept.

In T97972#5352851, @Joe wrote:

IIRC we already have an account specialized for accessing only mwconfig, we could expand on the concept.

Not in etcd, we only have a root user (see v2/auth/users) and root and guest roles (see v2/auth/roles). The guest role having access to eventlogging objects, but I don't see them, so maybe relic from the past of setup in anticipation of something that never happened.

Re-opening the task since I see was already updated. Adding the result of some offline chat that have happened last week.

The straw dog proposal being:

  • [preamble] conftool loads all existing /etc/etcd/etcdrc, ~/.etcdrc, CLI param --config files and update the configuration dict so last one wins for the same key.
  • Add a new global RO account to etcd and deploy its credentials to /etc/etcd/etcdrc.
    • As a result anyone with access to any host with conftool installed will be able to query etcd. I don't think there will ever be any secret infromation in Etcd but I'd like @Joe 's opinion if there might be any risk involved with this (too many requests, etc..).
    • Make MediaWiki EtcdConfig use this same account as it should only consume data.
    • Icinga checks too should use this same account.
  • Create a RW account (with still global RO) for node and services objects to replace the current root account in /root/.etcdrc to all conftool hosts but the ones in Puppet cluster management role or puppetmasters. Required to keep pool/depool functionality.
    • Ideally I'd like for each host to be able to change only its own data, but that's not possible with the current infrastructure.
  • Global RW account (that can perfectly be the existing root one rotating its password) to be deployed only on the Puppet cluster management role and puppetmaster hosts in /root/.etcdrcto be able to perform actions on all objects (inlcuding discovery, mediawiki-config, dbconfig-*).

Thoughts?

Volans lowered the priority of this task from High to Medium.Jul 22 2019, 10:39 AM

@Joe any feedback on the above proposal? I'd really like to split the users ASAP given that dbctl is being deployed.

In T97972#5352851, @Joe wrote:

IIRC we already have an account specialized for accessing only mwconfig, we could expand on the concept.

Not in etcd, we only have a root user (see v2/auth/users) and root and guest roles (see v2/auth/roles). The guest role having access to eventlogging objects, but I don't see them, so maybe relic from the past of setup in anticipation of something that never happened.

you should not look at etcd itself, the auth is managed by nginx.

There we have two users, that can write to conftool, the root user and the user conftool, which is limited to that hierarchy.

Re-opening the task since I see was already updated. Adding the result of some offline chat that have happened last week.

The straw dog proposal being:

  • [preamble] conftool loads all existing /etc/etcd/etcdrc, ~/.etcdrc, CLI param --config files and update the configuration dict so last one wins for the same key.
  • Add a new global RO account to etcd and deploy its credentials to /etc/etcd/etcdrc.

why do we need a new global RO account? This is state data and nothing private should ever go into this system. Our state is public by definition, and published on config-master.w.o.

I don't think removing access to unauthenticated reads adds any benefit. It's rather an unnecessary complication.

  • As a result anyone with access to any host with conftool installed will be able to query etcd. I don't think there will ever be any secret infromation in Etcd but I'd like @Joe 's opinion if there might be any risk involved with this (too many requests, etc..).

See above. Right now RO access is open to unauthenticated requests and IMHO should remain that way.

  • Make MediaWiki EtcdConfig use this same account as it should only consume data.

Or use no account and live happy. Same goes for pybal and confd.

  • Icinga checks too should use this same account.

see above

  • Create a RW account (with still global RO) for node and services objects to replace the current root account in /root/.etcdrc to all conftool hosts but the ones in Puppet cluster management role or puppetmasters. Required to keep pool/depool functionality.

this is actually a good idea. we can even limit us to just node, services are static objects.

  • Ideally I'd like for each host to be able to change only its own data, but that's not possible with the current infrastructure.

It could be possible but it would require a lot of nginx weaseling.

  • Global RW account (that can perfectly be the existing root one rotating its password) to be deployed only on the Puppet cluster management role and puppetmaster hosts in /root/.etcdrcto be able to perform actions on all objects (inlcuding discovery, mediawiki-config, dbconfig-*).

Thoughts?

I think that fundamentally your proposal boils down to having one user that can only pool/depool, and use that as a default user for root on every server but the ones that do management.

I fully agree that's a good idea, and one we can apply quite quickly.

Change 564994 abandoned by Giuseppe Lavagetto:
This is a test, please disregard.

Reason:
it was a test.

https://gerrit.wikimedia.org/r/564994

Per @Volans recommendation, adding a use case here:

  • Be able to read conftool data via automated means (i.e. scripts) in order to cross check them against other data sources.

I think I will try to implement the following RBAC schema:

userRORWfrom
-*-*
root**cumin
conftool*/conftoolcumin/puppetmasters
node-$dc-$cluster*/conftool/v1/pools/$dc/$clusterAll servers in a pool

We could go deeper with other permissions, but I don't think discovery, mwconfig and the dbctl-related trees need to be accessed anywhere but on cumin/puppetmasters.

I'm a bit conflicted about the usefulness of the /conftool but I think it's good to have an "app-specific root" anyways.

RBAC without roles isn't really Role Based Access Control, but I digress.

LGTM on my side for those permissions.

Change 597805 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] profile::etcd::tlsproxy: refresh code for modern puppet

https://gerrit.wikimedia.org/r/597805

Change 597806 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] profile::etcd::tlsproxy: add additional users for pools

https://gerrit.wikimedia.org/r/597806

+1 as the above schema of auth reflects mostly my old comment/proposal. What's the deploy strategy?

Change 598415 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] profile::conftool::client: allow overriding the user root can access

https://gerrit.wikimedia.org/r/598415

The deploy strategy is simply adding the new users to etcd, move most hosts to use conftool as the root user immediately, and then progressively move them to the new users system on the long run.

Change 597805 merged by Giuseppe Lavagetto:
[operations/puppet@production] profile::etcd::tlsproxy: refresh code for modern puppet

https://gerrit.wikimedia.org/r/597805

Change 597806 merged by Giuseppe Lavagetto:
[operations/puppet@production] profile::etcd::tlsproxy: add additional users for pools

https://gerrit.wikimedia.org/r/597806

Change 598415 merged by Giuseppe Lavagetto:
[operations/puppet@production] profile::conftool::client: allow overriding the user root can access

https://gerrit.wikimedia.org/r/598415

Change 602047 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] profile::conftool::client: only use root on cumin*, puppetmasters

https://gerrit.wikimedia.org/r/602047

My tests went fine:

  • mwdebug* servers got the datacenter-appropriate pool-$dc-testserver user
  • the deployment servers got the conftool user instead

Once the last change under review is merged, the task can be considered done.

Change 602047 merged by Giuseppe Lavagetto:
[operations/puppet@production] profile::conftool::client: only use root on cumin*, puppetmasters

https://gerrit.wikimedia.org/r/602047