eventlogging-service-eventbus scap deployments should depool/pool during deployment
Closed, Resolved. Public. 5 Estimated Story Points

Description

We merged https://gerrit.wikimedia.org/r/#/c/367447/, but during deployment I get:

ERROR:conftool:Error when trying to set/pooled=no on service=eventbus,name=kafka2002.codfw.wmnet
ERROR:conftool:Failure writing to the kvstore: Backend error: The request requires user authentication : Insufficient credentials

I think this has something to do with the fact that the eventlogging deployment uses the eventlogging user, not deploy-service.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Puppet was broken on kafka2002:

Error: Could not set home on user[eventlogging]: Execution of '/usr/sbin/usermod -d /nonexistent eventlogging' returned 8: usermod: user eventlogging is currently used by process 23546
Error: /Stage[main]/Eventlogging::Server/User[eventlogging]/home: change from /etc/eventlogging.d to /nonexistent failed: Could not set home on user[eventlogging]: Execution of '/usr/sbin/usermod -d /nonexistent eventlogging' returned 8: usermod: user eventlogging is currently used by process 23546
Notice: /Package[eventlogging/eventbus]: Dependency User[eventlogging] has failures: true
Warning: /Package[eventlogging/eventbus]: Skipping because of failed dependencies
Notice: /Stage[main]/Eventlogging::Server/File[/var/log/eventlogging]: Dependency User[eventlogging] has failures: true
Warning: /Stage[main]/Eventlogging::Server/File[/var/log/eventlogging]: Skipping because of failed dependencies
Notice: /Stage[main]/Eventlogging::Server/Logrotate::Conf[eventlogging]/File[/etc/logrotate.d/eventlogging]: Dependency User[eventlogging] has failures: true
Warning: /Stage[main]/Eventlogging::Server/Logrotate::Conf[eventlogging]/File[/etc/logrotate.d/eventlogging]: Skipping because of failed dependencies
Notice: /Stage[main]/Role::Eventbus::Eventbus/Eventlogging::Deployment::Target[eventbus]/Scap::Target[eventlogging/eventbus]/Exec[chown /srv/deployment/eventlogging for eventlogging]: Dependency User[eventlogging] has failures: true
Warning: /Stage[main]/Role::Eventbus::Eventbus/Eventlogging::Deployment::Target[eventbus]/Scap::Target[eventlogging/eventbus]/Exec[chown /srv/deployment/eventlogging for eventlogging]: Skipping because of failed dependencies
Notice: /Stage[main]/Role::Eventbus::Eventbus/File[/srv/log/eventlogging]: Dependency User[eventlogging] has failures: true
Warning: /Stage[main]/Role::Eventbus::Eventbus/File[/srv/log/eventlogging]: Skipping because of failed dependencies
Notice: /Package[python-tornado]: Dependency User[eventlogging] has failures: true
Warning: /Package[python-tornado]: Skipping because of failed dependencies

Fixed by stopping eventbus and then running puppet.

fdans triaged this task as Medium priority. Jul 27 2017, 3:45 PM
fdans moved this task from Incoming to Dashiki on the Analytics board.

@Joe what are the available options? Shall we migrate to use deploy-service for eventlogging or is there a way to allow the eventlogging user to get credentials for pool/depool?

So the following class is responsible for adding the necessary credentials for the deploy-service user:

# === Class scap::conftool
#
# Adds conftool scripts and credentials for the deploy-service user, used by
# scap3. This will allow scap3 to call "pool", "depool" and so on
#
class scap::conftool {
    include ::conftool::scripts

    ::conftool::credentials { 'deploy-service':
        home => '/var/lib/scap',
    }
}

So theoretically adding ::conftool::credentials for the eventlogging user (plus a home other than /nonexistent, so that the etcd credentials can be stored) should suffice to make everything work. I am a bit reluctant to proceed since deploy-service seems to be the right user to do the job, so the alternative would be to move the eventbus deployment to deploy-service.

I am not strongly opposed to using deploy-service. But this means that when we switch the analytics use of eventlogging to Debian and systemd, all output data files will be owned by the deploy-service user. I guess this is fine, but it seems a little weird to me. It seems more correct for individual services to run as an 'isolated' user for that service, but if our convention is to use deploy-service for all services, then so be it! :)

For example, we'd like to restrict kafka producers and consumers to specific topics by adding ACLs for TLS principals. Since you can authenticate as a principal if you have access to the TLS private key, the only way to restrict users on the same box from authenticating as each other is to restrict file permissions on the private key.

Say we have a bunch of scb services that interact with kafka. If they all run as deploy-service, how are we going to restrict them to specific topics?

So the following class is responsible for adding the necessary credentials for the deploy-service user:

# === Class scap::conftool
#
# Adds conftool scripts and credentials for the deploy-service user, used by
# scap3. This will allow scap3 to call "pool", "depool" and so on
#
class scap::conftool {
    include ::conftool::scripts

    ::conftool::credentials { 'deploy-service':
        home => '/var/lib/scap',
    }
}

So theoretically adding ::conftool::credentials for the eventlogging user (plus a home other than /nonexistent, so that the etcd credentials can be stored) should suffice to make everything work. I am a bit reluctant to proceed since deploy-service seems to be the right user to do the job, so the alternative would be to move the eventbus deployment to deploy-service.

The deploy-service user has become the default deployer for many services, but for services that have a legitimate need to restrict deployment/ssh access, it makes sense to use a separate user and keyholder key. Having many deployment users and restricting access to them with keyholder was part of the plan during scap development.

My instinct would be to parameterize the scap::conftool class, but looking at the patchset that added the class (https://gerrit.wikimedia.org/r/#/c/305278/4), @Joe made the comment:

I would honestly wait until we need those parameters before parametrizing this. I think whenever we need something different from the standard, we can create a specialized class like mediawiki::conftool.

So maybe that's the right way to go in this instance.
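
For illustration, a specialized class of that kind could mirror scap::conftool for the eventlogging user. This is only a sketch: the class name eventlogging::conftool and the home path /var/lib/eventlogging are hypothetical, and the only resource parameter used is the `home` one that appears in the scap::conftool class quoted above.

```puppet
# Hypothetical class, not present in the puppet repo: same shape as
# scap::conftool, but for the eventlogging user.
class eventlogging::conftool {
    include ::conftool::scripts

    ::conftool::credentials { 'eventlogging':
        # Assumed path. It must be a real directory rather than
        # /nonexistent, so the etcd credentials have somewhere to live.
        home => '/var/lib/eventlogging',
    }
}
```

Changing the user's home is the step that failed in the Puppet run above while eventbus was still running, so the service would presumably need to be stopped first.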

But this means that when we switch the analytics use of eventlogging to Debian and systemd, all output data files will be owned by the deploy-service user.

Nope. The UID used by Scap is utilised for (a) SSH-ing into the box; and (b) checking out the code. This can (and should) be different from the UID used to actually run the service. For example, we use deploy-service for the deployment of all SCB services, but each of them runs under its own UID/GID.

For example, we'd like to restrict kafka producers and consumers to specific topics by adding ACLs for TLS principals. Since you can authenticate as a principal if you have access to the TLS private key, the only way to restrict users on the same box from authenticating as each other is to restrict file permissions on the private key.

Say we have a bunch of scb services that interact with kafka. If they all run as deploy-service, how are we going to restrict them to specific topics?

Idem; the UID used to deploy a service has nothing to do with how the service is run/controlled later on.

AH! Interesting! If so, then totally fine to deploy with deploy-service, and run with user eventlogging. Using deploy-service for confd pool/depool is fine.
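
The split between the deployment user and the runtime user can be sketched in Puppet as follows. This is a simplified illustration, not the merged change: it assumes `deploy_user` is the scap::target parameter named in the patch titles below, and it omits every other parameter the real resource may require. In the actual manifests User[eventlogging] is already managed by eventlogging::server (as the Puppet log above shows), so it is repeated here only to make the separation visible.

```puppet
# Sketch under stated assumptions; not the merged patch.

# Deployment side: the account scap uses to SSH in and check out the code.
scap::target { 'eventlogging/eventbus':
    deploy_user => 'deploy-service',
}

# Runtime side: the service keeps its own system account, so files it
# writes and keys it reads stay private to that user.
user { 'eventlogging':
    ensure => present,
    system => true,
}
```

With this arrangement the conftool credentials only need to exist for deploy-service, which is what scap::conftool already provides.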

Change 370648 had a related patch set uploaded (by Elukey; owner: Elukey):
[eventlogging/scap/eventbus@master] Set deploy-service as scap ssh user and git repo owner

https://gerrit.wikimedia.org/r/370648

Change 370649 had a related patch set uploaded (by Elukey; owner: Elukey):
[eventlogging/scap/eventbus@master] Set deploy-service as scap ssh user and git repo owner

https://gerrit.wikimedia.org/r/370649

elukey edited projects, added Analytics-Kanban; removed Analytics.
elukey set the point value for this task to 5.
elukey moved this task from Next Up to In Progress on the Analytics-Kanban board.

Change 371014 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role:eventubus: set deploy-service as scap deploy_user

https://gerrit.wikimedia.org/r/371014

Change 370649 merged by Elukey:
[eventlogging/scap/eventbus@master] Set deploy-service as scap ssh user and git repo owner

https://gerrit.wikimedia.org/r/370649

Change 371014 merged by Elukey:
[operations/puppet@production] role:eventubus: set deploy-service as scap deploy_user

https://gerrit.wikimedia.org/r/371014

Change 371466 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role:eventbus: add conftool pool/depool cred to scap

https://gerrit.wikimedia.org/r/371466

Change 371466 merged by Elukey:
[operations/puppet@production] role:eventbus: add conftool pool/depool cred to scap

https://gerrit.wikimedia.org/r/371466

Mentioned in SAL (#wikimedia-operations) [2017-08-11T13:51:23Z] <elukey> moved the eventbus scap deployment dirs on kafka[12]00[123] to deploy-service:deploy-service to allow scap to depool/pool - T171506

Marko just deployed to all the nodes, all good! (task will be closed soon as part of the Analytics kanban process)

Change 370648 abandoned by Elukey:
Set deploy-service as scap ssh user and git repo owner

Reason:
already merged

https://gerrit.wikimedia.org/r/370648