RESTBase Admin access on aqs1001, aqs1002, and aqs1003 for Joseph and Dan
Closed, Resolved · Public

Description

Dear ops,

@JAllemandou (joal) and @Milimetric (milimetric) need access to restart Cassandra and RESTBase on the new nodes (aqs100[1-3]). We're not sure what that means, but if Gabriel, Marko, or Petr have administration access to the existing RESTBase cluster, using the same kind of permissions for us on this new cluster would be cool. The puppet that sets up Cassandra and RESTBase on these new machines is not merged, but is in final stages here: https://gerrit.wikimedia.org/r/#/c/231574

@kevinator can approve.

Event Timeline

Milimetric raised the priority of this task to Needs Triage.
Milimetric updated the task description.
Restricted Application added a subscriber: Aklapper.

@kevinator: Please add approval language to the ticket at your convenience.

RobH subscribed.

The request isn't quite clear. First it's stated that restarting RESTBase is required, and then it's requested to have the same access that Gabriel/Marko have.

Gabriel and Marko have full sudo on restbase, which is far, far wider in scope than simply restarting the service.

If @JAllemandou and @Milimetric ONLY want restart of parsoid, that is a narrower scope.

I just want to clarify what is needed, since it isn't quite clear yet. Once we have that, we can work on patchsets and approvals.

RobH triaged this task as Medium priority. Sep 28 2015, 7:19 PM

@RobH: some more background might be necessary:

There will be an additional RESTBase cluster called "Analytics Query Service", which is detailed in the gerrit change I mentioned in the description: https://gerrit.wikimedia.org/r/#/c/231574

This cluster will run on servers aqs1001, aqs1002, and aqs1003. It will not run Parsoid or other RESTBase services, just the Analytics RESTBase modules (right now just a single module called pageviews).

We weren't sure what access the Services team had on those boxes in order to maintain it, and that's why we were asking for the same level of access as them. If they have sudo, that probably means we need sudo too, unless anyone objects, because we'll be the ones largely responsible for this service.

We should add a new admin group "aqs-admins" and give its members the right to control this service (start/stop/restart etc.), as well as to run any command as the service user and to read log files.

That's the standard we follow in other admin classes. The pattern is usually like:

'ALL = NOPASSWD: /usr/sbin/service mathoid *' to control the service mathoid and 'ALL = (mathoid) NOPASSWD: ALL' to run anything as user mathoid. Or for parsoid, 'ALL = NOPASSWD: /usr/sbin/service parsoid *' to control parsoid.

We should do the same for this new service and put that group on the new nodes and the relevant people into the group.

This is different from full root though. The term "have sudo" doesn't specify whether we mean the pattern above or any command as any user, which equals root and is currently the case on restbase.
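
For illustration, here is a minimal sketch of what such a group entry might look like in the admin module's data file, following the pattern above. The gid, the description, and the service user name `restbase` are assumptions made for this example, not values taken from an actual patch:

```yaml
# Hypothetical sketch only: the gid, description, and service user name
# are placeholders, not copied from any merged change.
groups:
  aqs-admins:
    gid: 9999                      # placeholder gid
    description: control RESTBase and Cassandra on the aqs nodes
    members: []                    # people are added via the access request
    privileges:
      # control the services (start/stop/restart etc.)
      - 'ALL = NOPASSWD: /usr/sbin/service restbase *'
      - 'ALL = NOPASSWD: /usr/sbin/service cassandra *'
      # run any command as the (assumed) service user, not as root
      - 'ALL = (restbase) NOPASSWD: ALL'
```

Each privilege line is scoped to one command or one target user, which is what keeps this narrower than full root.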

+1 Dzahn, that sounds perfect. Thanks for explaining. Do you want these changes to be part of the aqs puppet definition here: https://gerrit.wikimedia.org/r/#/c/231574/ ? If so, feel free to either add a new patchset or -1 with this explanation for what needs to be done.

+1 for T113416#1688129. To clarify, @Milimetric and @JAllemandou will need to be able to:

  • sudo systemctl * restbase
  • sudo systemctl * cassandra
  • tail /var/log/syslog
  • tail /var/log/cassandra/system.log

We should also make sure their users can use Cassandra tools like nodetool and cqlsh.

Change 242735 had a related patch set uploaded (by Dzahn):
admin: add admin group for analytics query service

https://gerrit.wikimedia.org/r/242735

@mobrovac how about this for now: https://gerrit.wikimedia.org/r/#/c/242735/2/modules/admin/data/data.yaml It has "journalctl" with any arguments to replace the tail commands, plus service cassandra * and service restbase *, which is kind of the standard for other admin groups.

I can merge the above now to unblock your role change so that you can use the group name. Then this access request will still be to add people to the group and/or to discuss missing commands if any.

As I noted on the PS, RESTBase sends the logs to syslog directly, and as such bypasses systemd. However, this detail may be left for later IMHO, as it's not a blocker for the first deployment. As it currently stands in PS 231574's config, no local logging is going to take place, so my comment was more directed at future changes (cf. T112648: enable restbase syslog/file logging).

Change 242735 merged by Dzahn:
admin: add admin group for analytics query service

https://gerrit.wikimedia.org/r/242735

I added the empty group to unblock development of the puppet role.

This access request should continue as normal, by adding people to the new group aqs-admins and having their permissions acked in the ops meeting.

Thank you @Dzahn! Assigning to @RobH for adding the needed users so the ticket makes the next Ops meeting :P

Indeed, Daniel's work on the implementation means that all that is left is the ops meeting review for @JAllemandou (joal) and @Milimetric (milimetric) for verbal approval during said meeting.

I'll put it on the agenda for the upcoming meeting next Monday.

Thanks much everyone, this is great.

This request has been approved in the operations meeting. We'll enable this access later today.