
Requesting access to analytics-search-user for Mikhail Popov and Oliver Keyes
Closed, Resolved, Public

Description

Username: ironholds/bearloga
Full name: Oliver Keyes/Mikhail Popov

We need access to the analytics-search-user role account so that we can stick our data collection scripts on that rather than be dependent on individual employees, since they may, you know, leave.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Presumably, this access request needs approval from the manager of Mikhail Popov and Oliver Keyes... which is me! Approved.

RobH subscribed.

Ok, so there are a few process-driven steps we need to accomplish:

  • @Ironholds needs to review and sign L3.
    • I realize that he likely signed a very old version, pre-Phabricator. Now that Phabricator is live, we ask individuals requesting access changes to review and sign that document.
    • @mpopov has already signed L3.

I'm assigning this task back to Oliver for his signature update on L3. Once that is done, please assign it to me (@RobH). Thanks!

Otherwise this now has their manager's approval (thanks @Deskana!), so all that remains is the above signature update for Oliver plus a 3-day wait. If no objections are noted, this will merge on 2016-03-11 (Friday).

I stand corrected, the analytics-search-user is a sudo group, so it needs Operations meeting review.

The next meeting is Monday, 2016-03-14. I'll prepare the patchset in advance, but it won't merge until operations meeting review (plus Oliver's signature on L3.)

Sorry for the confusion.

Change 276190 had a related patch set uploaded (by RobH):
add bearloga & ironholds to analytics-search-user

https://gerrit.wikimedia.org/r/276190

Signed! Okay, that gives us a 4-day window to switch everything over.

Since I'm on clinic duty this week, I'm stealing this task back for its listing on the operations team meeting next Monday, 2016-03-14.

RobH triaged this task as High priority. Mar 11 2016, 5:33 PM

Ops meeting update: Approved, as long as Luca/Andrew (analytics) have no objections. (One should comment here to approve.)

Assigning to Luca.

IRC Note:

elukey: Andrew approved, already talked with him!

Change 276190 merged by RobH:
add bearloga & ironholds to analytics-search-user

https://gerrit.wikimedia.org/r/276190

RobH removed elukey as the assignee of this task.
RobH added a subscriber: elukey.

This was approved in today's operations meeting, with follow-up approval from Luca/Andrew relayed by Luca on IRC.

The patchset is now live on the cluster, and affected hosts will get updates on their next call into puppet.

Sooo how exactly does one *use* it? su analytics-search-user complains about passwords. Otto maintains there isn't one.

su analytics-search-user says there's no password entry. su analytics-search still asks me for a password.

But analytics-search apparently doesn't have a /home/ directory so is not actually what we were looking for.

Sod it, let's just run this through bearloga's account.

Dzahn reopened this task as Open. Edited Mar 15 2016, 8:57 PM
Dzahn subscribed.

reopening, should be fixed properly instead of running it through an individual user account. after all the reasoning for creating this ticket was "rather than be dependent on individual employees"

Ok, there seems to be some major confusion on this ticket. When you say you are "requesting access to a user", that is not what is happening. You don't switch to become that user. What is happening is there is a _group_ and it's called analytics-search-user. That group has certain permissions on files. What the access request did is put users into a group, so you guys are now members. There is no switching involved.

Okay, then this was a fundamental misunderstanding from the get-go.

We asked Andrew Otto for access to or the creation of a role account that'd exist independent of any individual employee so we would be able to run regularly scheduled analytics jobs even if employees quit (as I'm doing). He pointed us to this. What should we be doing instead?

[analytics1001:~] $ id ironholds
uid=5004(ironholds) gid=500(wikidev) groups=500(wikidev),731(analytics-privatedata-users),771(analytics-search-users)

[analytics1001:~] $ id bearloga
uid=12625(bearloga) gid=500(wikidev) groups=500(wikidev),731(analytics-privatedata-users),771(analytics-search-users)

^ This shows the groups in which the 2 users ironholds and bearloga are members. See how it includes "analytics-search-users". So now if any files are owned by whoever:analytics-search-users and are group-writable, you will be able to change them. No "su" involved at all.
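
To illustrate the point above, here is a minimal sketch of checking group membership from the shell, with no "su" involved. The helper name in_group is made up for this example and is not something from the ticket. The demo checks the current user against their own primary group so that it runs anywhere. On the analytics hosts the group argument would be analytics-search-users.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if user $1 is a member of group $2.
# Assumes group names contain no spaces.
in_group() {
    id -Gn "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}

# Demo with the current user and their primary group, which always exist.
me=$(id -un)
primary=$(id -gn)
if in_group "$me" "$primary"; then
    echo "$me is a member of $primary"
fi
```

Something like in_group bearloga analytics-search-users would then answer the membership question directly.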

actually the analytics-search-user group was created so that there is an analytics-search user. This user account is used to own oozie jobs in hadoop so that any person in the group can fix them, rather than having it owned by a specific user. This is a limited mirror of how the analytics team uses the hdfs user.

In this access request ironholds/bearloga were hoping to also create some cron jobs (same concept as the oozie jobs we run in hadoop, just a different scheduling mechanism) under the same user account, IIUC.

Well, not just cron jobs but also have somewhere to store the scripts the jobs are triggering.

We asked Andrew Otto for access to or the creation of a role account that'd exist independent of any individual employee so we would be able to run regularly scheduled analytics jobs even if employees quit (as I'm doing). He pointed us to this. What should we be doing instead?

You should all be using your individual user accounts. Those user accounts should be members in a common group. Permissions on files should be given to the group, not to users directly. When people join and leave the team they should be added and removed from the group. That way you achieve what you want, everybody can work on the same files but also works as their own user.
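
A rough sketch of what that could look like on disk, under some assumptions: the setgid bit on the directory is an addition not mentioned in this thread (it makes new files inherit the directory's group), the file name collect.sh is hypothetical, and SHARED_GROUP defaults to the current user's primary group so the demo runs without root. On the real hosts it would be the shared group, e.g. analytics-search-users.

```shell
#!/bin/sh
# Sketch: a shared directory whose files stay owned by a common group.
SHARED_GROUP=${SHARED_GROUP:-$(id -gn)}
shared_dir=$(mktemp -d)

chgrp "$SHARED_GROUP" "$shared_dir"
# 2775: setgid bit so new files inherit the directory's group,
# plus read/write/execute for owner and group, read/execute for others.
chmod 2775 "$shared_dir"

touch "$shared_dir/collect.sh"       # hypothetical script name
chmod g+w "$shared_dir/collect.sh"   # let any group member edit it

ls -ld "$shared_dir" "$shared_dir/collect.sh"
```

Everyone keeps working as their own user. Joining or leaving the team is then a change to group membership, not to the files.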

And the files live where? (Otto is suggesting /a/)

so that any person in the group can fix them, rather than having it owned by a specific user.

Yea, but this is why you usually give permissions to a group instead of .. a specific user that is shared among people.

In this access request ironholds/bearloga were hoping to also create some cron jobs (same concept as the oozie jobs we run in hadoop, just a different scheduling mechanism) under the same user account, IIUC.

cron jobs should be created by puppet rather than humans

so that any person in the group can fix them, rather than having it owned by a specific user.

Yea, but this is why you usually give permissions to a group instead of .. a specific user that is shared among people.

Cron perhaps can be worked with via /etc/cron.d/, but in the hadoop job scheduler jobs are owned by a single account. Only that account (and not those with shared group membership) can manipulate the job, stop it, restart it with different parameters, see the logs it generates, etc.
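
For the cron half of that, a minimal sketch of an /etc/cron.d-style entry. Unlike a per-user crontab, files under /etc/cron.d carry a user field between the schedule and the command, so the job can run as a role account. The helper cron_d_entry and the script path /a/discovery/run.sh are hypothetical. The sketch writes to a temporary file and does not touch /etc/cron.d, which needs root.

```shell
#!/bin/sh
# Hypothetical helper: print a daily cron.d-style entry.
# Arguments: minute hour user command
cron_d_entry() {
    printf '%s %s * * * %s %s\n' "$1" "$2" "$3" "$4"
}

# Hypothetical job: run a script as analytics-search at 23:23 every day.
fragment=$(mktemp)
cron_d_entry 23 23 analytics-search /a/discovery/run.sh > "$fragment"
cat "$fragment"
# prints: 23 23 * * * analytics-search /a/discovery/run.sh
```

This covers cron only. It does nothing for the Hadoop scheduler case, where the job itself is owned by a single account.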

In this access request ironholds/bearloga were hoping to also create some cron jobs (same concept as the oozie jobs we run in hadoop, just a different scheduling mechanism) under the same user account, IIUC.

cron jobs should be created by puppet rather than humans

I originally suggested this, but the analytics team (i talked with otto on irc) felt like having researchers puppetize cron jobs was too much of an ask for many (but certainly not all) research use cases.

the analytics team (i talked with otto on irc) felt like having researchers puppetize cron jobs was too much of an ask

I have to disagree on this one. In production things should be puppetized. Adding a cron in puppet looks like this:

    cron { 'my-cron-job':
        ensure  => 'present',
        command => "..;",
        user    => 'foo',
        hour    => '23',
        minute  => '23',
    }

so you specify a command line, a user and a time it should run at. Just like when manually doing it with crontab -e.

So the effort is almost the same (agree, you have to get it merged) but the advantages are clearly:

  • manual crons tend to be forgotten about and are not documented anywhere
  • versioning, being able to just enable/disable with a simple switch without removing the entire thing
  • others see what you are adding, peer review
  • when humans set up crons there is a risk that they interfere with other crons set up by puppet; when they fail they send mail to root@, which means spam for all ops members
  • not even having to talk about the access issue for people

...and more

Sure, then we'll look at that when we're no longer on a timer.

I'm checking out for today. I'll start moving scripts tomorrow.

cron jobs should be created by puppet rather than humans

@Dzahn, usually I agree with this. However, on stat1002/stat1003, things are a little different. Researchers often want to run limited jobs to get a dataset. Perhaps they want to run something every hour for just a week. They will often change code as they explore the data and results, and restart jobs. Having to coordinate this with ops every time is a huge amount of overhead.

On the other hand, @Ironholds is trying to use analytics-search user to manually solve something that we use puppet for: bus factor alleviation :) So, while I do think that they should be able to run cron jobs and code all on their own, if they are writing something that is 'production' and intended to not be vulnerable to bus factor, it should probably be puppetized.

Also, for puppetization help, Discovery has an embedded opsen now! Should be a little easier!

cron jobs should be created by puppet rather than humans

@Dzahn, usually I agree with this. However, on stat1002/stat1003, things are a little different. Researchers often want to run limited jobs to get a dataset. Perhaps they want to run something every hour for just a week. They will often change code as they explore the data and results, and restart jobs. Having to coordinate this with ops every time is a huge amount of overhead.

You deploy the code via scap3 or whatever, and only where it lives, where it is fetched from, and the cron jobs and other entry points are defined in puppet. I.e. changing the code that is triggered from cron only needs group membership. So AFAIK your stated requirements are met by the proposed solution.

For the most part, but not if they are just making short term reports or experiments. Development of this type of stuff is actually done on stat1002/1003, because the data is not available for exploration elsewhere.

For the most part, but not if they are just making short term reports or experiments.

How does making a short term report or experiment need anything besides non-puppet changes?

Okay, I quit Friday. Could we please save discussion about long-term solutions for, you know, the long-term? Because not having things break in 2 days is more my priority right now.

Right now it sounds like we've got an easy solution of storing the code in /a/discovery/ and having the role account/group run it. Long-term solutions are a different conversation.

Ironholds claimed this task.

Fair enough. Well.. the ticket already says that most requirements are met with the standard solution. And for the rest, it seems we can all agree to bring it up later if needed. Also +1 on including "discovery ops" in that case.