
Create test Kerberos identities/accounts for some selected users in hadoop test cluster
Closed, Resolved · Public · 5 Estimated Story Points

Description

Create test Kerberos identities/accounts for some selected users from Analytics (manually curated) and allow them to interact with the new authentication settings, in order to gather feedback about the tools, documentation, etc. that are needed.

Hadoop test cluster info: https://wikitech.wikimedia.org/wiki/User:Elukey/Analytics/Hadoop_testing_cluster

Event Timeline

Nuria triaged this task as High priority. Dec 18 2018, 9:33 PM
Nuria created this task.
mforns raised the priority of this task from High to Needs Triage. Mar 25 2019, 5:30 PM
mforns triaged this task as High priority.
Ottomata renamed this task from "Create test Kerberos identities/accounts for some selected users from Analytics" to "Create test Kerberos identities/accounts for some selected users in hadoop test cluster". May 29 2019, 9:39 AM
elukey moved this task from Next Up to In Progress on the Analytics-Kanban board.

We are finally able to allow somebody external to the Analytics team to test the Hadoop test cluster.

elukey updated the task description. Oct 10 2019, 3:22 PM
elukey added a subscriber: EBernhardson. Edited Oct 10 2019, 3:25 PM

@EBernhardson Hi! I am wondering if you have some spare cycles to test Kerberos in the Hadoop test cluster (see https://wikitech.wikimedia.org/wiki/User:Elukey/Analytics/Hadoop_testing_cluster) to find out what works and what doesn't. We have done a lot of testing (as you can see from the wiki page), but we might have missed something that is important for other users. There are some datasets to play with, and also a one-node Druid cluster. No obligation, and no massive/exhaustive testing needed: just a quick tour to see if you find anything weird. If you like the idea, I'll create an account for you in the Hadoop test cluster :)

@diego @Neil_P._Quinn_WMF Same question for you :) Kerberos is a new authentication scheme for Hadoop that we are probably going to roll out next quarter. For users, it should be as simple as logging in via a command called kinit once every 24h on the Analytics hosts. The username/password will be new, not the same as LDAP. I'll explain everything to you in detail if you have time to help!

elukey added a subscriber: Isaac. Oct 14 2019, 6:39 AM

@Isaac Adding you as well, let me know if you are interested!

@elukey Yes, I'm happy to help with this! Just let me know what you'd like me to do.

elukey added a comment. Edited Oct 14 2019, 10:01 AM
In T212258#5571956, @Neil_P._Quinn_WMF wrote:

> @elukey Yes, I'm happy to help with this! Just let me know what you'd like me to do.

Thanks a lot!

I created some documentation about the Hadoop test cluster in: https://wikitech.wikimedia.org/wiki/User:Elukey/Analytics/Hadoop_testing_cluster

All I need you to do is log in to an-tool1006, try some of the commands or workflows you usually run, and report back if anything looks weird or doesn't work. My goal is to find any outstanding bugs and to come up with better documentation for people.

The first step is to get an account: https://wikitech.wikimedia.org/wiki/User:Elukey/Analytics/Hadoop_testing_cluster#Get_a_password_for_Kerberos

I have created your username, and you should have an email in your inbox with a temporary password.

Isaac added a comment. Oct 14 2019, 4:34 PM

@elukey: also happy to help. Thanks for reaching out!

@Isaac thanks a lot! I just created a Kerberos account for you; you should have an email with your temporary password.

You can start from https://wikitech.wikimedia.org/wiki/User:Elukey/Analytics/Hadoop_testing_cluster#Get_a_password_for_Kerberos :)

elukey moved this task from Backlog to In Progress on the User-Elukey board. Oct 16 2019, 8:14 AM
elukey moved this task from In Progress to Kerberos on the User-Elukey board.

@elukey I'm having trouble ssh-ing into an-tool1006.eqiad.wmnet (ssh isaacj@an-tool1006.eqiad.wmnet): it is not letting me on the server (it doesn't accept my password). Is it possible that I need to be added as a user on the machine, or is it an issue on my end? Thanks!

Change 543798 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::analytics_test_cluster::client: allow ssh access to analytics users

https://gerrit.wikimedia.org/r/543798

Change 543798 merged by Elukey:
[operations/puppet@production] role::analytics_test_cluster::client: allow ssh access to analytics users

https://gerrit.wikimedia.org/r/543798

> @elukey I'm having trouble ssh-ing into an-tool1006.eqiad.wmnet (ssh isaacj@an-tool1006.eqiad.wmnet): it is not letting me on the server (it doesn't accept my password). Is it possible that I need to be added as a user on the machine, or is it an issue on my end? Thanks!

You're definitely right; can you retry now?

Isaac added a comment. Oct 17 2019, 3:55 PM

Yep, now able to access -- thanks! I'll do my best to test today and report back with an lgtm if no issues arise.

elukey added a comment. Nov 4 2019, 8:30 AM

@Isaac @Neil_P._Quinn_WMF checking in: any issues with Kerberos? Doubts, fears, etc.? :)

Isaac added a comment. Nov 7 2019, 12:58 PM

@elukey I played around with it and didn't run into any major issues. Thanks for the detailed notes! My only two concerns:

  • It doesn't seem that there is a good way to provide a password automatically to kinit (e.g., from a protected text file) so that long-running scripts can automatically renew credentials periodically. This is not a blocker for me but it would be nice to have the ability -- do you have any suggestions?
  • Is the suggested workflow for running a SWAP notebook to open a terminal window in JupyterHub and kinit before starting a PySpark kernel? That works but I wasn't sure if there was a way to kinit from the notebook itself or another suggested approach.

elukey added a comment. Nov 7 2019, 2:49 PM

Thanks a lot for the tests!

> @elukey I played around with it and didn't run into any major issues. Thanks for the detailed notes! My only two concerns:
>
>   • It doesn't seem that there is a good way to provide a password automatically to kinit (e.g., from a protected text file) so that long-running scripts can automatically renew credentials periodically. This is not a blocker for me but it would be nice to have the ability -- do you have any suggestions?

The option that is currently available is a keytab: basically a file, readable only by a specific user on a specific host, that holds the Kerberos keys (derived from the password) needed to authenticate. We use keytabs for daemons/services, and we plan to provide them to users who need to run periodic jobs. The major drawbacks are:

  • Our security is weakened a bit, since being able to ssh to a host would be enough to access HDFS (as opposed to also having to know a password). This is more or less the current scheme, so it's not a big deal, but we have to think about it.
  • A keytab can be generated for only one host at a time, and it needs to be regenerated and re-deployed whenever the user changes their password (this of course doesn't happen for daemons). This is currently not automated, and it requires a ping to Analytics every time.
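As a rough illustration of how a periodic job could use a keytab, here is a minimal Python sketch. It assumes the MIT Kerberos client tools are installed (`kinit -k -t <keytab> <principal>` obtains a ticket from a keytab without prompting for a password). The keytab path, principal, 8-hour renewal interval and the work callback are hypothetical placeholders, not values provided by Analytics.

```python
import shutil
import subprocess
import time
from typing import Callable, List, Optional

# Hypothetical placeholders: the real keytab path and principal would be provided by Analytics.
KEYTAB_PATH = "/etc/security/keytabs/example_user.keytab"
PRINCIPAL = "example_user@EXAMPLE.REALM"
# Assumption: refresh every 8 hours, well inside the 24h ticket lifetime mentioned in this task.
RENEW_EVERY_SECONDS = 8 * 60 * 60


def kinit_keytab_cmd(keytab_path: str, principal: str) -> List[str]:
    """Build the kinit command that gets a ticket from a keytab instead of prompting for a password."""
    return ["kinit", "-k", "-t", keytab_path, principal]


def renew_ticket(keytab_path: str, principal: str) -> bool:
    """Run kinit with the keytab; return True only if it exited successfully."""
    if shutil.which("kinit") is None:
        return False  # Kerberos client tools are not installed on this host
    result = subprocess.run(kinit_keytab_cmd(keytab_path, principal),
                            capture_output=True, text=True)
    return result.returncode == 0


def run_with_renewal(do_work_chunk: Callable[[], None], iterations: int) -> None:
    """Run a (hypothetical) unit of work repeatedly, refreshing the ticket when it is due."""
    last_renewal: Optional[float] = None
    for _ in range(iterations):
        now = time.monotonic()
        if last_renewal is None or now - last_renewal >= RENEW_EVERY_SECONDS:
            if not renew_ticket(KEYTAB_PATH, PRINCIPAL):
                raise RuntimeError("kinit with keytab failed; check the keytab path and principal")
            last_renewal = now
        do_work_chunk()
```

A job would call something like `run_with_renewal(process_next_batch, 1000)`, where `process_next_batch` is a hypothetical function doing one slice of work. The per-host and password-change drawbacks of keytabs still apply to this approach.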

To summarize: we'll work on a solution, possibly shaped by feedback from users, but for the first iteration we won't provide keytabs to all users (we'll only deploy them selectively, where needed). Would that be acceptable?

>   • Is the suggested workflow for running a SWAP notebook to open a terminal window in JupyterHub and kinit before starting a PySpark kernel? That works but I wasn't sure if there was a way to kinit from the notebook itself or another suggested approach.

I think it is the suggested way, and keep in mind that you'll have to do it only once every 24h, not every time!
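Since the kinit only has to happen once per 24h, a notebook can simply check for a valid ticket before starting PySpark. A minimal Python sketch, assuming the MIT Kerberos `klist` tool is on the PATH (`klist -s` prints nothing and exits with status 0 only when a valid, non-expired ticket cache exists); the function names and message text are hypothetical.

```python
import shutil
import subprocess


def has_valid_kerberos_ticket() -> bool:
    """Return True if `klist -s` reports a valid (non-expired) ticket cache."""
    if shutil.which("klist") is None:
        return False  # no Kerberos client tools on this host
    # `klist -s` is silent and signals the result only through its exit status.
    return subprocess.run(["klist", "-s"]).returncode == 0


def require_kerberos_ticket() -> None:
    """Raise a readable error instead of an obscure Hive/HDFS authentication failure."""
    if not has_valid_kerberos_ticket():
        raise RuntimeError(
            "No valid Kerberos ticket found: open a terminal in JupyterHub, "
            "run kinit, then re-run this cell."
        )
```

Calling `require_kerberos_ticket()` in the first cell, before creating the Spark session, turns a forgotten kinit into an explicit message rather than a connection error.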

Isaac added a comment. Nov 7 2019, 6:09 PM

> The option that is currently available is a keytab

OK, that works for me. I'll avoid it, but it's good to know it's an option if needed.

> I think it is the suggested way, and keep in mind that you'll have to do it only once every 24h, not every time!

Gotcha - thanks! I suspect we don't have any control over the default display of JupyterHub, but if there's any way to include that as a note (e.g., something like "If you are having trouble connecting to Hive, make sure that you have executed the kinit command in the terminal"), that would be helpful for when we forget or are onboarding new users.

elukey added a comment. Edited Nov 8 2019, 7:42 AM

>> I think it is the suggested way, and keep in mind that you'll have to do it only once every 24h, not every time!
>
> Gotcha - thanks! I suspect we don't have any control over the default display of JupyterHub, but if there's any way to include that as a note (e.g., something like "If you are having trouble connecting to Hive, make sure that you have executed the kinit command in the terminal"), that would be helpful for when we forget or are onboarding new users.

Not sure if we can do it in JupyterHub, but we'll probably be able to add something to the MOTD of the stat/notebook hosts, so that when people ssh in they get instructions about what to do for Kerberos, where to find the docs, etc. Nice suggestion, thanks!

Isaac added a comment. Nov 11 2019, 9:20 PM

> Not sure if we can do it in JupyterHub, but we'll probably be able to add something to the MOTD of the stat/notebook hosts, so that when people ssh in they get instructions about what to do for Kerberos, where to find the docs, etc. Nice suggestion, thanks!

Sounds great - thanks!

The experiment can be considered done: one identity was tested and everything looked fine. We are considering enabling Kerberos next week; for more info about what will change from the user's side, see: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos/UserGuide

elukey set the point value for this task to 5. Nov 13 2019, 9:09 AM
elukey moved this task from In Progress to Done on the Analytics-Kanban board.
elukey moved this task from Kerberos to Done on the User-Elukey board. Nov 19 2019, 2:57 PM
Nuria closed this task as Resolved. Nov 22 2019, 11:03 PM