
Set up a generic workflow to create Kerberos accounts
Closed, Resolved · Public · 8 Estimated Story Points

Description

Kerberos principals (basically users) need to be provisioned beforehand for any user who needs to use Hadoop. During the SRE summit, many proposals to re-use existing authentication backends were discussed, but the simplest and most secure solution found seems to be that every user gets a new account (user/password).

The idea is the following (for any new user to be created):

  1. SRE creates a new user via the Kerberos kadmind interface (locally on the Kerberos host), setting a temporary password that expires after a short amount of time (even one second).
  2. An email is sent to the user with their account details (including the temporary password).
  3. Upon first login (via kinit on one of the stat hosts, for example) the user is asked to change their password.

This task should evaluate the above plan and, if it is sound, implement it.
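
Steps 1 and 2 of the plan could be sketched roughly as follows. This is only an illustration under stated assumptions, not the script that was later merged: the function names, the default sender address, and the message wording are hypothetical, and the message is composed but never sent. Only Python standard-library calls are used.

```python
import secrets
from email.message import EmailMessage


def generate_temporary_password(nbytes: int = 16) -> str:
    """Return a random URL-safe string to use as a one-time password."""
    return secrets.token_urlsafe(nbytes)


def build_account_email(username: str, realm: str, recipient: str,
                        temporary_password: str,
                        sender: str = "root@localhost") -> EmailMessage:
    """Compose (but do not send) the notification for a new principal.

    The sender default and the wording are placeholders for illustration.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"New Kerberos account {username}@{realm}"
    msg.set_content(
        f"A Kerberos principal {username}@{realm} has been created for you.\n"
        f"Temporary password: {temporary_password}\n"
        "Run kinit on one of the stat hosts and choose a new password.\n"
    )
    return msg
```

Keeping password generation and message composition as separate pure functions makes both easy to check without touching the KDC or a mail server.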

Event Timeline

elukey created this task. Jun 19 2019, 1:15 PM
Restricted Application added a subscriber: Aklapper. Jun 19 2019, 1:15 PM

JFTR, this is implemented using the +needchange flag, e.g.

kadmin.local:  addprinc +needchange jmmtest01
WARNING: no policy specified for jmmtest01@WIKIMEDIA; defaulting to no
policy
Enter password for principal "jmmtest01@WIKIMEDIA":
Re-enter password for principal "jmmtest01@WIKIMEDIA":
Principal "jmmtest01@WIKIMEDIA" created.

And then after first login/auth:

jmm@kerberos1001:~$ kinit jmmtest01
Password for jmmtest01@WIKIMEDIA:
Password expired.  You must change it now.
Enter new password:

Milimetric triaged this task as High priority. Jun 20 2019, 4:28 PM
Milimetric moved this task from Incoming to Operational Excellence on the Analytics board.
Milimetric renamed this task from Set up a workflow to create principals in Kerberos to Set up a workflow to create Kerberos accounts. Jun 20 2019, 4:31 PM
Milimetric renamed this task from Set up a workflow to create Kerberos accounts to Set up a generic workflow to create Kerberos accounts.
elukey added a comment. Edited · Jun 21 2019, 1:05 PM

Added the following:

elukey@re0.cr2-eqiad# show | compare
[edit firewall family inet filter analytics-in4 term kerberos from]
-       destination-port 88;
+       destination-port [ 88 464 ];

And also https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/518237/

Cc: @ayounsi
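
The filter change above opens port 464, which Kerberos uses for password changes (kpasswd), in addition to port 88 for the KDC. A quick TCP reachability check from a client host could look like the sketch below. The helper name is hypothetical, and since Kerberos traffic commonly goes over UDP as well, a successful TCP connect is only a partial signal.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False
```

For example, calling `tcp_port_open(kdc_host, 88)` and `tcp_port_open(kdc_host, 464)` from an analytics host, with `kdc_host` set to the KDC's hostname, would show whether both ports pass the filter.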

elukey added a subscriber: ayounsi. Jun 21 2019, 1:07 PM

New issue:

elukey@kerberos1001:~$ kpasswd elukey@WIKIMEDIA
Password for elukey@WIKIMEDIA:
Enter new password:
Enter it again:
Authentication error: Failed reading application request

Same thing if I do kinit:

elukey@kerberos1001:~$ kinit
Password for elukey@WIKIMEDIA:
Password expired.  You must change it now.
Enter new password:
Enter it again:
kinit: Password change failed while getting initial credentials

Note about kinit: there is no longer any lag between entering the new password and the answer from kerberos1001.

A restart of kadmind seems to have fixed the issue -.-

elukey moved this task from Backlog to Kerberos on the User-Elukey board. Jul 5 2019, 6:57 AM

Change 525137 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::kerberos::kadminserver: add script to create principals

https://gerrit.wikimedia.org/r/525137

Change 525137 merged by Elukey:
[operations/puppet@production] profile::kerberos::kadminserver: add script to create principals

https://gerrit.wikimedia.org/r/525137

elukey set the point value for this task to 8. Jul 24 2019, 9:04 AM
elukey added a project: Analytics-Kanban.
elukey moved this task from Next Up to Done on the Analytics-Kanban board.

Change 525235 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::kerberos::kadminserver: highlight 'test' when creating users

https://gerrit.wikimedia.org/r/525235

Change 525235 merged by Elukey:
[operations/puppet@production] profile::kerberos::kadminserver: highlight 'test' when creating users

https://gerrit.wikimedia.org/r/525235

Change 525242 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::kerberos::kadminserver: properly indent emails to send

https://gerrit.wikimedia.org/r/525242

Change 525242 merged by Elukey:
[operations/puppet@production] profile::kerberos::kadminserver: properly indent emails to send

https://gerrit.wikimedia.org/r/525242

elukey moved this task from Done to In Progress on the Analytics-Kanban board. Jul 24 2019, 12:52 PM

I am debugging an issue that Dan faced, and Andrew as well some time ago. Any command on an-tool1006 (like hdfs dfs -ls) leads to:

Caused by: GSSException: No valid credentials provided (Mechanism level: Generic error (description in e-text) (60) - NO PREAUTH)
	at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:770)
	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
	... 41 more
Caused by: KrbException: Generic error (description in e-text) (60) - NO PREAUTH
	at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
	at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
	at sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
	at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
	at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
	at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
	at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:693)
	... 44 more
Caused by: KrbException: Identifier doesn't match expected value (906)
	at sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
	at sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
	at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
	at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
	... 50 more
19/07/24 12:53:15 WARN security.UserGroupInformation: PriviledgedActionException as:milimetric@WIKIMEDIA (auth:KERBEROS) cause:java.io.IOException: Couldn't setup connection for milimetric@WIKIMEDIA to analytics1028.eqiad.wmnet/10.64.36.128:8020
elukey@kerberos1001:~$ sudo journalctl -u krb5-kdc.service  | grep milimetric
elukey@kerberos1001:~$ sudo journalctl -u krb5-kdc.service  | grep milimetric | tail
Jul 24 12:42:05 kerberos1001 krb5kdc[456]: TGS_REQ (4 etypes {18 17 16 23}) 10.64.5.32: NO PREAUTH: authtime 0,  milimetric@WIKIMEDIA for hdfs/analytics1028.eqiad.wmnet@WIKIMEDIA, Generic error (see e-text)
Jul 24 12:42:09 kerberos1001 krb5kdc[456]: TGS_REQ (4 etypes {18 17 16 23}) 10.64.5.32: NO PREAUTH: authtime 0,  milimetric@WIKIMEDIA for hdfs/analytics1028.eqiad.wmnet@WIKIMEDIA, Generic error (see e-text)
Jul 24 12:42:11 kerberos1001 krb5kdc[456]: TGS_REQ (4 etypes {18 17 16 23}) 10.64.5.32: NO PREAUTH: authtime 0,  milimetric@WIKIMEDIA for hdfs/analytics1028.eqiad.wmnet@WIKIMEDIA, Generic error (see e-text)
Jul 24 12:42:14 kerberos1001 krb5kdc[456]: TGS_REQ (4 etypes {18 17 16 23}) 10.64.5.32: NO PREAUTH: authtime 0,  milimetric@WIKIMEDIA for hdfs/analytics1028.eqiad.wmnet@WIKIMEDIA, Generic error (see e-text)
Jul 24 12:53:02 kerberos1001 krb5kdc[456]: TGS_REQ (4 etypes {18 17 16 23}) 10.64.5.32: NO PREAUTH: authtime 0,  milimetric@WIKIMEDIA for hdfs/analytics1028.eqiad.wmnet@WIKIMEDIA, Generic error (see e-text)
Jul 24 12:53:05 kerberos1001 krb5kdc[456]: TGS_REQ (4 etypes {18 17 16 23}) 10.64.5.32: NO PREAUTH: authtime 0,  milimetric@WIKIMEDIA for hdfs/analytics1028.eqiad.wmnet@WIKIMEDIA, Generic error (see e-text)
Jul 24 12:53:08 kerberos1001 krb5kdc[456]: TGS_REQ (4 etypes {18 17 16 23}) 10.64.5.32: NO PREAUTH: authtime 0,  milimetric@WIKIMEDIA for hdfs/analytics1028.eqiad.wmnet@WIKIMEDIA, Generic error (see e-text)
Jul 24 12:53:10 kerberos1001 krb5kdc[456]: TGS_REQ (4 etypes {18 17 16 23}) 10.64.5.32: NO PREAUTH: authtime 0,  milimetric@WIKIMEDIA for hdfs/analytics1028.eqiad.wmnet@WIKIMEDIA, Generic error (see e-text)
Jul 24 12:53:14 kerberos1001 krb5kdc[456]: TGS_REQ (4 etypes {18 17 16 23}) 10.64.5.32: NO PREAUTH: authtime 0,  milimetric@WIKIMEDIA for hdfs/analytics1028.eqiad.wmnet@WIKIMEDIA, Generic error (see e-text)
Jul 24 12:53:15 kerberos1001 krb5kdc[456]: TGS_REQ (4 etypes {18 17 16 23}) 10.64.5.32: NO PREAUTH: authtime 0,  milimetric@WIKIMEDIA for hdfs/analytics1028.eqiad.wmnet@WIKIMEDIA, Generic error (see e-text)

The KrbException: Identifier doesn't match expected value (906) seems to lead to https://www.cloudera.com/documentation/enterprise/5-16-x/topics/cdh_sg_jce_policy_file_install.html#topic_3_3, but as far as I can see the openjdk security policies are set to unlimited on analytics1028 (we set it via puppet) and an-tool1006 (standard openjdk-8 one).
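
To see how widespread the failures are, the KDC journal lines can be tallied per client and service. The sketch below is a rough helper, assuming the log lines keep the exact shape shown in the excerpt above; the regular expression and function name are illustrative only, and lines in any other shape are silently skipped.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the failing TGS_REQ line shape seen in the journal excerpt.
TGS_LINE_RE = re.compile(
    r"TGS_REQ \(.*?\) (?P<client_ip>\S+): (?P<status>[A-Z_ ]+): "
    r"authtime \d+,\s+(?P<client>\S+) for (?P<service>[^,\s]+),"
)


def count_tgs_failures(lines: Iterable[str]) -> Counter:
    """Count matching TGS_REQ lines keyed by (client, service, status)."""
    counts: Counter = Counter()
    for line in lines:
        match = TGS_LINE_RE.search(line)
        if match:
            key = (match.group("client"), match.group("service"),
                   match.group("status"))
            counts[key] += 1
    return counts
```

Feeding it the output of `sudo journalctl -u krb5-kdc.service` (for instance via `sys.stdin`) gives one counter entry per affected principal and service pair.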

You did something to fix this for me before, what was it?

IIRC I simply re-created the user and thought that the issue was related to clock skew, but it seems too systemic to be a random weird issue.

My principal was working fine on an-tool1006, I have deleted and re-created it, same problem as Dan's.

ayounsi removed a subscriber: ayounsi. Jul 24 2019, 1:23 PM

Is this limited to an-tool1006 or also other hosts?
Is this limited to the HDFS command or are other commands also affected? Do basic operations like klist work as expected?

Not limited to an-tool1006; I also tried on bare metal hosts like analytics1028. For the moment any command related to HDFS seems to be affected. I would like to check other commands, but I don't have any available at the moment.

Roll-restarted the JVMs with the new one; same result (I didn't have great hopes, but it is one variable less).

elukey added a comment. Edited · Jul 24 2019, 2:46 PM

Found something interesting. If I create the user with +needchange, I get the issue; without the flag, all HDFS commands work fine.

EDIT: https://unix.stackexchange.com/questions/140066/kerberos-authentication-fails-with-forced-password-change suggests using +needchange +requires_preauth. I tested it and it seems to work as expected.
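
A principal-creation helper using both flags might look like the sketch below. It is a minimal illustration and not the merged puppet script: the function name and validation rule are hypothetical. It relies on the `-q` option of kadmin.local and the `-pw` option of addprinc; note that passing the password this way exposes it in the process list, so it is only reasonable for a temporary password that must be changed at first login.

```python
import re
import subprocess
from typing import List

# Conservative character set so the kadmin query cannot be split or quoted.
_SAFE_TOKEN = re.compile(r"^[A-Za-z0-9._/@-]+$")


def addprinc_command(principal: str, temporary_password: str) -> List[str]:
    """Build the kadmin.local argv that creates a principal with both flags."""
    for value in (principal, temporary_password):
        if not _SAFE_TOKEN.match(value):
            raise ValueError(f"unsafe characters in {value!r}")
    query = (f"addprinc +needchange +requires_preauth "
             f"-pw {temporary_password} {principal}")
    return ["kadmin.local", "-q", query]


def create_principal(principal: str, temporary_password: str) -> None:
    """Run the command; must be executed as root on the kadmin host."""
    subprocess.run(addprinc_command(principal, temporary_password), check=True)
```

Building the argv in a separate function keeps the flag combination in one place and lets it be checked without a running KDC.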

Change 525301 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::kerberos::kdcserver: add +requires_preauth to new users

https://gerrit.wikimedia.org/r/525301

Change 525301 merged by Elukey:
[operations/puppet@production] profile::kerberos::kdcserver: add +requires_preauth to new users

https://gerrit.wikimedia.org/r/525301

elukey moved this task from In Progress to Done on the Analytics-Kanban board. Jul 25 2019, 3:06 PM
Nuria closed this task as Resolved. Aug 21 2019, 5:41 PM