wmfdata's Kerberos check should require at least 8 hours of validity
Closed, Declined (Public)

Assigned To

Authored By

	nshahquinn-wmf
	Mar 6 2020, 4:59 PM

Description

Some of our issues with Spark jobs failing may have been caused by our 24-hour Kerberos tickets expiring while the jobs were running.

To help alleviate this, we can update the check_kerberos_auth function to fail unless the user's Kerberos ticket is valid for at least 8 hours more. This will mean users need to kinit slightly more often, but in any case no more than once a workday.
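
A minimal sketch of the proposed check, assuming the ticket's expiry time has already been obtained (for example, by parsing `klist` output). The name `require_remaining_validity` and its parameters are hypothetical and are not wmfdata's actual API; only the 8-hour threshold comes from this task.

```python
from datetime import datetime, timedelta

# Threshold proposed in this task; everything else here is illustrative.
MIN_REMAINING = timedelta(hours=8)


def require_remaining_validity(expires_at, now=None, min_remaining=MIN_REMAINING):
    """Return the remaining ticket lifetime, or raise if it is too short.

    expires_at: datetime at which the Kerberos ticket expires (assumed to be
        supplied by the caller; obtaining it is outside this sketch).
    now: current time, injectable so the check can be tested.
    """
    if now is None:
        now = datetime.now()
    remaining = expires_at - now
    if remaining < min_remaining:
        raise PermissionError(
            f"Kerberos ticket is valid for only {remaining}; "
            f"at least {min_remaining} is required. Run kinit to renew it."
        )
    return remaining


if __name__ == "__main__":
    start = datetime(2020, 3, 6, 9, 0)
    # A ticket expiring 11 hours from `start` passes the check.
    print(require_remaining_validity(datetime(2020, 3, 6, 20, 0), now=start))
```

With a 24-hour ticket, a check like this would start failing once the ticket is more than 16 hours old, which is why the user would need to kinit at most once per workday.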

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		nshahquinn-wmf	T245891 Analysts cannot reliably use wmfdata to run SQL queries against Hive databases
		Declined		nshahquinn-wmf	T247103 wmfdata's Kerberos check should require at least 8 hours of validity

Event Timeline

nshahquinn-wmf created this task. Mar 6 2020, 4:59 PM

Restricted Application added a subscriber: Aklapper. Mar 6 2020, 4:59 PM

nshahquinn-wmf triaged this task as Medium priority. Mar 6 2020, 5:05 PM

Nuria renamed this task from "wmfdata's Kerberos check should require at least 8 hours of validity" to "wmfdata's Kerberos check should require at least 24 hours of validity". Mar 6 2020, 5:14 PM

Nuria, how does checking for 24 hours of validity make sense? The maximum possible validity is 24 hours, so that would mean forcing the user to kinit before every single query they run.

nshahquinn-wmf removed a project: Epic. Mar 6 2020, 5:23 PM

I think* we are extending the validity of Kerberos tickets to more than 24 hours (about 48 hours), but that might not have happened yet.

nshahquinn-wmf renamed this task from "wmfdata's Kerberos check should require at least 24 hours of validity" to "wmfdata's Kerberos check should require at least 8 hours of validity". Mar 7 2020, 6:42 AM

In T247103#5949194, @Nuria wrote:

I think* we are extending the validity of Kerberos tickets to more than 24 hours (about 48 hours), but that might not have happened yet.

Ah, I see. Extending the validity would be helpful for us, but until that actually happens, requiring 24 hours simply doesn't make sense.

nshahquinn-wmf added a project: Wmfdata-Python. Mar 11 2020, 7:28 AM

We haven't seen any Spark problems in a while, so moving this to the backlog while we wait to see if this is actually needed.

I was about to close this, but on reflection, it would be ideal to actually test what happens when you have a Spark job running when your Kerberos ticket expires, or failing that, get some expert input on whether this is actually a problem.

In T247103#6345202, @nshahquinn-wmf wrote:

I was about to close this, but on reflection, it would be ideal to actually test what happens when you have a Spark job running when your Kerberos ticket expires, or failing that, get some expert input on whether this is actually a problem.

...but even if the ticket expiring causes problems, it would be a fairly rare edge case, and would probably be immediately resolved by the user trying to re-run the query and needing to renew their Kerberos credentials.

There's no indication whatsoever that people are being affected by Kerberos tickets expiring while their jobs are in progress.
