wmfdata's Kerberos check should require at least 8 hours of validity
Closed, Declined · Public

Description

Some of our issues with Spark jobs failing may have been caused by our 24-hour Kerberos tickets expiring while the jobs were running.

To help alleviate this, we can update the check_kerberos_auth function to fail unless the user's Kerberos ticket is valid for at least 8 hours more. This will mean users need to kinit slightly more often, but in any case no more than once a workday.
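The proposed rule can be sketched as a small comparison on the ticket's expiry time. This is only an illustration, not wmfdata's actual code: the names `MIN_VALIDITY`, `remaining_validity`, and `has_enough_validity` are hypothetical, and reading the expiry time from the credential cache (for example from `klist` output, whose date format varies by system) is deliberately left out.

```python
from datetime import datetime, timedelta

# Hypothetical default taken from the proposal above; not a wmfdata constant.
MIN_VALIDITY = timedelta(hours=8)


def remaining_validity(expires: datetime, now: datetime) -> timedelta:
    """Return how long the ticket stays valid, clamped at zero once expired."""
    return max(expires - now, timedelta(0))


def has_enough_validity(
    expires: datetime, now: datetime, minimum: timedelta = MIN_VALIDITY
) -> bool:
    """Return True if the ticket is valid for at least `minimum` more time."""
    return remaining_validity(expires, now) >= minimum


if __name__ == "__main__":
    # Assumed scenario: a ticket issued at 09:00 with a 24-hour lifetime.
    issued = datetime(2020, 3, 6, 9, 0)
    expires = issued + timedelta(hours=24)

    # 10 hours after kinit, 14 hours remain: the check passes.
    print(has_enough_validity(expires, issued + timedelta(hours=10)))
    # 20 hours after kinit, 4 hours remain: the user would have to kinit again.
    print(has_enough_validity(expires, issued + timedelta(hours=20)))
```

With a 24-hour ticket and an 8-hour minimum, the check passes for the first 16 hours after `kinit`, which is why a user would need to renew at most once per workday.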

Event Timeline

Nuria renamed this task from wmfdata's Kerberos check should require at least 8 hours of validity to wmfdata's Kerberos check should require at least 24 hours of validity. Mar 6 2020, 5:14 PM

Nuria, how does checking for 24 hours of validity make sense? The maximum possible validity is 24 hours, so that would mean forcing the user to kinit before every single query they run.

  • I think we are extending the validity of Kerberos tickets to more than 24 hours (about 48 hours), but that might not have happened yet.
nshahquinn-wmf renamed this task from wmfdata's Kerberos check should require at least 24 hours of validity to wmfdata's Kerberos check should require at least 8 hours of validity. Mar 7 2020, 6:42 AM

Ah, I see. Extending the validity would be helpful for us, but until you actually do that, 24 hours simply doesn't make sense.

nshahquinn-wmf moved this task from Triage to Backlog on the Product-Analytics board.

We haven't seen any Spark problems in a while, so moving this to the backlog while we wait to see if this is actually needed.

I was about to close this, but on reflection, it would be ideal to actually test what happens when you have a Spark job running when your Kerberos ticket expires, or failing that, get some expert input on whether this is actually a problem.

nshahquinn-wmf lowered the priority of this task from Medium to Low. Aug 6 2020, 12:20 PM

...but even if the ticket expiring causes problems, it would be a fairly rare edge case, and would probably be immediately resolved by the user trying to re-run the query and needing to renew their Kerberos credentials.

There's no indication whatsoever that people are being affected by Kerberos tickets expiring while their jobs are in progress.