
Horizon loses credentials every day
Closed, Resolved · Public

Description

The login credential timeout on https://horizon.wikimedia.org seems very aggressive; for my usage I have to log in again every time I use it. From folks on IRC it sounds like it's on a 12-hour timeout.

This wouldn't bother me too much except that the Horizon login requires 2FA, which means I have to go find my phone and fiddle with the authenticator app every time.

In comparison, my Google and Phabricator sessions, which also use 2FA, seem to last a month. I'd like my credentials to stay valid as long as possible unless revoked by a password change. A month maybe, or ideally until 2038. :)

Event Timeline

brion created this task. Sep 14 2016, 10:07 PM
Restricted Application added a project: Cloud-Services. Sep 14 2016, 10:07 PM
Restricted Application added a subscriber: Aklapper.

From modules/openstack/templates/liberty/horizon/local_settings.py.erb:

# SESSION_TIMEOUT is in seconds and defaults to 1800.  Change to one day.
#  Note that due to
#   https://bugs.launchpad.net/django-openstack-auth/+bug/1562452
#  this may be shortened to two hours.  As of 2016-04-04 there's a
#  live hack in place on californium to alleviate that bug, but
#  this will revert when we upgrade to M, and then start working again on N.
SESSION_TIMEOUT = 86400
SESSION_COOKIE_AGE = 86400
scfc added a subscriber: scfc. Dec 13 2016, 4:09 AM

@brion: I believe this is the same issue as T130621, which was fixed. Is Horizon still losing credentials every day for you? If not, please merge this task into T130621.

I logged in yesterday with "Remember me" and today I had to log in again, so it doesn't seem to be fixed.

Could it be related to this (from the mailing list)?

The switch maintenance that we were bracing for didn't happen today.  I've re-enabled normal functionality on Horizon and Wikitech, but will disable them again during the window tomorrow (currently proposed for 16:00 UTC).

I probably won't spam the list with any more on-and-off messages about this tomorrow unless something truly unusual happens.
jcrespo removed a subscriber: jcrespo. Apr 27 2017, 7:43 AM
Andrew added a subscriber: Andrew. Feb 15 2018, 8:24 PM

Any news on this? I get logged out quite frequently and seemingly at random from horizon :(

I can confirm the issue still exists; I'm experiencing the same. I don't really know, but perhaps it's related to load balancing, so there's a 50% chance of reaching a server that holds a valid session.

I believe that this is happening, but I don't think it has to do with load balancing, at least not directly. The session keys are held in a memcached pool that is shared between the two hosts. To verify (at least the most obvious case), I just tried this:

  • reload, confirm session exists
  • stop apache on labweb1001
  • reload, confirm session exists
  • start apache on labweb1001
  • stop apache on labweb1002
  • reload, confirm session exists
  • start apache on labweb1002

I believe that if memcached restarted on either host, each session would have a 50% chance of being killed, so that could be causing the issue.
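A toy model of the shared session pool may make the 50% figure concrete. It assumes the memcached client places each key on exactly one server by hashing it (simplified here to MD5 modulo the number of servers; real clients may use a different scheme, such as consistent hashing). Because memcached keeps data only in memory, restarting one of two servers drops roughly half of all sessions, while both web hosts still see the same surviving ones:

```python
import hashlib


class ShardedSessionStore:
    """Toy client spreading session keys over a pool of memcached-like servers.

    Assumption: simple hash-modulo placement; not any specific client's algorithm.
    """

    def __init__(self, server_names):
        self.order = list(server_names)
        self.servers = {name: {} for name in self.order}

    def _server_for(self, key):
        digest = hashlib.md5(key.encode()).digest()
        return self.order[int.from_bytes(digest[:4], "big") % len(self.order)]

    def set(self, key, value):
        self.servers[self._server_for(key)][key] = value

    def get(self, key):
        return self.servers[self._server_for(key)].get(key)

    def restart(self, name):
        # memcached holds everything in RAM, so a restart empties that server.
        self.servers[name].clear()


# Both web hosts talk to the same pool, so either can read any session.
store = ShardedSessionStore(["labweb1001", "labweb1002"])
keys = [f"session-{i}" for i in range(10_000)]
for k in keys:
    store.set(k, {"user": k})

store.restart("labweb1001")
surviving = sum(store.get(k) is not None for k in keys) / len(keys)
print(f"{surviving:.1%} of sessions survived")  # roughly half
```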

It's also possible that deploying an update somehow kills the session cache. The last time I did that was 2018-10-22 at about 17:00 UTC. Does that fit?

It's possible, though right now I have no way to confirm or deny it. I'll note here the next time Horizon kicks me out; maybe that'll help track this down.

I just rolled out a new version of Horizon and (at a different time) restarted apache on both of the labweb boxes; in both cases my session persisted.

It just happened again this morning: I logged into Horizon an hour or so ago and now I'm being asked for credentials + OTP again.

@fgiunchedi just to clarify -- did you check the 'remember me' box when you logged in the first time?

Yeah, I usually tick the box, though it's totally possible I didn't this time! How long will the session last with "remember me" checked?

GTirloni triaged this task as Medium priority. Mar 21 2019, 5:02 PM
GTirloni edited projects, added cloud-services-team (Kanban); removed Cloud-Services.
Bstorm added a subscriber: Bstorm. Mar 21 2019, 5:03 PM
aborrero raised the priority of this task from Medium to High. Nov 6 2019, 11:38 AM
aborrero moved this task from Inbox to Soon! on the cloud-services-team (Kanban) board.

For the record: a year has passed since the last comment in this bug report, but the issue still exists. I have to enter credentials in Horizon every day. I always check the "remember me" box.

Raising task priority a bit.

I'm having trouble producing this reliably enough to debug. If this happens to someone else, please paste the contents of your sessionid cookie here before logging in again so I can try to track things down.

I reproduced what Arturo is seeing -- the session cookie is present /until/ I visit horizon, at which point it's cleared. So Horizon definitely thinks that we're not allowed. It also looks to me like the keystone tokens are created correctly (with 7-day lifespan) so I'm not sure who is making the decision that our access has expired.
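One way to frame "who is making the decision": the user stays logged in only until the earliest of several independent deadlines, so a 7-day Keystone token does not by itself guarantee a 7-day Horizon session. A minimal sketch of that reasoning; the function and parameter names are hypothetical, and `session_evicted_at` stands in for the server-side session record disappearing early:

```python
from datetime import datetime, timedelta
from typing import Optional


def effective_logout_time(login: datetime,
                          cookie_age: timedelta,
                          token_lifetime: timedelta,
                          session_evicted_at: Optional[datetime] = None) -> datetime:
    """Return when the user is actually forced to log in again (hypothetical model)."""
    deadlines = [
        login + cookie_age,      # browser drops the sessionid cookie
        login + token_lifetime,  # Keystone token expires
    ]
    if session_evicted_at is not None:
        # The server-side session record vanished before either deadline.
        deadlines.append(session_evicted_at)
    return min(deadlines)


login = datetime(2019, 11, 20, 9, 0)
logout = effective_logout_time(login, timedelta(days=7), timedelta(days=7),
                               session_evicted_at=login + timedelta(hours=20))
print(logout - login)  # 20:00:00
```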

Krenair set Security to Software security bug. Nov 22 2019, 1:34 AM
Krenair added a project: acl*security.
Krenair changed the visibility from "Public (No Login Required)" to "Custom Policy".
Krenair added a subscriber: Krenair.

The combination of Arturo's sessionid and __mmapiwsid cookies logs one in to Horizon as Arturo, so I'm locking this task down.

Andrew added a comment. Nov 22 2019, 2:18 AM

I wiped Arturo's tokens from the keystone database. Thank you for catching this! I was about to comment here about needing a 'real' web developer to help us sort this out; apparently we already needed one a couple of days ago.

aborrero changed the visibility from "Custom Policy" to "Public (No Login Required)". Nov 22 2019, 9:42 AM
aborrero removed a project: acl*security.
Restricted Application added a project: acl*security. Nov 22 2019, 9:42 AM

I just ran an experiment forcing my traffic from one labweb to the other, and my session persisted. So it's not a split-brain issue, or at least not an obvious one.

Andrew added a comment. Dec 6 2019, 8:17 AM

This might be related to this:

SECRET_KEY = secret_key.generate_or_read_from_file(
    os.path.join(LOCAL_PATH, '.secret_key_store'))

we probably want to set that to something shared between hosts.
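Why a per-host SECRET_KEY could matter: Django uses SECRET_KEY to sign data, so anything signed on one host fails verification on a host whose key differs. The sketch below uses plain HMAC-SHA256 from the standard library as a stand-in; it is not Django's actual session encoding, and the payload string is made up:

```python
import hashlib
import hmac
import secrets


def sign(secret_key: bytes, payload: str) -> str:
    """Append an HMAC-SHA256 tag computed with this host's key."""
    tag = hmac.new(secret_key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{tag}"


def verify(secret_key: bytes, signed: str):
    """Return the payload if the tag matches this host's key, else None."""
    payload, _, tag = signed.rpartition(":")
    expected = hmac.new(secret_key, payload.encode(), hashlib.sha256).hexdigest()
    return payload if hmac.compare_digest(tag, expected) else None


# Per-host keys, as when each host generates its own key file.
key_labweb1001 = secrets.token_bytes(32)
key_labweb1002 = secrets.token_bytes(32)
cookie = sign(key_labweb1001, "user=example")
print(verify(key_labweb1001, cookie))  # user=example
print(verify(key_labweb1002, cookie))  # None: rejected by the other host

# One key shared across the deployment.
shared_key = secrets.token_bytes(32)
print(verify(shared_key, sign(shared_key, "user=example")))  # user=example
```

Setting the same key on every host removes that particular failure mode.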

Change 555646 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Horizon: set SECRET_KEY to the same value across a deployment

https://gerrit.wikimedia.org/r/555646

Change 555646 merged by Andrew Bogott:
[operations/puppet@production] Horizon: set SECRET_KEY to the same value across a deployment

https://gerrit.wikimedia.org/r/555646

@Andrew and I are going to pair up on this soon, in case that helps at all.

I can confirm that this is still happening: I'm pretty sure I checked "remember me" yesterday, and today I'm being asked to log in to Horizon again. What's the supposed session duration when ticking "remember me"?

Andrew added a comment. Mar 3 2020, 2:23 PM

What's the supposed session duration when ticking "remember me"?

It ought to be seven days but it is clearly broken :(

Even if it's fixed, can it be longer? For SUL in Wikimedia it's 365 days.

It would be useful for someone like me who only uses Horizon to do maintenance on VMs once in a while; I don't remember a time that I didn't need to log in.

Andrew added a comment. Mar 3 2020, 2:39 PM

Even if it's fixed, can it be longer? For SUL in Wikimedia it's 365 days.

Maybe. Because of the way we handle tokens, we have to keep keys around for as long as the longest-lived token, which makes things moderately more complicated the longer they live. I'll experiment, though, if I can ever get them to last longer than a day or so.
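To put rough numbers on "keep keys around for as long as the longest-lived token": assuming Keystone Fernet tokens rotated on a fixed schedule (the thread does not name the token provider, so this is an assumption), the Keystone Fernet FAQ sizes the key repository as the token lifetime plus the allow-expired window, divided by the rotation interval, plus two. A sketch of that arithmetic; the function name is hypothetical:

```python
import math


def required_active_keys(token_lifetime_s: int, rotation_interval_s: int,
                         allow_expired_window_s: int = 0) -> int:
    """Keys to retain so every still-valid token can be decrypted.

    Assumed sizing rule: ceil((lifetime + allow_expired) / rotation) + 2.
    """
    window = token_lifetime_s + allow_expired_window_s
    # One secondary key per rotation during the window, plus the primary and staged keys.
    return math.ceil(window / rotation_interval_s) + 2


DAY = 86400
print(required_active_keys(7 * DAY, DAY))    # 9
print(required_active_keys(365 * DAY, DAY))  # 367
```

At one rotation per day, a year-long token means retaining hundreds of keys; rotating less often shrinks that count but keeps each key in service longer.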

Change 588411 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] openstack: increase labweb memcached size

https://gerrit.wikimedia.org/r/588411

Change 588411 merged by Jhedden:
[operations/puppet@production] openstack: increase labweb memcached size

https://gerrit.wikimedia.org/r/588411

Change 588439 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] openstack: increase labweb memcached size for codfw1dev

https://gerrit.wikimedia.org/r/588439

Change 588439 merged by Andrew Bogott:
[operations/puppet@production] openstack: increase labweb memcached size for codfw1dev

https://gerrit.wikimedia.org/r/588439

With the latest changes, I've been using Horizon for 3 days in a row without having to enter my credentials. Wow!

I can confirm too: I logged in yesterday and didn't need to log in again today.

JHedden closed this task as Resolved. May 5 2020, 1:58 PM

Increasing the memcached cache size definitely helped.

Future note: If this happens again, we should look into creating a dedicated memcached instance for Horizon. Labswiki appears to be using all of the available cache, pushing Horizon sessions off the LRU list.
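A toy illustration of that eviction pattern, assuming a single shared LRU (real memcached keeps separate LRUs per slab class, so this is a simplification; key names and sizes are made up). With a small shared cache, a burst of Labswiki writes pushes the older Horizon session out; a larger cache or a dedicated Horizon instance keeps it:

```python
from collections import OrderedDict


class LRUCache:
    """Minimal least-recently-used cache with a fixed item capacity."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        while len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item


def session_survives(capacity: int, wiki_writes: int, dedicated: bool = False) -> bool:
    horizon_cache = LRUCache(capacity)
    # Shared setup: Labswiki writes into the same cache as Horizon.
    wiki_cache = LRUCache(capacity) if dedicated else horizon_cache
    horizon_cache.set("horizon:session:abc", {"user": "example"})
    for i in range(wiki_writes):
        wiki_cache.set(f"labswiki:page:{i}", "cached page")
    return horizon_cache.get("horizon:session:abc") is not None


print(session_survives(100, 1000))                  # False: evicted
print(session_survives(2000, 1000))                 # True: bigger cache
print(session_survives(100, 1000, dedicated=True))  # True: own instance
```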