Page MenuHomePhabricator

Investigate gerrit session expiration
Open, NormalPublic

Description

We've been doing a lot of gerrit restarts recently. It seems that sometimes that means people lose their web sessions and have to login again; however, this should not be the case.

The web_session cache persists to disk and sessions should stay valid for 90 days.

[cache "web_sessions"]
    maxAge = 90 days
    memoryLimit = 2048
    diskLimit = 256m

Details

Related Gerrit Patches:
operations/software/gerrit : wmf/stable-2.15Add websession-flatfile plugin
operations/puppet : productionGerrit: Quadruple web session cache memory to 8192

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMay 3 2019, 7:01 PM
thcipriani triaged this task as Normal priority.May 3 2019, 7:01 PM
thcipriani renamed this task from Investigate missing gerrit sessions to Investigate gerrit session expiration.May 3 2019, 7:05 PM
thcipriani updated the task description. (Show Details)

Disk

There are roughly 6,900 gerrit user accounts.

From the gerrit config page (https://gerrit-review.googlesource.com/Documentation/config-gerrit.html#cache_names)

Session storage is relatively inexpensive. The average entry in this cache is approximately 346 bytes.

~
(/^ヮ^)/*:・゚✧ units "$(calc '6900*346')bytes" "megabytes"
	* 2.3874

The size of the user database on disk should be approximately 2.4MB; however, something is strange about that estimate. The current size on disk is more than double that:

5.9M May 23 20:38 web_sessions.h2.db

I can't say why the current disk cache is using more than twice the space approximated in the documentation; particularly when I'm sure not every user is logged in right now.

Also noteworthy:

cache.<name>.diskLimit
Total size in bytes of the keys and values stored on disk. Caches that have grown bigger than this size are scanned daily at 1 AM local server time to trim the cache. Entries are removed in least recently accessed order until the cache fits within this limit. Caches may grow larger than this during the day, as the size check is only performed once every 24 hours.

Memory

Our current memoryLimit is 2048. Worthwhile to note:

For most caches where the in-memory size of each entry is relatively the same, memoryLimit is currently defined to be the number of entries held by the cache (each entry costs 1).

So we have space for 2048 entries not 2048 bytes.

In terms of memoryLimit I have actually been tracking cache sizes of a long period of time; frex right now I see

  Name                          |Entries              |  AvgGet |Hit Ratio|
                                |   Mem   Disk   Space|         |Mem  Disk|
--------------------------------+---------------------+---------+---------+
D web_sessions                  |   274   1608 662.80k|         | 95%   0%|

Following restarts, you do see the entries in memory drop; frex two days ago I restarted gerrit at 19:09 (https://tools.wmflabs.org/sal/log/AWrbzCIYEHTBTPG-I0mx)

2019-05-21T19:00:01+00:00.txt:D web_sessions                  |   393   1527 629.07k|         | 96%   0%|
2019-05-21T19:30:01+00:00.txt:D web_sessions                  |    33   1528 629.47k|         | 93% 100%|

This is what I suppose I would expect: the number of entries in memory went down, the number of entries on disk increased by 1, and we're hitting the disk a lot more for sessions.

I have cache sizes taken every 30 minutes going back to April. I've never seen higher than 393 for memory entries, 1609 for disk.

hashar added a subscriber: hashar.May 24 2019, 11:45 AM

As for the size of the disk cache: Gerrit keeps track of logged out sessions apparently. I have explored the H2 database using strings for my user using:

strings /var/lib/gerrit2/review_site/cache/web_sessions.h2.db|grep -A5 gerrit:hashar

While logged out, I took a snapshot of it as a reference. When logged in there is an extra field. Once logged out, apparently nothing change. I then logged a second time and an additional entry has been added.

I did a few tries, and eventually the number of matches for gerrit:hashar went from 27 to 30. That might explain why there are more entries than expected and potentially the disk size.


Pure guess, as web sessions accumulate, eventually one gets older than 90 days and the current valid session gets invalidated?

Restarted Gerrit 2019-05-27T23:10,

show-cache before and after:

2019-05-27T23:00:01+00:00.txt:D web_sessions                  |   364   1676 690.93k|         | 96%   0%|
2019-05-27T23:26:06+00:00.txt:D web_sessions                  |    10   1676 690.93k|         | 95% 100%|

web_session h2 database before and after:

2019-05-27T23:00:01+00:00.txt: 5.9M May 27 22:56 web_sessions.h2.db
2019-05-27T23:25:44+00:00.txt: 5.2M May 27 23:23 web_sessions.h2.db

Change 513682 had a related patch set uploaded (by Paladox; owner: Paladox):
[operations/puppet@production] Gerrit: Quadruple web session cache memory to 8192

https://gerrit.wikimedia.org/r/513682

Change 513682 merged by CDanis:
[operations/puppet@production] Gerrit: Quadruple web session cache memory to 8192

https://gerrit.wikimedia.org/r/513682

greg added a subscriber: greg.Jul 23 2019, 8:46 PM

Have we had many more reports of people's sessions being lost?

Have we had many more reports of people's sessions being lost?

I have not heard anyone talk about this recently, but also I was not experiencing the problem...

@ArielGlenn -- how stable has your gerrit session been since the end of June?

I had to log in again a few days ago but the week before that was fine. Maybe my login just expired, as they do.

hashar closed this task as Resolved.Jul 24 2019, 9:00 AM
hashar assigned this task to thcipriani.

Output of gerrit show-cache:

Gerrit Code Review        2.15.14-16-g855b179b5f      now    08:37:43   UTC
                                                 uptime    1 days 14 hrs

  Name                          |Entries              |  AvgGet |Hit Ratio|
                                |   Mem   Disk   Space|         |Mem  Disk|
--------------------------------+---------------------+---------+---------+
D web_sessions                  |   197   2925   1.18m|         | 96%   2%|

From Tyler previous comment for comparison:

D web_sessions                  |   364   1676 690.93k|         | 96%   0%| 2019-05-27T23:00:01+00:00.txt
D web_sessions                  |    10   1676 690.93k|         | 95% 100%| 2019-05-27T23:26:06+00:00.txt

Tyler did:

 [cache "web_sessions"]
     maxAge = 90 days
-    memoryLimit = 2048
+    memoryLimit = 8192
     diskLimit = 256m

It seems that the disk cache is only used as a snapshot to restore from when Gerrit is restarted. Or in other terms, Gerrit only use the memory cache and as soon as it is filled up entries get evicted :-\

gerrit show-caches does not show how much caches are filled though :-\

Anyway, I have not personally experienced any logout over the last few weeks. So I guess it is fixed for now.

hashar added a comment.Oct 2 2019, 6:10 PM

For reference, with 1 day of uptime:

Gerrit Code Review        2.15.14-16-g855b179b5f      now    18:08:48   UTC

                                                 uptime    1 days  0 hrs

  Name                          |Entries              |  AvgGet |Hit Ratio|
                                |   Mem   Disk   Space|         |Mem  Disk|
--------------------------------+---------------------+---------+---------+
D web_sessions                  |   157   4118   1.66m|         | 94%   1%|

So disk is like half filled :]

Zoranzoki21 reopened this task as Open.Oct 12 2019, 12:05 PM
Zoranzoki21 added a subscriber: Zoranzoki21.

Few users reported this problem on Wikitech-l.

@Ladsgroup and @Niharika it would be helpful for me to understand how you're using Gerrit since you've reported session expiration recently (and I haven't been able to recreate this issue myself).

Are you logged in on multiple devices? Multiple browsers? Which OS and browsers do you use?

Thanks for your help!

This happened to me twice this week. I believe this is related to T224448. As this only happens to me when gerrit has to be restarted due to that.

jcrespo added a subscriber: jcrespo.EditedOct 14 2019, 11:50 AM

This happens to me, sometimes multiple times a day. As far as I remember this always happened continuously for me at least in the last 2 years.

@thcipriani I'm only using it on one device, my laptop. I primarily use Safari as my browser on a Macbook (macOS Mojave).

As a data point, folks have complained of getting logged out between last night and this morning (although there were no gerrit restarts during that time). Interestingly, this was noted during our branch cut which is known to cause some strain on the garbage collector, currently (T231872)

I'd say I usually get a week or so, but it feels like it's happened every other day or so lately. Latest release channel Firefox / Debian, mostly on a single machine.

zeljkofilipin added a comment.EditedOct 15 2019, 6:44 PM

I think I have to log in every morning. I mostly use gerrit on one machine, macos+firefox (new gerrit UI).

Data point: I tend to need to log in about once a day, based on memory. Have not kept a log, though. I use Firefox 60 on Debian, and only ever log in on that browser on this machine. I just checked, I am not currently logged in to Gerrit, but I know I was a few days ago.

Change 543940 had a related patch set uploaded (by Paladox; owner: Paladox):
[operations/software/gerrit@wmf/stable-2.15] Add websession-flatfile plugin

https://gerrit.wikimedia.org/r/543940

I've been keeping a log for when I need to log in to Gerrit:

  • oct 16: logged in
  • oct 17: still logged in
  • oct 18: still logged in
  • oct 21: still logged in
  • oct 22: had to log in
  • oct 23: had to log in

HTH.

I've been keeping a log for when I need to log in to Gerrit:

  • oct 16: logged in
  • oct 17: still logged in
  • oct 18: still logged in
  • oct 21: still logged in
  • oct 22: had to log in
  • oct 23: had to log in

HTH.

Mine looks the same; I've had to login yesterday and today as well, despite not having to do so the days before.

And today again.

I forgot to specify my environment; I'm using Chrome 77 on Windows 10 (1903). Interestingly, I left my machine in sleep mode overnight (power-off, programs open and ready to go on start-up) and I still have to log in on Gerrit, which suggests that Gerrit somehow rejects/removes/discards the session from yesterday, even though the browser has never actually been closed.

I also have to log in again on other devices as well. (which is an iPad mini 2 using Chrome for iOS and a laptop running Chrome 77 on Windows 10 1903).

I too had to log in today. Debian buster, FF 60. No reboot.

Did you all click the remember me box?

I think we are going to experiment with the websession flatfile plugin since it’s used by larger installs (and is stable)

Did you all click the remember me box?

Of course, what type of question is this?

I too had to log in today. Debian buster, FF 60. No reboot.

Same here

Did you all click the remember me box?

Of course, what type of question is this?

I'm sometimes guilty of this... But I distinctly recall checking that box today and yesterday.

I have made a point of clicking the remember-me box every login while collecting data for this ticket

@Ladsgroup sometimes users may forget to press it. It’s easily forgettable.

Today I didn't need to log in in the morning, but did have to just now.

I don't want to add a meaningless +1, but @LarsWirzenius closely match my own perception, which would make it more likely a server issue rather than a client one.

To be clear, I'm only reporting these logins as data points to aid debugging, not to complain. The login is easy enough for me on my laptop that it's not bothering me. I can stop reporting data points if it's not useful.

@LarsWirzenius it's useful! (in matter of fact your latest report helped me to conclude that this is java gc as we restarted gerrit only last night).

We're going to try and install the websession-flatfile plugin which should improve reliability (as it won't rely on the memory for logins). Though i do have a change upstream to implement a cache, it won't be like it is now (it'll check in the memory first, then fallback to disk).

In the hope it's useful, I'll mention I'm still logged in to Gerrit, since my previous log in on October 24.

In the hope it's useful, I'll mention I'm still logged in to Gerrit, since my previous log in on October 24.

I was too, until today. I had to log in again (on all my devices).

I was too, until today. I had to log in again (on all my devices).

Same here for today (to add a datapoint).

Today again; I had remained logged in since my last comment (10 days ago - T222472#5634681).

I had to log in today as well. I had been logged in since October 24, so it seems the log-in-daily problem is getting solved.