Investigate Gerrit h2 cache being way too large
Closed, Resolved (Public)

Description

Upstream issue: https://bugs.chromium.org/p/gerrit/issues/detail?id=16474

Gerrit had its / partition fill up when we upgraded from 3.4 to 3.5. That was caused by the reindexing of changes filling up the file diff caches (T323262).

Gerrit stores cache data in H2 database files and two of them are overinflated on disk:

/var/lib/gerrit2/review_site/cache/   Size (MB)
git_file_diff.h2.db                        8376
gerrit_file_diff.h2.db                    11597

There is also a 1 GByte file at /var/lib/gerrit2/review_site/db/account_patch_reviews.h2.db

gerrit show-caches displays smaller usage (around 150M if I get it right):

  Name                          |Entries              |  AvgGet |Hit Ratio|
                                |   Mem   Disk   Space|         |Mem  Disk|
--------------------------------+---------------------+---------+---------+
D gerrit_file_diff              | 24562 150654 157.36m|  14.9ms | 72%  44%|
D git_file_diff                 | 12998 143329 158.06m|  14.8ms |  3%  14%|
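
To put the gap in numbers, here is a rough sketch dividing the on-disk size by the space reported by show-caches. It assumes the "m" suffix in the Space column is comparable to the MB figures listed earlier; the class and method names are made up for illustration.

```java
class CacheBloat {
    /** Ratio of the on-disk size to the space reported by show-caches (both in MB). */
    static double bloatFactor(double onDiskMb, double reportedMb) {
        return onDiskMb / reportedMb;
    }

    public static void main(String[] args) {
        // Figures taken from the file listing and the show-caches output in this task.
        System.out.printf("git_file_diff:    %.0fx%n", bloatFactor(8376, 158.06));
        System.out.printf("gerrit_file_diff: %.0fx%n", bloatFactor(11597, 157.36));
    }
}
```

Under that assumption the files are roughly 53 and 74 times larger than the data they are reported to hold.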

I would like to inspect the database to figure out what kind of data are there and whether some garbage collection can be achieved to shrink the h2.db files.

Gerrit uses an old version of H2, com.h2database:h2:1.3.176. As I understand it, it connects without any username or password using:

java/com/google/gerrit/server/cache/h2/H2CacheImpl.java
this.conn = org.h2.Driver.load().connect(url, null);

I have made a compressed copy of one of the caches on gerrit1001.wikimedia.org at /home/hashar/git_file_diff.h2.db.gz, which is only 32MBytes.

I have tried to access the database locally but I am hitting a wall:

$ /usr/lib/jvm/java-8-openjdk-amd64/bin/java -cp h2-1.3.176-ijar.jar org.h2.tools.Recover
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.ClassFormatError: Absent Code attribute in method that is not native or abstract in class file org/h2/tools/Recover

With a more recent version of H2 (/usr/lib/jvm/java-8-openjdk-amd64/bin/java -cp h2-2.1.214.jar org.h2.tools.Recover) it results in:

git_file_diff.h2.db.h2.sql
-- MVStore
CREATE ALIAS IF NOT EXISTS READ_BLOB_MAP FOR 'org.h2.tools.Recover.readBlobMap';
CREATE ALIAS IF NOT EXISTS READ_CLOB_MAP FOR 'org.h2.tools.Recover.readClobMap';
-- LOB
CREATE TABLE IF NOT EXISTS INFORMATION_SCHEMA.LOB_BLOCKS(LOB_ID BIGINT, SEQ INT, DATA VARBINARY, PRIMARY KEY(LOB_ID, SEQ));
-- lobMap.size: 0
-- lobData.size: 0
-- Layout
-- chunk.1 = chunk:1,block:2,len:1,liveMax:380,livePages:2,map:9,max:410,next:3,pages:4,root:400000f746,time:1a,unusedAtVersion:1,version:1,toc:427,occupancy:0a
-- meta.id = 1
-- root.1 = 4000008d50
-- root.2 = 4000002a8e
-- root.5 = 8000002ac6
-- Meta
-- map.2 = name:_
-- map.3 = name:openTransactions
-- map.4 = name:undoLog.1
-- map.5 = name:table.0,key:8fa25204,val:5803b3f1
-- map.6 = name:lobMap,key:8fa25204,val:f4470498
-- map.7 = name:tempLobMap,key:8fa25204,val:59a6a071
-- map.8 = name:lobRef,key:eabe0274,val:436a4e4b
-- map.9 = name:lobData,key:8fa25204,val:59a6a071
-- name._ = 2
-- name.lobData = 9
-- name.lobMap = 6
-- name.lobRef = 8
-- name.openTransactions = 3
-- name.table.0 = 5
-- name.tempLobMap = 7
-- name.undoLog.1 = 4
-- Types
-- 436a4e4b = org.h2.mvstore.db.NullValueDataType@574caa3f
-- 5803b3f1 = org.h2.mvstore.tx.VersionedValueType@5803b3f1
-- 59a6a071 = org.h2.mvstore.type.ByteArrayDataType@59a6a071
-- 8fa25204 = org.h2.mvstore.type.LongDataType@8fa25204
-- eabe0274 = org.h2.mvstore.db.LobStorageMap$BlobReference$Type@eabe0274
-- f4470498 = org.h2.mvstore.db.LobStorageMap$BlobMeta$Type@f4470498
-- Tables
---- Schema SET ----
SET CREATE_BUILD 214;
---- Table Data ----
---- Schema ----
CREATE USER IF NOT EXISTS "" SALT '' HASH '' ADMIN;
DROP ALIAS READ_BLOB_MAP;
DROP ALIAS READ_CLOB_MAP;
DROP TABLE IF EXISTS INFORMATION_SCHEMA.LOB_BLOCKS;

Essentially there is no data found :-\

Event Timeline

I did a few things wrong:

  • Using h2-1.3.176-ijar.jar, which I think only has the interfaces and not the actual code.
  • The JDBC URL jdbc:h2:git_file_diff.h2.db creates a database file git_file_diff.h2.db.h2.db which is, well, empty. I had to strip the .h2.db suffix.
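
The second pitfall can be avoided by deriving the URL from the file name. This is a minimal sketch: the class and method names are hypothetical, and it assumes the H2 1.3.x page-store layout where the engine appends .h2.db to the name given in the URL. IFEXISTS=TRUE is added so that a wrong path fails instead of silently creating an empty database.

```java
class H2Urls {
    private static final String SUFFIX = ".h2.db";

    /** Builds a JDBC URL for an existing H2 1.3.x database file such as ./git_file_diff.h2.db. */
    static String jdbcUrlFor(String dbFile) {
        if (!dbFile.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("not an .h2.db file: " + dbFile);
        }
        // H2 adds the suffix itself, so it must not be part of the URL.
        String base = dbFile.substring(0, dbFile.length() - SUFFIX.length());
        return "jdbc:h2:file:" + base + ";IFEXISTS=TRUE";
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrlFor("./git_file_diff.h2.db"));
    }
}
```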

Full command which dumped data:

java -cp h2-1.3.176.jar org.h2.tools.Script -url 'jdbc:h2:git_file_diff'

Results in a 165MB backup.sql file.

And to get a SQL prompt:

java -cp h2-1.3.176.jar org.h2.tools.Shell -url 'jdbc:h2:file:./git_file_diff;IFEXISTS=TRUE'

There are 127709 entries.

In the SQL prompt I issued a SHUTDOWN COMPACT; and once completed the file has shrunk to 289MBytes.

From http://www.h2database.com/html/features.html#compacting

Empty space in the database file re-used automatically. When closing the database, the database is automatically compacted for up to 200 milliseconds by default.
To compact more, use the SQL statement SHUTDOWN COMPACT. However re-creating the database may further reduce the database size because this will re-build the indexes.
...
See also the sample application org.h2.samples.Compact

maxCompactTime
Database setting MAX_COMPACT_TIME (default: 200).

Which is milliseconds. The setting can be passed to the H2 jdbc URL, probably with something such as ;MAX_COMPACT_TIME=15000 which would allow compacting to run for 15 * 1000 milliseconds = 15 seconds.

I have grabbed the file from production then tried to compact it locally:

$ gunzip -k git_file_diff.h2.db.gz
$ java -cp h2-1.3.176.jar org.h2.tools.Shell -url 'jdbc:h2:file:./git_file_diff;IFEXISTS=TRUE;MAX_COMPACT_TIME=15000'
sql> SHUTDOWN COMPACT;

The file went from 9G to 302MB \o/

I am entirely sure we have been copying those cache files over and over for the last ten years and they only ever had 200ms to be compacted, which resulted in all those files growing out of control.

We need to compact all of them (which requires Gerrit to be shut down), and we could probably send a patch upstream to allow a longer compaction window.
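
A manual pass over all cache files could look like the sketch below. It only prints one command per file and does not run anything; the class name and its arguments are hypothetical. It relies on the H2 Shell tool's -sql option to run a statement non-interactively, which I believe exists but have not checked against 1.3.176. The commands must only be run while Gerrit is stopped.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CompactAllCaches {
    private static final String SUFFIX = ".h2.db";

    /** Command line that opens one .h2.db file with the H2 Shell tool and compacts it. */
    static List<String> compactCommand(String h2Jar, String dbFile) {
        if (!dbFile.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("not an .h2.db file: " + dbFile);
        }
        // H2 appends ".h2.db" itself, so the URL must not contain the suffix.
        String base = dbFile.substring(0, dbFile.length() - SUFFIX.length());
        return Arrays.asList(
            "java", "-cp", h2Jar, "org.h2.tools.Shell",
            "-url", "jdbc:h2:file:" + base + ";IFEXISTS=TRUE;MAX_COMPACT_TIME=15000",
            "-sql", "SHUTDOWN COMPACT");
    }

    /** One command per *.h2.db file found directly in the directory, sorted by name. */
    static List<List<String>> compactCommands(String h2Jar, Path dir) throws IOException {
        List<String> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*" + SUFFIX)) {
            for (Path p : stream) {
                files.add(p.toString());
            }
        }
        Collections.sort(files);
        List<List<String>> commands = new ArrayList<>();
        for (String f : files) {
            commands.add(compactCommand(h2Jar, f));
        }
        return commands;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java CompactAllCaches h2-1.3.176.jar /path/to/cache
        for (List<String> cmd : compactCommands(args[0], Paths.get(args[1]))) {
            StringBuilder line = new StringBuilder();
            for (String arg : cmd) {
                // Single-quote every argument so the ';' in the URL survives the shell.
                line.append('\'').append(arg).append("' ");
            }
            System.out.println(line.toString().trim());
        }
    }
}
```

Printing instead of executing keeps this a dry run, so the list can be reviewed before touching production files.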

After looking at the H2 Database source code (see below), we might be able to set MAX_COMPACT_TIME via a system property: -Dh2.maxCompactTime. It is not explicitly mentioned in the documentation, but the code seems to look up system properties when a setting has not been given explicitly:

version-1.3.176/h2/src/main/org/h2/engine/SettingsBase.java
/**
 * Get the setting for the given key.
 *
 * @param key the key
 * @param defaultValue the default value
 * @return the setting
 */
protected String get(String key, String defaultValue) {
    StringBuilder buff = new StringBuilder("h2.");
    boolean nextUpper = false;
    for (char c : key.toCharArray()) {
        if (c == '_') {
            nextUpper = true;
        } else {
            // Character.toUpperCase / toLowerCase ignores the locale
            buff.append(nextUpper ? Character.toUpperCase(c) : Character.toLowerCase(c));
            nextUpper = false;
        }
    }
    String sysProperty = buff.toString();
    String v = settings.get(key);
    if (v == null) {
        v = Utils.getProperty(sysProperty, defaultValue);
        settings.put(key, v);
    }
    return v;
}
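
Applying the name mangling above to MAX_COMPACT_TIME by hand gives h2.maxCompactTime. The sketch below isolates just that conversion so it can be checked; the loop is copied from the excerpt, while the class and method names are made up for illustration.

```java
class H2PropertyName {
    /** Converts a setting key such as MAX_COMPACT_TIME to the system property name H2 looks up. */
    static String systemPropertyFor(String key) {
        StringBuilder buff = new StringBuilder("h2.");
        boolean nextUpper = false;
        for (char c : key.toCharArray()) {
            if (c == '_') {
                // Drop the underscore and capitalise the following character.
                nextUpper = true;
            } else {
                buff.append(nextUpper ? Character.toUpperCase(c) : Character.toLowerCase(c));
                nextUpper = false;
            }
        }
        return buff.toString();
    }

    public static void main(String[] args) {
        System.out.println(systemPropertyFor("MAX_COMPACT_TIME")); // prints h2.maxCompactTime
    }
}
```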

Change 865023 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] gerrit: raise H2 compaction time

https://gerrit.wikimedia.org/r/865023

Change 865023 merged by Dzahn:

[operations/puppet@production] gerrit: raise H2 compaction time

https://gerrit.wikimedia.org/r/865023

Mentioned in SAL (#wikimedia-releng) [2022-12-07T21:25:45Z] <hashar> gerrit: on next restart it will started with Java property -Dh2.maxCompactTime=15000 and on the next shutdown that would cause it to compact the oversized H2 database files | https://gerrit.wikimedia.org/r/c/operations/puppet/+/865023/ | T323754

gerrit2001.wikimedia.org is the replica. All the material from the primary was wrongly rsynced to it, which means it has a snapshot of the H2 databases available. Since it is a replica I am not sure whether it acts on them, but I guess it does make use of some of the caches. Here is a snapshot of its current state:

gerrit show-caches
  Name                          |Entries              |  AvgGet |Hit Ratio|
                                |   Mem   Disk   Space|         |Mem  Disk|
--------------------------------+---------------------+---------+---------+
  adv_bases                     |                     |         |         |
  change_notes                  |     5               |  10.1ms | 50%     |
  changeid_project              |                     |         |         |
  default_preferences           |                     |         |         |
  external_ids_map              |     1               | 291.6ms | 26%     |
  groups                        |                     |         |         |
  groups_bymember               |                     |         |         |
  groups_byname                 |                     |         |         |
  groups_bysubgroup             |                     |         |         |
  groups_byuuid                 |    17               |   2.8ms | 57%     |
  groups_external               |                     |         |         |
  groups_external_persisted     |                     |         |         |
  ldap_group_existence          |                     |         |         |
  ldap_groups                   |     1               | 231.0ms | 50%     |
  ldap_groups_byinclude         |                     |         |         |
  ldap_usernames                |                     |         |         |
  permission_sort               |  1024               |  15.4us | 99%     |
  plugin_resources              |                     |         |         |
  project_list                  |     1               |  72.0ms | 75%     |
  projects                      |  1477               | 488.1us | 99%     |
  prolog_rules                  |                     |         |         |
  sshkeys                       |     1               | 323.2ms | 85%     |
  its-phabricator-its_rules_project|                     |         |         |
  lfs-lfs_project_locks         |                     |         |         |
  plugin-manager-plugins_list   |     1               |   11.2s |  0%     |
D accounts                      |     2   9723   3.47m|  19.0ms | 85% 100%|
D approvals                     |                0.00k|         |         |
D change_kind                   |       400769  48.78m|         |         |
D comment_context               |       346890 260.69m|         |         |
D conflicts                     |       155687  20.76m|         |         |
D diff_intraline                |        39050  49.07m|         |         |
D diff_summary                  |        30779  14.91m|         |         |
D gerrit_file_diff              |                0.00k|         |         |
D git_file_diff                 |                0.00k|         |         |
D git_modified_files            |                0.00k|         |         |
D git_tags                      |                0.00k|         |         |
D groups_byuuid_persisted       |         2130   1.17m|         |     100%|
D mergeability                  |       777890 128.00m|         |         |
D modified_files                |                0.00k|         |         |
D oauth_tokens                  |                0.00k|         |         |
D persisted_projects            |         3011   5.71m|         |     100%|
D pure_revert                   |                0.00k|         |         |
D web_sessions                  |        72364  29.11m|         |         |

gerrit_file_diff and git_file_diff are not populated since there is no web frontend and hence diffs are never rendered.

The files on disk:

$ ls -lah /var/lib/gerrit2/review_site/{db/account_patch_reviews,cache/*}.h2.db|cut -b30-
 12M Dec  8 14:53 /var/lib/gerrit2/review_site/cache/accounts.h2.db
2.1M Nov 17 09:08 /var/lib/gerrit2/review_site/cache/approvals.h2.db
238M Nov 17 09:08 /var/lib/gerrit2/review_site/cache/change_kind.h2.db
609M Nov 17 09:08 /var/lib/gerrit2/review_site/cache/comment_context.h2.db
 65M Nov 17 09:08 /var/lib/gerrit2/review_site/cache/conflicts.h2.db
187M Nov 17 09:04 /var/lib/gerrit2/review_site/cache/diff.h2.db
184M Nov 17 09:08 /var/lib/gerrit2/review_site/cache/diff_intraline.h2.db
 31M Nov 17 09:08 /var/lib/gerrit2/review_site/cache/diff_summary.h2.db
1.1M Nov 17 09:08 /var/lib/gerrit2/review_site/cache/gerrit_file_diff.h2.db
1.1M Nov 17 09:08 /var/lib/gerrit2/review_site/cache/git_file_diff.h2.db
1.1M Nov 17 09:08 /var/lib/gerrit2/review_site/cache/git_modified_files.h2.db
1.1M Nov 17 09:08 /var/lib/gerrit2/review_site/cache/git_tags.h2.db
3.5M Dec  7 11:50 /var/lib/gerrit2/review_site/cache/groups_byuuid_persisted.h2.db
804M Nov 17 09:08 /var/lib/gerrit2/review_site/cache/mergeability.h2.db
1.1M Nov 17 09:08 /var/lib/gerrit2/review_site/cache/modified_files.h2.db
1.1M Nov 17 09:08 /var/lib/gerrit2/review_site/cache/oauth_tokens.h2.db
 19M Dec  8 14:56 /var/lib/gerrit2/review_site/cache/persisted_projects.h2.db
1.1M Nov 17 09:08 /var/lib/gerrit2/review_site/cache/pure_revert.h2.db
2.2M Jun  8  2018 /var/lib/gerrit2/review_site/cache/quota.repo_size.h2.db
 98M Nov 17 09:08 /var/lib/gerrit2/review_site/cache/web_sessions.h2.db
994M Nov 17 09:08 /var/lib/gerrit2/review_site/db/account_patch_reviews.h2.db

quota.repo_size.h2.db is from 2018 so probably does not exist anymore as a cache. Most haven't been touched since November 17th. Thus I don't know whether anything will be compacted after restarting Gerrit twice.

Mentioned in SAL (#wikimedia-operations) [2022-12-08T14:59:40Z] <hashar> Restarting Gerrit replica TWICE on gerrit2002.wikimedia.org to apply -Dh2.maxCompactTime and get it to trigger compaction # T323754

After restarting Gerrit twice:

 12M Dec  8 15:01 /var/lib/gerrit2/review_site/cache/accounts.h2.db
1.1M Dec  8 15:01 /var/lib/gerrit2/review_site/cache/approvals.h2.db
238M Dec  8 15:01 /var/lib/gerrit2/review_site/cache/change_kind.h2.db
609M Dec  8 15:01 /var/lib/gerrit2/review_site/cache/comment_context.h2.db
 65M Dec  8 15:01 /var/lib/gerrit2/review_site/cache/conflicts.h2.db
187M Nov 17 09:04 /var/lib/gerrit2/review_site/cache/diff.h2.db
184M Dec  8 15:01 /var/lib/gerrit2/review_site/cache/diff_intraline.h2.db
 31M Dec  8 15:01 /var/lib/gerrit2/review_site/cache/diff_summary.h2.db
1.1M Dec  8 15:01 /var/lib/gerrit2/review_site/cache/gerrit_file_diff.h2.db
1.1M Dec  8 15:01 /var/lib/gerrit2/review_site/cache/git_file_diff.h2.db
1.1M Dec  8 15:01 /var/lib/gerrit2/review_site/cache/git_modified_files.h2.db
1.1M Dec  8 15:01 /var/lib/gerrit2/review_site/cache/git_tags.h2.db
2.5M Dec  8 15:01 /var/lib/gerrit2/review_site/cache/groups_byuuid_persisted.h2.db
445M Dec  8 15:01 /var/lib/gerrit2/review_site/cache/mergeability.h2.db
1.1M Dec  8 15:01 /var/lib/gerrit2/review_site/cache/modified_files.h2.db
1.1M Dec  8 15:01 /var/lib/gerrit2/review_site/cache/oauth_tokens.h2.db
 14M Dec  8 15:01 /var/lib/gerrit2/review_site/cache/persisted_projects.h2.db
1.1M Dec  8 15:01 /var/lib/gerrit2/review_site/cache/pure_revert.h2.db
2.2M Jun  8  2018 /var/lib/gerrit2/review_site/cache/quota.repo_size.h2.db
 98M Dec  8 15:01 /var/lib/gerrit2/review_site/cache/web_sessions.h2.db
994M Dec  8 15:01 /var/lib/gerrit2/review_site/db/account_patch_reviews.h2.db

diff.h2.db and quota.repo_size.h2.db did not get touched but all other files got altered. The size differences:

Cache                     Before   After
approvals                   2.1M    1.1M
groups_byuuid_persisted     3.5M    2.5M
mergeability                804M    445M
persisted_projects           19M     14M

There is thus a noticeable improvement on the mergeability cache which is great.

Mentioned in SAL (#wikimedia-operations) [2022-12-08T15:12:44Z] <hashar> Restarted Gerrit TWICE on gerrit1001.wikimedia.org to apply -Dh2.maxCompactTime and get it to trigger compaction # T323754

On gerrit1001

$ ls -lah /var/lib/gerrit2/review_site/{db/account_patch_reviews,cache/*}.h2.db|cut -b30-
 19M Dec  8 15:07 /var/lib/gerrit2/review_site/cache/accounts.h2.db
610M Dec  8 15:07 /var/lib/gerrit2/review_site/cache/approvals.h2.db
381M Dec  8 15:07 /var/lib/gerrit2/review_site/cache/change_kind.h2.db
779M Dec  8 15:07 /var/lib/gerrit2/review_site/cache/comment_context.h2.db
235M Dec  8 15:07 /var/lib/gerrit2/review_site/cache/conflicts.h2.db
200M Nov 17 09:09 /var/lib/gerrit2/review_site/cache/diff.h2.db
490M Dec  8 15:03 /var/lib/gerrit2/review_site/cache/diff_intraline.h2.db
960M Dec  8 15:07 /var/lib/gerrit2/review_site/cache/diff_summary.h2.db
 12G Dec  8 15:07 /var/lib/gerrit2/review_site/cache/gerrit_file_diff.h2.db
8.2G Dec  8 15:07 /var/lib/gerrit2/review_site/cache/git_file_diff.h2.db
899M Dec  8 15:07 /var/lib/gerrit2/review_site/cache/git_modified_files.h2.db
1.1M Nov 17 11:46 /var/lib/gerrit2/review_site/cache/git_tags.h2.db
5.4M Dec  7 11:48 /var/lib/gerrit2/review_site/cache/groups_byuuid_persisted.h2.db
676M Dec  8 15:07 /var/lib/gerrit2/review_site/cache/mergeability.h2.db
905M Dec  8 15:07 /var/lib/gerrit2/review_site/cache/modified_files.h2.db
1.1M Nov 17 11:47 /var/lib/gerrit2/review_site/cache/oauth_tokens.h2.db
 14M Dec  8 14:07 /var/lib/gerrit2/review_site/cache/persisted_projects.h2.db
1.1M Nov 17 11:46 /var/lib/gerrit2/review_site/cache/pure_revert.h2.db
2.2M Jun  8  2018 /var/lib/gerrit2/review_site/cache/quota.repo_size.h2.db
334M Dec  8 15:08 /var/lib/gerrit2/review_site/cache/web_sessions.h2.db
1.1G Dec  8 15:03 /var/lib/gerrit2/review_site/db/account_patch_reviews.h2.db

After restarting it twice:

7.1M Dec  8 15:12 /var/lib/gerrit2/review_site/cache/accounts.h2.db
313M Dec  8 15:12 /var/lib/gerrit2/review_site/cache/approvals.h2.db
400M Dec  8 15:12 /var/lib/gerrit2/review_site/cache/change_kind.h2.db
820M Dec  8 15:12 /var/lib/gerrit2/review_site/cache/comment_context.h2.db
273M Dec  8 15:12 /var/lib/gerrit2/review_site/cache/conflicts.h2.db
200M Nov 17 09:09 /var/lib/gerrit2/review_site/cache/diff.h2.db
500M Dec  8 15:12 /var/lib/gerrit2/review_site/cache/diff_intraline.h2.db
985M Dec  8 15:12 /var/lib/gerrit2/review_site/cache/diff_summary.h2.db
527M Dec  8 15:12 /var/lib/gerrit2/review_site/cache/gerrit_file_diff.h2.db
532M Dec  8 15:12 /var/lib/gerrit2/review_site/cache/git_file_diff.h2.db
149M Dec  8 15:12 /var/lib/gerrit2/review_site/cache/git_modified_files.h2.db
 32K Dec  8 15:12 /var/lib/gerrit2/review_site/cache/git_tags.h2.db
3.8M Dec  8 15:12 /var/lib/gerrit2/review_site/cache/groups_byuuid_persisted.h2.db
511M Dec  8 15:12 /var/lib/gerrit2/review_site/cache/mergeability.h2.db
208M Dec  8 15:12 /var/lib/gerrit2/review_site/cache/modified_files.h2.db
 32K Dec  8 15:12 /var/lib/gerrit2/review_site/cache/oauth_tokens.h2.db
 14M Dec  8 15:12 /var/lib/gerrit2/review_site/cache/persisted_projects.h2.db
 32K Dec  8 15:12 /var/lib/gerrit2/review_site/cache/pure_revert.h2.db
2.2M Jun  8  2018 /var/lib/gerrit2/review_site/cache/quota.repo_size.h2.db
343M Dec  8 15:12 /var/lib/gerrit2/review_site/cache/web_sessions.h2.db
1.1G Dec  8 15:12 /var/lib/gerrit2/review_site/db/account_patch_reviews.h2.db

Again quota.repo_size.h2.db and diff.h2.db have not been touched.

Diff:

File                            Before   After
accounts.h2.db                     19M    7.1M
approvals.h2.db                   610M    313M
change_kind.h2.db                 381M    400M
comment_context.h2.db             779M    820M
conflicts.h2.db                   235M    273M
diff.h2.db                        200M    200M
diff_intraline.h2.db              490M    500M
diff_summary.h2.db                960M    985M
gerrit_file_diff.h2.db             12G    527M
git_file_diff.h2.db               8.2G    532M
git_modified_files.h2.db          899M    149M
git_tags.h2.db                    1.1M     32K
groups_byuuid_persisted.h2.db     5.4M    3.8M
mergeability.h2.db                676M    511M
modified_files.h2.db              905M    208M
oauth_tokens.h2.db                1.1M     32K
persisted_projects.h2.db           14M     14M
pure_revert.h2.db                 1.1M     32K
quota.repo_size.h2.db             2.2M    2.2M
web_sessions.h2.db                334M    343M
account_patch_reviews.h2.db       1.1G    1.1G
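
The relative change can be worked out from the human-readable sizes. This is a small sketch, assuming the K/M/G suffixes printed by ls -h are powers of 1024; the class and method names are hypothetical. Since ls -h rounds its output, the percentages are approximate.

```java
class SizeDiff {
    /** Parses an ls -h style size such as 32K, 527M or 8.2G into MiB. */
    static double toMiB(String size) {
        char unit = size.charAt(size.length() - 1);
        double value = Double.parseDouble(size.substring(0, size.length() - 1));
        switch (unit) {
            case 'K': return value / 1024;
            case 'M': return value;
            case 'G': return value * 1024;
            default: throw new IllegalArgumentException("unexpected unit in " + size);
        }
    }

    /** Percentage by which a file shrank; negative when it grew. */
    static double reductionPercent(String before, String after) {
        double b = toMiB(before);
        double a = toMiB(after);
        return 100.0 * (b - a) / b;
    }

    public static void main(String[] args) {
        System.out.printf("gerrit_file_diff: %.1f%%%n", reductionPercent("12G", "527M"));
        System.out.printf("git_file_diff:    %.1f%%%n", reductionPercent("8.2G", "532M"));
        System.out.printf("change_kind:      %.1f%%%n", reductionPercent("381M", "400M"));
    }
}
```

With the figures above this gives a reduction of roughly 96% for gerrit_file_diff and 94% for git_file_diff, while caches such as change_kind grew slightly between the two listings.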

The important part is that the big files have shrunk dramatically:

  • gerrit_file_diff.h2.db went from 12G to 527M
  • git_file_diff.h2.db went from 8.2G to 532M

Which was exactly the purpose of this task. Solved!