
Assess Wikidata dump import hardware
Open, In Progress, Medium, Public

Description

THIS IS A DRAFT

The purpose of this task is to document observed Wikidata dump import characteristics on different hardware, using the graph database Blazegraph that is employed by the Wikidata Query Service. It is also to understand how this performance stacks up against an import into the managed Amazon Web Services ("AWS") offering Amazon Neptune. Systap, the company behind Blazegraph, was acquired by Amazon several years ago.

SUMMARY

  • A current state of the art cloud compute instance approaches the performance characteristics of a 2018 gaming desktop and 2019 Intel-based MacBook Pro.
  • NVMes confer a speed advantage for Blazegraph imports relative to SATA SSDs.
  • CPU clock speed confers a speed advantage for Blazegraph imports.
  • Amazon Neptune is capable of importing a full February 2024 Wikidata dump dramatically faster than a standalone cloud compute instance or a later generation bare metal server in our data center. Instead of 15-25 days (bare metal in one of our data centers), it takes approximately 63 hours (2.625 days).
  • There are cost considerations.

NARRATIVE

As noted in T336443: Investigate performance differences between wdqs2022 and older hosts, full Wikidata imports are slower on newer servers provisioned with some later generation processors, in contrast with some older servers with older generation processors. The latest full import appeared to take 25 days.

It's important to address this speed challenge in order to ensure that newer hardware is capable of repopulating Blazegraph quickly in the event of a system failure. Faster imports also enable us to conduct experiments with graph splitting approaches more quickly - we have been targeting a load time below 10 days, which is achievable for graphs of about 7.6B triples but which we believe will likely be difficult to achieve consistently for graphs of 10B triples or more on the newer nodes. Finally, in the event of a catastrophic system failure, we want to understand whether the current state of the art, for example as exhibited by Amazon Neptune, may still provide adequate import capabilities.

The following prior art is useful for understanding similar analyses.

More to be ported from text files, Etherpad, etc.

Event Timeline

dr0ptp4kt changed the task status from Open to In Progress. Mar 4 2024, 3:28 PM
dr0ptp4kt claimed this task.
dr0ptp4kt triaged this task as Medium priority.
dr0ptp4kt updated the task description.
dr0ptp4kt moved this task from Incoming to In Progress on the Discovery-Search (Current work) board.

Change rOPUP100894305575 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] partman: configure wdqs1025 partioning

https://gerrit.wikimedia.org/r/1008943

Change rOPUP100894305575 merged by Bking:

[operations/puppet@production] partman: configure wdqs1025 partioning

https://gerrit.wikimedia.org/r/1008943

First, here are some commands that were used for Blazegraph imports on Ubuntu 22.04. I originally tried a good number of EC2 instance types, then went back and focused on just four of them with a sequence of repeatable commands (this wasn't scripted, as I didn't want to spend time automating and also wanted to see the systems' feedback along the way). I forgot to grab RAM frequency as a routine step when running these commands (I recall checking one server during the original checks, and did look at my Alienware), but generally the servers are DDR4 unless the AWS documentation says DDR5 (for my 2018 Alienware and 2019 MacBook Pro they're DDR4, by the way).

# get the specs, get the software, ready the mount
lscpu
free -h
lsblk
sudo fdisk /dev/nvme1n1
 n      # new partition
 p      # primary
 1      # partition number 1
 ENTER  # accept default first sector
 ENTER  # accept default last sector
 w      # write the partition table and exit
lsblk
sudo mkfs.ext4 /dev/nvme1n1p1
mkdir rdf
sudo mount -t auto -v /dev/nvme1n1p1 /home/ubuntu/rdf
sudo chown ubuntu:ubuntu rdf
git clone https://gerrit.wikimedia.org/r/wikidata/query/rdf rdfdownload
cp -r rdfdownload/. rdf
cd rdf
df -h .
sudo apt update
sudo apt install openjdk-8-jdk-headless
./mvnw package -DskipTests

# ready Blazegraph and run a partial import
sudo mkdir /var/log/wdqs
sudo chown ubuntu:ubuntu /var/log/wdqs
touch /var/log/wdqs/wdqs-blazegraph.log
cd /home/ubuntu/rdf/dist/target/
tar xzvf service-0.3.138-SNAPSHOT-dist.tar.gz
cd service-0.3.138-SNAPSHOT/
# using logback.xml like prod:
mv ~/logback.xml .
# using runBlazegraph.sh like prod, 31g heap and pointer to logback.xml:
mv ~/runBlazegraph.sh .
vi runBlazegraph.sh
screen
 ./runBlazegraph.sh
CTRL-a-d to leave screen up
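# loadData.sh flags as used here: -n namespace, -d directory containing the wikidump-*.ttl.gz munged files, -s first file number, -e last file number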
time ./loadData.sh -n wdq -d /home/ubuntu/ -s 1 -e 9
screen -r
CTRL-c to kill Blazegraph
 exit from screen
ls -alh wikidata.jnl

# try it with a ramdisk
rm wikidata.jnl
sudo modprobe brd rd_size=50331648 max_part=1 rd_nr=1  # rd_size is in KiB: 50331648 KiB ≈ 48 GiB ramdisk
sudo mkfs -t ext4 /dev/ram0
mkdir /home/ubuntu/rdfram
sudo mount /dev/ram0 /home/ubuntu/rdfram
sudo chown ubuntu:ubuntu /home/ubuntu/rdfram
cd
cp -r rdf/. rdfram
cd rdfram/dist/target/service-0.3.138-SNAPSHOT/
cp /home/ubuntu/wikidump-* /home/ubuntu/rdfram
df -h ./
screen
 ./runBlazegraph.sh
CTRL-a-d to leave screen up
time ./loadData.sh -n wdq -d /home/ubuntu/rdfram -s 1 -e 9
dr0ptp4kt updated the task description.

Change 1009574 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] wdqs: make "monitoring_tier" var optional

https://gerrit.wikimedia.org/r/1009574

Change 1009574 merged by Bking:

[operations/puppet@production] wdqs: move monitoring logic into role declaration

https://gerrit.wikimedia.org/r/1009574

@dr0ptp4kt wdqs1025 should be ready for your I/O tests. Let us know how it goes!

Thanks @bking ! It looks like the NVMe in this one is not a higher speed one for writes (based on the reported model from lsblk I think this is a https://i.dell.com/sites/csdocuments/Shared-Content_data-Sheets_Documents/en/Dell-PowerEdge-Express-Flash-NVMe-Mixed-Use-PCIe-SSD.pdf ), and I'm also wondering if perhaps its write performance has degraded with age. I'll paste in the results here, but this was slower than the other servers, ironically (although not surprisingly because of the slower NVMe and slightly slower processor). This slower write speed is atypical of the other NVMes I've encountered. I believe the newer model ones are rated for 6000 MB/s for writes. But, I'm going to ping on task to see if we can get a comparative read of disk throughput from one of the newer and faster cp#### NVMes.

dr0ptp4kt@wdqs1025:/srv/deployment/wdqs/wdqs-cache$ ls /srv/wdqs/
aliases.map  wikidata.jnl               wikidump-000000002.ttl.gz  wikidump-000000004.ttl.gz  wikidump-000000006.ttl.gz  wikidump-000000008.ttl.gz
dumps        wikidump-000000001.ttl.gz  wikidump-000000003.ttl.gz  wikidump-000000005.ttl.gz  wikidump-000000007.ttl.gz  wikidump-000000009.ttl.gz
dr0ptp4kt@wdqs1025:/srv/deployment/wdqs/wdqs-cache$ cd cache
dr0ptp4kt@wdqs1025:/srv/deployment/wdqs/wdqs-cache/cache$ time ./loadData.sh -n wdq -d /srv/wdqs -s 1 -e 9
Processing wikidump-000000001.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
></head
><body<p>totalElapsed=214282ms, elapsed=214279ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=233942ms, commitTime=1709910647417, mutationCount=22829952</p
></html
>Processing wikidump-000000002.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
></head
><body<p>totalElapsed=196470ms, elapsed=196469ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=227786ms, commitTime=1709910874952, mutationCount=15807617</p
></html
>Processing wikidump-000000003.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
></head
><body<p>totalElapsed=183111ms, elapsed=183110ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=213965ms, commitTime=1709911089170, mutationCount=12654001</p
></html
>Processing wikidump-000000004.ttl.gz
^C

real    14m4.855s
user    0m0.084s
sys     0m0.053s
dr0ptp4kt@wdqs1025:/srv/deployment/wdqs/wdqs-cache/cache$ cd /srv
dr0ptp4kt@wdqs1025:/srv$ df .
Filesystem      1K-blocks    Used  Available Use% Mounted on
/dev/nvme0n1   1537157352 9508448 1449491832   1% /srv
dr0ptp4kt@wdqs1025:/srv$ sudo sync; sudo dd if=/dev/zero of=tempfile bs=25M count=1024; sudo sync
1024+0 records in
1024+0 records out
26843545600 bytes (27 GB, 25 GiB) copied, 27.1995 s, 987 MB/s
dr0ptp4kt@wdqs1025:/srv$ sudo sync; sudo dd if=/dev/zero of=tempfile bs=25M count=1024; sudo sync
1024+0 records in
1024+0 records out
26843545600 bytes (27 GB, 25 GiB) copied, 37.5448 s, 715 MB/s
dr0ptp4kt@wdqs1025:/srv$ lsblk -o MODEL,SERIAL,SIZE,STATE --nodeps
MODEL                                SERIAL                 SIZE STATE
...
Dell Express Flash PM1725a 1.6TB SFF       S39XNX0JC01060   1.5T
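
(As an aside, one way to check whether drive wear might explain degraded write performance is the NVMe SMART log; a sketch only, assuming nvme-cli is available - this was not run as part of the tests above:)

sudo apt install nvme-cli
sudo nvme smart-log /dev/nvme0n1   # check "percentage_used" and "data_units_written" for wear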

@ssingh would you mind if the following command is run on one of the newer cp#### hosts with a new higher write throughput NVMe? If so, got a recommended node? I don't have access, but I think @bking may.

sudo sync; sudo dd if=/dev/zero of=tempfile bs=25M count=1024; sudo sync

Heads up, I'm out for the rest of the day.

@ssingh @dr0ptp4kt hold up on the testing on your hosts for now...we might be able to get an NVMe into this year's budget, will let you know.

@dr0ptp4kt If you want to run I/O tests on the existing hosts, I recommend the approach detailed in this wikitech page. Brendan Gregg is considered an expert on computer performance and benchmarking; you might try his approach as well.
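
For example, a minimal sequential-write check with fio might look like this (a sketch only; the target path, size, and job parameters are illustrative and not taken from the wikitech page):

fio --name=seqwrite --filename=/srv/fio-testfile --rw=write --bs=1M --size=25G \
    --ioengine=libaio --direct=1 --iodepth=32 --numjobs=1 --group_reporting
rm /srv/fio-testfile   # clean up the test file afterwards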

AWS EC2 servers

After exploring a battery of EC2 servers, four instance types were selected and the commands posted were run.

The configuration most like our wdqs1021-1023 servers (third generation Intel Xeon) is listed first. The fastest option among the four servers was a Graviton3 ARM-based configuration from Amazon.

| Time Disk ➡️ Disk | Time RAMdisk ➡️ RAMdisk | Instance Type | Cost Per Hour | HD Transfer | Processor Comment | RAM Comment |
| --- | --- | --- | --- | --- | --- | --- |
| 26m46.651s | 26m26.923s | m6id.16xlarge | $3.7968 | EBS ➡️ NVMe | 64 vCPU @ "Up to 3.5 GHz 3rd Generation Intel Xeon Scalable processors (Ice Lake 8375C)" | 256 GB @ DDR4 |
| 22m5.442s | 20m31.244s | m5zn.metal | $3.9641 | EBS ➡️ EBS | 48 vCPU @ "2nd Generation Intel Xeon Scalable Processors (Cascade Lake 8252C) with an all-core turbo frequency up to 4.5 GHz" | 192 GiB @ DDR4 |
| 21m40.537s | 20m57.268s | c5d.12xlarge | $2.304 | EBS ➡️ NVMe | 48 vCPU @ "C5 and C5d 12xlarge, 24xlarge, and metal instance sizes feature custom 2nd generation Intel Xeon Scalable Processors (Cascade Lake 8275CL) with a sustained all core Turbo frequency of 3.6GHz and single core turbo frequency of up to 3.9GHz." | 96 GiB @ DDR4 |
| 19m18.825s | 19m23.868s | c7gd.16xlarge | $2.903 | EBS ➡️ NVMe | 64 vCPU @ "Powered by custom-built AWS Graviton3 processors" | 128 GiB @ DDR5 |

2018 gaming desktop

Commands were then run against a gaming-class desktop from 2018. This outperformed the fastest Graviton3 configuration in AWS.

The Blazegraph bufferCapacity configuration variable was tested. Increasing the bufferCapacity from 100000 to 1000000 yielded a sizable performance improvement.

| Time Disk ➡️ Disk | Instance Type | bufferCapacity | HD Transfer | Processor Comment | RAM Comment |
| --- | --- | --- | --- | --- | --- |
| 18m31.647s | Alienware Aurora R7 (upgraded) i7-8700 | 100000 | SATA SSD ➡️ NVMe | 6 CPU @ up to 4.6 GHz (i7-8700 page) | 64 GB @ DDR4 |
| 18m3.798s | Alienware Aurora R7 (upgraded) i7-8700 | 100000 | NVMe ➡️ same NVMe | 6 CPU @ up to 4.6 GHz (i7-8700 page) | 64 GB @ DDR4 |
| 15m1.658s | Alienware Aurora R7 (upgraded) | 10000000 | NVMe ➡️ same NVMe | 6 CPU @ up to 4.6 GHz (i7-8700 page) | 64 GB @ DDR4 |
| 13m28.076s | Alienware Aurora R7 (upgraded) | 1000000 | NVMe ➡️ same NVMe | 6 CPU @ up to 4.6 GHz (i7-8700 page) | 64 GB @ DDR4 |
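
For reference, the setting being compared is this line in RWStore.properties (a sketch; it assumes the property is present uncommented in the properties file shipped with the service):

# value used for the initial runs
com.bigdata.rdf.sail.bufferCapacity=100000
# increased value used for the faster runs
com.bigdata.rdf.sail.bufferCapacity=1000000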

2019 MacBook Pro

The commands were run against a 2019-era Intel-based MacBook Pro. A 16 GB Java heap was used, as this laptop only has 32 GB of memory.

This was a fast arrangement, but not quite as fast as the gaming-class desktop. The 2019 MacBook Pro has a faster processor, but a somewhat slower disk, than the gaming desktop.

| Time Disk ➡️ Disk | Instance Type | bufferCapacity | HD Transfer | Processor Comment | RAM Comment |
| --- | --- | --- | --- | --- | --- |
| 17m00.040s | 2019 MacBook Pro | 100000 | High-end SSD ➡️ Same SSD (Tom's Hardware) | "Configurable to 2.4GHz 8‑core Intel Core i9, Turbo Boost up to 5.0GHz, with 16MB shared L3 cache" | 32 GB @ DDR4 |
| 16m27.390s | 2019 MacBook Pro | 1000000 | High-end SSD ➡️ Same SSD (Tom's Hardware) | "Configurable to 2.4GHz 8‑core Intel Core i9, Turbo Boost up to 5.0GHz, with 16MB shared L3 cache" | 32 GB @ DDR4 |

Gaming desktop with MemStore

The following represents an in-memory configuration (MemStore) of the Blazegraph database instead of a .jnl database journal file. It's fast, but note that the operating system killed the process after completion, apparently due to memory pressure. A higher-memory instance would be required in order to load the full database, and again, there doesn't appear to be an easy way to serialize the in-memory store to disk for later restoration. But this is illustrative of what's possible.

| Time Disk ➡️ Disk | Instance Type | bufferCapacity | HD Transfer | Processor Comment | RAM Comment |
| --- | --- | --- | --- | --- | --- |
| 9m34.808s | Alienware Aurora R7 (upgraded) | 100000 w/ MemStore | NVMe ➡️ same NVMe | 6 core @ up to 4.6 GHz (i7-8700 page) | 64 GB @ DDR4 |
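
For context, switching Blazegraph from the disk-backed journal to the in-memory store is a properties change along these lines (a sketch based on Blazegraph's documented BufferMode options, not the exact configuration used for this run):

# disk-backed RWStore journal (the typical WDQS configuration)
com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
# in-memory store; contents are lost when the process exits
com.bigdata.journal.AbstractJournal.bufferMode=MemStore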

Queue Capacity

As suggested at https://github.com/blazegraph/database/wiki/IOOptimization the queue capacity was set at 4000 for these runs.

com.bigdata.btree.writeRetentionQueue.capacity=4000

This is probably an okay middle-ground value. But, to check the behavior of this setting, the same command was run with the capacity set to 8000 (and a bufferCapacity of 1000000) on the 2019 MacBook Pro. Indeed, it was rather faster, as expected, at 12m29.030s (versus 16m27.390s with the capacity at 4000), showing promise like in a previous analysis.
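
In other words, for that MacBook Pro comparison run the RWStore.properties line was set along these lines (sketch):

com.bigdata.btree.writeRetentionQueue.capacity=8000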

This is likely an area for further examination against the 2018 gaming desktop. Soon to follow is some discussion of what was observed for NVMe versus SSD targets on that 2018 gaming desktop.

More about NVMe versus SSD

Runs were also done to see the effects on 150 munged files (out of a set of 2202 files) from the full Wikidata import, which exercises more of the disk-related pieces. This was tried with both types of target disk - SATA SSD and M.2 NVMe - on the 2018 gaming desktop, with a bufferCapacity of 100000.

The M.2 NVMe target was somewhere between 16% and 19% faster.

Notice the following paths in the commands below:

  • ~/rdf, which is part of a mount on the NVMe
  • /mnt/t, which is a copy of ~/rdf, but on a SATA SSD
  • /mnt/firehose/, yet another SATA SSD, bearing the full set of munged files

Target is NVMe

ubuntu22:~/rdf/dist/target/service-0.3.138-SNAPSHOT$ time ./loadData.sh -n wdq -d /mnt/firehose/munge_on_later_data_set -s 1 -e 150

...

>Processing wikidump-000000150.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
></head
><body<p>totalElapsed=33999ms, elapsed=33999ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=76005ms, commitTime=1709099819611, mutationCount=3098484</p
></html
>
real    319m50.828s

Target is SATA SSD, run attempt 1

Now, the SATA SSD as the target (as before, the source has been a different SATA SSD).

ubuntu22:/mnt/t/rdf/dist/target/service-0.3.138-SNAPSHOT$ time ./loadData.sh -n wdq -d /mnt/firehose/munge_on_later_data_set -s 1 -e 150

>Processing wikidump-000000150.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
></head
><body<p>totalElapsed=45665ms, elapsed=45665ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=114606ms, commitTime=1709141576293, mutationCount=3098484</p
></html
>
real    381m19.703s

So, the SATA SSD as target yielded a result about 19% slower (381m19.703s / 319m50.828s ≈ 1.19).

Target is SATA SSD, run attempt 2

The SATA SSD target was tried again from the same directory (as always, first stopping Blazegraph and deleting the journal), just to get a feeling for whether the first result wasn't a fluke on the SATA SSD.

ubuntu22:/mnt/t/rdf/dist/target/service-0.3.138-SNAPSHOT$ time ./loadData.sh -n wdq -d /mnt/firehose/munge_on_later_data_set -s 1 -e 150

><body<p>totalElapsed=46490ms, elapsed=46490ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=120472ms, commitTime=1709169683880, mutationCount=3098484</p
></html
>
real    373m52.079s

Still, some 16.5% slower on the SSD.

More about bufferCapacity

Similarly, a run with 150 munged files was attempted with the bufferCapacity in RWStore.properties increased from 100000 to 1000000, with the NVMe as the target.

com.bigdata.rdf.sail.bufferCapacity=1000000
ubuntu22:~/rdf/dist/target/service-0.3.138-SNAPSHOT$ time ./loadData.sh -n wdq -d /mnt/firehose/munge_on_later_data_set -s 1 -e 150
...
real    240m5.344s

Remember, for nine munged files the difference in performance for NVMe ➡️ same NVMe between a bufferCapacity of 100000 and 1000000 was about 34% (18m3.798s / 13m28.076s ≈ 1.34), and what we see here for 150 munged files is roughly consistent at about 33% (319m50.828s / 240m5.344s ≈ 1.33).

Going for much of the full import

Further import commenced from there with a bufferCapacity of 1000000:

ubuntu22:~/rdf/dist/target/service-0.3.138-SNAPSHOT$ date
Mon Mar  4 06:31:06 PM CST 2024

ubuntu22:~/rdf/dist/target/service-0.3.138-SNAPSHOT$ time ./loadData.sh -n wdq -d /mnt/firehose/munge_on_later_data_set -s 151 -e 2202
Processing wikidump-000000151.ttl.gz

Munged files 151 through 1333 were submitted, but the last file apparently successfully processed was file 1303, at Friday, March 8, 2024 12:07:23 AM CST, as suggested by the following stack trace material.

SPARQL-UPDATE: updateStr=LOAD <file:///mnt/firehose/munge_on_later_data_set/wikidump-000001333.ttl.gz>
java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.openrdf.query.UpdateExecutionException: java.lang.RuntimeException: Problem with entry at -153323735397432922: lastRootBlock=rootBlock{ rootBlock=1, challisField=1303, version=3, nextOffset=47326115792378284, localTime=1709878050435 [Friday, March 8, 2024 12:07:30 AM CST], firstCommitTime=1709578180406 [Monday, March 4, 2024 12:49:40 PM CST], lastCommitTime=1709878043961 [Friday, March 8, 2024 12:07:23 AM CST], commitCounter=1303, commitRecordAddr={off=NATIVE:-141569206,len=422}, commitRecordIndexAddr={off=NATIVE:-76760975,len=220}, blockSequence=20886, quorumToken=-1, metaBitsAddr=1981067087054125, metaStartAddr=11989126, storeType=RW, uuid=4afe5c11-044b-4d55-a8f4-a4c90f0faa10, offsetBits=42, checksum=1206986935, createTime=1709578179709 [Monday, March 4, 2024 12:49:39 PM CST], closeTime=0}

So, we have about 4 hours for files 1-150, then another 77.6 hours for files 151-1303. This means about 59% of the full dump (1303 of 2202 files) was processed in roughly 3.4 days (about 81.6 hours total).

As noted earlier, there may be an opportunity to set the queue capacity higher and squeeze out even better performance. This will need to wait until I'm physically at the gaming class desktop.

Anecdotally, imports of the N3-style Turtle files slow down more dramatically than imports of simple N-Triples style files as more and more triples are ingested in a bulk load. What we know here is that even with the more complicated N3-style Turtle file format it's possible to import around 9B records in less than four days.

About Amazon Neptune

Amazon Neptune was set to import using the simpler N-Triples file format with a serverless configuration at 128 NCUs (about 256 GB of RAM with some attendant CPU). We don't use N-Triples files in our existing import process (we use N3-style Turtle), but it is the sort of format used in the graph split imports.

The first import was just with lexemes to make sure things were working on a 32 NCU configuration.

curl -v -X POST \
    -H 'Content-Type: application/json' \
    https://db-neptune-1.cluster-cnim20k6c0mh.us-west-2.neptune.amazonaws.com:8182/loader -d '
    {
      "source" : "s3://blazegraphdump/latest-lexemes.nt.bz2",
      "format" : "ntriples",
      "iamRoleArn" : "arn:aws:iam::ACCOUNTID:role/NeptuneLoadFromS3",
      "region" : "us-west-2",
      "failOnError" : "FALSE",
      "parallelism" : "HIGH",
      "updateSingleCardinalityProperties" : "FALSE",
      "queueRequest" : "TRUE"
    }'

...

$ curl -G 'https://db-neptune-1.cluster-cnim20k6c0mh.us-west-2.neptune.amazonaws.com:8182/loader/b29f432b-233e-406d-a36d-e3bd0a4ec5bb'
{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [
            {
                "LOAD_COMPLETED" : 1
            }
        ],
        "overallStatus" : {
            "fullUri" : "s3://bucket/latest-lexemes.nt.bz2",
            "runNumber" : 1,
            "retryNumber" : 3,
            "status" : "LOAD_COMPLETED",
            "totalTimeSpent" : 7472,
            "startTime" : 1708643480,
            "totalRecords" : 161015491,
            "totalDuplicates" : 137948,
            "parsingErrors" : 0,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 0
        }
    }

This required a bunch of grants, and I had to make my personal bucket hosting the file listable and readable, as well as the objects listable and readable within it (it's possible to do chained IAM grants, but it is a bit of work and requires somewhat complicated STSes). It appeared that it was also necessary to create a VPC endpoint as described in the documentation.
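
For anyone replicating, the S3 gateway endpoint can be created roughly like this (a sketch; the VPC and route table IDs are placeholders, and the Neptune documentation remains the authoritative reference):

aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0123456789abcdef0 \
    --vpc-endpoint-type Gateway \
    --service-name com.amazonaws.us-west-2.s3 \
    --route-table-ids rtb-0123456789abcdef0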

This was started at 1:30 PM CT on Monday, February 26, 2024. Note that this is the lexemes dump.

Next, I went to verify that with 128 NCUs it goes faster than with 32 NCUs, because if it did, that would be useful for the bigger dump.

curl -v -X POST \
    -H 'Content-Type: application/json' \
    https://db-neptune-1-instance-1.cwnhpfsf87ne.us-west-2.neptune.amazonaws.com:8182/loader -d '
    {
      "source" : "s3://blazegraphdump/latest-lexemes.nt.bz2",
      "format" : "ntriples",
      "iamRoleArn" : "arn:aws:iam::ACCOUNTID:role/NeptuneLoadFromS3Attempt",
      "region" : "us-west-2",
      "failOnError" : "FALSE",
      "parallelism" : "OVERSUBSCRIBE",
      "updateSingleCardinalityProperties" : "FALSE",
      "queueRequest" : "TRUE"
    }'


{
    "status" : "200 OK",
    "payload" : {
        "loadId" : "8ace45ed-2989-4fd4-aa19-d13b9a59e824"
    }

curl -G 'https://db-neptune-1-instance-1.cwnhpfsf87ne.us-west-2.neptune.amazonaws.com:8182/loader/8ace45ed-2989-4fd4-aa19-d13b9a59e824'


{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [
            {
                "LOAD_COMPLETED" : 1
            }
        ],
        "overallStatus" : {
            "fullUri" : "s3://blazegraphdump/latest-lexemes.nt.bz2",
            "runNumber" : 1,
            "retryNumber" : 0,
            "status" : "LOAD_COMPLETED",
            "totalTimeSpent" : 2142,
            "startTime" : 1708975752,
            "totalRecords" : 163715491,
            "totalDuplicates" : 141148,
            "parsingErrors" : 0,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 0
        }
    }
}

It was faster, but that may have had something to do with the earlier 32 NCU configuration having to re-run (successfully) and the 128 NCU configuration not having to automatically re-run, as well as the use of the "parallelism" : "OVERSUBSCRIBE" setting for the 128 NCU use case. OVERSUBSCRIBE is okay and even beneficial for an ingestion like this.

Now, for the full Wikidata load. This was started at about 2:20 PM CT on Monday, February 26, 2024.

curl -v -X POST \
    -H 'Content-Type: application/json' \
    https://db-neptune-1-instance-1.cwnhpfsf87ne.us-west-2.neptune.amazonaws.com:8182/loader -d '
    {
      "source" : "s3://blazegraphdump/latest-all.nt.bz2",
      "format" : "ntriples",
      "iamRoleArn" : "arn:aws:iam::ACCOUNTID:role/NeptuneLoadFromS3Attempt",
      "region" : "us-west-2",
      "failOnError" : "FALSE",
      "parallelism" : "OVERSUBSCRIBE",
      "updateSingleCardinalityProperties" : "FALSE",
      "queueRequest" : "TRUE"
    }'

{
    "status" : "200 OK",
    "payload" : {
        "loadId" : "54dc9f5a-6e3c-428d-8897-180e10c96dbf"
    }

As a frame of reference, over 9B records imported in a bit over 26 hours. This is in the ballpark of what's discussed in https://hal.science/hal-03132794/document , and remember this is serverless and is probably using lower clock speed vCPUs and possibly lower rated RAM. A higher powered provisioned instance is likely to go faster.

curl -G 'https://db-neptune-1-instance-1.cwnhpfsf87ne.us-west-2.neptune.amazonaws.com:8182/loader/54dc9f5a-6e3c-428d-8897-180e10c96dbf'

{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [
            {
                "LOAD_IN_PROGRESS" : 1
            }
        ],
        "overallStatus" : {
            "fullUri" : "s3://blazegraphdump/latest-all.nt.bz2",
            "runNumber" : 1,
            "retryNumber" : 0,
            "status" : "LOAD_IN_PROGRESS",
            "totalTimeSpent" : 101809,
            "startTime" : 1708978871,
            "totalRecords" : 9312600000,
            "totalDuplicates" : 145923094,
            "parsingErrors" : 0,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 0
        }
    }
}

I let this continue running. Following was the final outcome - 63 hours for this very full dump.

ubuntu@ip-172-31-23-68:~$ curl 'https://db-neptune-1-instance-1.cwnhpfsf87ne.us-west-2.neptune.amazonaws.com:8182/loader/54dc9f5a-6e3c-428d-8897-180e10c96dbf'
{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [
            {
                "LOAD_COMPLETED" : 1
            }
        ],
        "overallStatus" : {
            "fullUri" : "s3://blazegraphdump/latest-all.nt.bz2",
            "runNumber" : 1,
            "retryNumber" : 0,
            "status" : "LOAD_COMPLETED",
            "totalTimeSpent" : 227384,
            "startTime" : 1708978871,
            "totalRecords" : 19558358339,
            "totalDuplicates" : 341740320,
            "parsingErrors" : 0,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 0
        }
    }

This style of query worked fine, although query caching and pre-warmup would be advisable through other means if ever running such a service in the real world.

curl -X POST --data-binary 'query=select ?s ?p ?o where {?s ?p ?o} limit 10' 'https://db-neptune-1-instance-1.cwnhpfsf87ne.us-west-2.neptune.amazonaws.com:8182/sparql'

Next, we tried a provisioned instance using a high-powered db.x2iedn.24xlarge - 128 vCPU and 4,096 GB of RAM according to its dropdown in the Create Database dialog (n.b., its info panel said the following - Instance class: db.x2iedn.24xlarge, Current generation, vCPU: 96, Memory: 3072 GiB; indeed, this matches what the db.x2iedn.24xlarge is said to be at https://aws.amazon.com/ec2/instance-types/ ). This is not the sort of machine we'd run for more than an import, as the costs would be prohibitive, but it is worth exploring. https://docs.aws.amazon.com/neptune/latest/userguide/best-practices-general-basic.html#best-practices-loader-tempinstance describes how to go about using a more powerful import machine, which is mostly about ingesting faster (it's unclear if there's much of a cost difference).

After deleting the serverless-based graph database data, the run with the provisioned instance (non-serverless, but still a managed service) was started at about 2:39 PM on February 29, 2024 (note the same VPC endpoint, but keep in mind it's a different server / cluster):

curl -v -X POST \
    -H 'Content-Type: application/json' \
    https://db-neptune-2.cluster-cwnhpfsf87ne.us-west-2.neptune.amazonaws.com:8182/loader -d '
    {
      "source" : "s3://blazegraphdump/latest-all.nt.bz2",
      "format" : "ntriples",
      "iamRoleArn" : "arn:aws:iam::ACCOUNTID:role/NeptuneLoadFromS3Attempt",
      "region" : "us-west-2",
      "failOnError" : "FALSE",
      "parallelism" : "OVERSUBSCRIBE",
      "updateSingleCardinalityProperties" : "FALSE",
      "queueRequest" : "TRUE"
    }'


{
    "status" : "200 OK",
    "payload" : {
        "loadId" : "75b910c5-25d5-4257-8e9f-8a91f8c2df88"
    }


watch -n 60 curl -G 'https://db-neptune-2.cluster-cwnhpfsf87ne.us-west-2.neptune.amazonaws.com:8182/loader/75b910c5-25d5-4257-8e9f-8a91f8c2df88'

After running for a while, we just stopped it. At this pace it would take about 14 hours to import 9 billion triples. That would be faster than serverless, but not so much so as to justify continuing with this particular import (it may be worth it if there's time pressure). CPU utilization when using this powerful provisioned node was hovering around 20%, which is lower than expected. Keep in mind that serverless was using 50%-70+% once it got going.

Here was the point where we stopped with this powerful managed server approach.

ubuntu@ip-172-31-23-68:~$ curl 'https://db-neptune-2.cluster-cwnhpfsf87ne.us-west-2.neptune.amazonaws.com:8182/loader/75b910c5-25d5-4257-8e9f-8a91f8c2df88'
{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [
            {
                "LOAD_IN_PROGRESS" : 1
            }
        ],
        "overallStatus" : {
            "fullUri" : "s3://blazegraphdump/latest-all.nt.bz2",
            "runNumber" : 1,
            "retryNumber" : 0,
            "status" : "LOAD_IN_PROGRESS",
            "totalTimeSpent" : 7720,
            "startTime" : 1709239199,
            "totalRecords" : 1311570000,
            "totalDuplicates" : 8806377,
            "parsingErrors" : 0,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 0
        }
    }

Although there was some level of storage cost, there was also an allowance for Amazon Neptune serverless - up to 750 free compute hours are available for an account in the first month of use of Amazon Neptune serverless (this appeared to be fanned out across multiple servers' NCUs somehow).

The policies for this sort of usage may be subject to change, so any attempt to replicate this should pay close attention to hourly billing, and budgeting for the use case should be considered (e.g., up to $500/day for the import).

Post-import needs could vary dramatically from use case to use case, ranging from comparatively affordable for such a big database to comparatively more expensive.

It is possible to migrate workloads and adjust configuration after import to better contain costs.

Possible approaches for loading may involve

  1. Use of the technique in https://docs.aws.amazon.com/neptune/latest/userguide/feature-overview-lookup-cache.html for a managed server (this may generalize across other NVMe-connected instance types) if needing to support relatively warm lookup cache facilities immediately.
  2. Alternatively, serverless import could be used and then the installation could be switched over to a managed server that supports the lookup cache, letting the cache gradually warm up.

Use of configuration parameters as described in https://docs.aws.amazon.com/neptune/latest/userguide/parameters.html would be useful for things like the lookup cache, as well as autoscaling configuration, query engine, and so on.

When attempting a run with a queue capacity of 8000, a buffer of 1000000, and a heap size of 16g on the gaming-class desktop (to mimic the MacBook Pro), things were slower than with a queue capacity of 4000, a buffer of 1000000, and a heap size of 31g on the same machine.

ubuntu22:~/rdf/dist/target/service-0.3.138-SNAPSHOT$ time ./loadData.sh -n wdq -d /mnt/firehose/munge_on_later_data_set -s 1 -e 150
...
real    280m46.264s

A run is in progress to verify if there's anything noticeable when the heap size is set to 31g but the queue capacity is at 8000 and the buffer is at 1000000 when processing the first 150 files on the gaming-class desktop.

The run to check with heap size of 31g, queue capacity of 8000, and buffer at 1000000 stalled at file 107.

By the way, I'm attempting a run for the first 1332 munged files (one shy of the 1333 where the previous run terminated) with the buffer at 1000000, heap size at 31g, and queue capacity at 4000 on the gaming-class desktop, to see whether this imports smoothly and whether performance gains are noticeable.

ubuntu22:~/rdf/dist/target/service-0.3.138-SNAPSHOT$ date
Wed Mar 20 02:36:59 PM CDT 2024
ubuntu22:~/rdf/dist/target/service-0.3.138-SNAPSHOT$ time ./loadData.sh -n wdq -d /mnt/firehose/munge_on_later_data_set -s 1 -e 1332

...screen'ing in to check:

Processing wikidump-000000505.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
></head
><body<p>totalElapsed=13452ms, elapsed=13452ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=167329ms, commitTime=1711041930967, mutationCount=4566497</p
></html
>Thu Mar 21 12:25:35 PM CDT 2024
Processing wikidump-000000506.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
></head
><body<p>totalElapsed=15405ms, elapsed=15405ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=203202ms, commitTime=1711042135111, mutationCount=5262167</p
></html
>Thu Mar 21 12:28:58 PM CDT 2024
Processing wikidump-000000507.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
></head
><body<p>totalElapsed=14701ms, elapsed=14700ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=178754ms, commitTime=1711042314114, mutationCount=5005853</p
></html
>Thu Mar 21 12:31:57 PM CDT 2024

The run with the buffer at 1000000, heap size at 31g, and queue capacity at 4000 on the gaming-class desktop completed.

Processing wikidump-000001332.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
></head
><body<p>totalElapsed=13580ms, elapsed=13580ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=266483ms, commitTime=1711304860167, mutationCount=4772590</p
></html
>Sun Mar 24 01:27:45 PM CDT 2024

real    5690m30.371s

... which is 3.95 days. I'm trying again, but going back to a buffer capacity of 100000 instead of 1000000 for one last comparison with these runs on this subset of munged data, and without any larger pause between batches of files. (Remember, the previous run with buffer capacity at 100000, 31g heap, and queue capacity at 4000 was done by first running files 1-150, then, after coming back to the terminal sometime later, resuming from file 151; but in the real world we usually hope to just let this thing run one file after another without any pause. In practice it could be that the gap between files 150 and 151 gave the JVM more time to heal itself and created some artificial speed gains when adding the two time intervals together, but we'll see.)

Starting on Friday, March 29, 2024 at 1:40 PM CT...

ubuntu22:~/rdf/dist/target/service-0.3.138-SNAPSHOT$ time ./loadData.sh -n wdq -d /mnt/firehose/munge_on_later_data_set -s 1 -e 1332
Processing wikidump-000000001.ttl.gz

I'll update when it's done. It should complete presumably sometime in the next 24 hours.

Now this is interesting: we're now past 4 days (about 4 days and 1 hour) of this running, and with buffer capacity at 100000 instead of 1000000 (but this time without any gap between the batches of files), there's still a good way to go yet.

Processing wikidump-000001177.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
></head
><body<p>totalElapsed=612796ms, elapsed=612796ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=689208ms, commitTime=1712085811545, mutationCount=12297407</p
></html
>Tue Apr  2 02:23:35 PM CDT 2024
Processing wikidump-000001178.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
></head
><body<p>totalElapsed=850122ms, elapsed=850121ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=950693ms, commitTime=1712086762086, mutationCount=16659867</p
></html
>Tue Apr  2 02:39:26 PM CDT 2024
Processing wikidump-000001179.ttl.gz

It's possible this means that a higher buffer capacity actually makes a difference. I will let this run complete so we can see what the percentage difference is.

After this I will check if this sort of behavior is reproducible, and to what extent, with one side of the graph split when using these two different buffer sizes.

This morning, April 3, around 6:25 AM, I SSH'd in to check progress, and it was working, but going slowly, similar to the day before. It was on a file number in the 1200s, but I didn't write down the number or copy terminal output; I do remember seeing around 796 seconds for one of the files at that time. Looking at the previous comment, you'll see those were going slowly; not surprising, as we know imports of these munged files get slower as more data has been imported.

I checked several hours later in the middle of a meeting, and it had gone into a bad spiral.

I've been able to use screen backscrolling to obtain much of the stack trace, but could not scroll back far enough to tell for sure which file was the last to import successfully without a stack trace. What we can say is that the last somewhat stable commit was probably on file 1302 at about 7:24 AM, and that probably file 1303, and definitely 1304 and 1305, have been failing badly and taking a really long time to do so; this would probably continue indefinitely from here without killing the process. Here is just a slice of the paste to give an idea of things (notice lastCommitTime and commitCounter in the stack trace).

Wed Apr  3 02:05:26 PM CDT 2024
Processing wikidump-000001305.ttl.gz
SPARQL-UPDATE: updateStr=LOAD <file:///mnt/firehose/munge_on_later_data_set/wikidump-000001305.ttl.gz>
java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.openrdf.query.UpdateExecutionException: java.lang.RuntimeException: Problem with entry at -83289912769511002: lastRootBlock=rootBlock{ rootBlock=0, challisField=1302, version=3, nextOffset=47806576684846562, localTime=1712147044389 [Wednesday, April 3, 2024 7:24:04 AM CDT], firstCommitTime=1711737574896 [Friday, March 29, 2024 1:39:34 PM CDT], lastCommitTime=1712147041973 [Wednesday, April 3, 2024 7:24:01 AM CDT], commitCounter=1302, commitRecordAddr={off=NATIVE:-140859033,len=422}, commitRecordIndexAddr={off=NATIVE:-93467508,len=220}, blockSequence=34555, quorumToken=-1, metaBitsAddr=26754033649714513, metaStartAddr=11989126, storeType=RW, uuid=f993598d-497c-46a7-8434-d25c8859a0b8, offsetBits=42, checksum=1600335692, createTime=1711737574192 [Friday, March 29, 2024 1:39:34 PM CDT], closeTime=0}

Unfortunately jstack seems to hiccup.

ubuntu22:~$ sudo jstack -m 987870
[sudo] password: 
Attaching to process ID 987870, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.402-b06
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sun.tools.jstack.JStack.runJStackTool(JStack.java:140)
	at sun.tools.jstack.JStack.main(JStack.java:106)
Caused by: java.lang.RuntimeException: Unable to deduce type of thread from address 0x00007fecb400b800 (expected type JavaThread, CompilerThread, ServiceThread, JvmtiAgentThread, or SurrogateLockerThread)
	at sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:169)
	at sun.jvm.hotspot.runtime.Threads.first(Threads.java:153)
	at sun.jvm.hotspot.tools.PStack.initJFrameCache(PStack.java:200)
	at sun.jvm.hotspot.tools.PStack.run(PStack.java:71)
	at sun.jvm.hotspot.tools.PStack.run(PStack.java:58)
	at sun.jvm.hotspot.tools.PStack.run(PStack.java:53)
	at sun.jvm.hotspot.tools.JStack.run(JStack.java:66)
	at sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260)
	at sun.jvm.hotspot.tools.Tool.start(Tool.java:223)
	at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
	at sun.jvm.hotspot.tools.JStack.main(JStack.java:92)
	... 6 more
Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x00007fecb400b800
	at sun.jvm.hotspot.runtime.InstanceConstructor.newWrongTypeException(InstanceConstructor.java:62)
	at sun.jvm.hotspot.runtime.VirtualConstructor.instantiateWrapperFor(VirtualConstructor.java:80)
	at sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:165)
	... 16 more
ubuntu22:~$ sudo jstack -Flm 987870
Usage:
    jstack [-l] <pid>
        (to connect to running process)
    jstack -F [-m] [-l] <pid>
        (to connect to a hung process)
    jstack [-m] [-l] <executable> <core>
        (to connect to a core file)
    jstack [-m] [-l] [server_id@]<remote server IP or hostname>
        (to connect to a remote debug server)

Options:
    -F  to force a thread dump. Use when jstack <pid> does not respond (process is hung)
    -m  to print both java and native frames (mixed mode)
    -l  long listing. Prints additional information about locks
    -h or -help to print this help message
ubuntu22:~$ sudo jstack -F -l -m 987870
Attaching to process ID 987870, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.402-b06
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sun.tools.jstack.JStack.runJStackTool(JStack.java:140)
	at sun.tools.jstack.JStack.main(JStack.java:106)
Caused by: java.lang.RuntimeException: Unable to deduce type of thread from address 0x00007fecb400b800 (expected type JavaThread, CompilerThread, ServiceThread, JvmtiAgentThread, or SurrogateLockerThread)
	at sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:169)
	at sun.jvm.hotspot.runtime.Threads.first(Threads.java:153)
	at sun.jvm.hotspot.tools.PStack.initJFrameCache(PStack.java:200)
	at sun.jvm.hotspot.tools.PStack.run(PStack.java:71)
	at sun.jvm.hotspot.tools.PStack.run(PStack.java:58)
	at sun.jvm.hotspot.tools.PStack.run(PStack.java:53)
	at sun.jvm.hotspot.tools.JStack.run(JStack.java:66)
	at sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260)
	at sun.jvm.hotspot.tools.Tool.start(Tool.java:223)
	at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
	at sun.jvm.hotspot.tools.JStack.main(JStack.java:92)
	... 6 more
Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x00007fecb400b800
	at sun.jvm.hotspot.runtime.InstanceConstructor.newWrongTypeException(InstanceConstructor.java:62)
	at sun.jvm.hotspot.runtime.VirtualConstructor.instantiateWrapperFor(VirtualConstructor.java:80)
	at sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:165)
	... 16 more

The number of files imported between the previous comment at 2:39 PM yesterday and 7:24 AM today is about what we might expect if we were for example assuming about 10 minutes or less per file in keeping with the trajectory observed.

(1302-1179+1)*10/60

That is to say, we would have expected an additional 20.6 hours up to file 1302 at ten minutes per file, but it was actually about an additional 16.9 hours.

Blazegraph was still running, so I took this opportunity to check how many triples were in place.

select (count(*) as ?ct)
where { ?s ?p ?o }

produced a result of 9082617267.

For the sake of extrapolation, let's suppose it would take about 8.1 minutes per remaining file and that things would have kept working all the way up to 1332 files (with 1302 being the last good one). 30 files times 8.1 minutes per file = 243 minutes, or 4 hours and 3 minutes. If this had been the case, this would put us at about 11:27 AM today for completion. Remember, this run was started around Friday, March 29, 2024 at 1:40 PM CT, so we're really quite close to a projection of about 4.90 days with this lower buffer capacity setting. Given that the other approach took about 3.95 days, we might say this run would be about 24% longer (1.24 minus 1.00) than the other approach.

This sort of instability is unfortunately proving to be common (indeed, it's part of the reason for the graph split to have fewer triples and to reduce risk), but we may have stumbled onto something with the buffer capacity that helps us get both greater stability and greater speed, at least up to a certain number of munged files and at least on this machine without any pauses between files.

As I said earlier, I'm going to try comparing the buffer capacity settings against graph split files and see how things stack up. Graph-split-based files are ultimately our target, after all.

Knowing that, at least on this machine, the larger buffer capacity seems likely to be associated with greater stability (we cannot say this without qualification, given an issue in an earlier run), we should probably also try a full reload on another server with the munged N3-style Turtle files and the larger buffer capacity, and see what this does for a full graph import. Or at least be aware that this is an option in case we need to do a full import (e.g., we might try it on one node with the higher buffer capacity, in parallel with any other nodes attempting to recover with the currently established buffer capacity).

Here's the paste of what stack trace material I could screen backscroll:

 at com.bigdata.rdf.store.LocalTripleStore.commit(LocalTripleStore.java:99)
 at com.bigdata.rdf.sail.BigdataSail$BigdataSailConnection.commit2(BigdataSail.java:3932)
 at com.bigdata.rdf.sail.BigdataSailRepositoryConnection.commit2(BigdataSailRepositoryConnection.java:330)
 at com.bigdata.rdf.sparql.ast.eval.AST2BOpUpdate.convertCommit(AST2BOpUpdate.java:375)
 at com.bigdata.rdf.sparql.ast.eval.AST2BOpUpdate.convertUpdate(AST2BOpUpdate.java:321)
 at com.bigdata.rdf.sparql.ast.eval.ASTEvalHelper.executeUpdate(ASTEvalHelper.java:1072)
 ... 9 more
Caused by: java.lang.RuntimeException: Problem with entry at -83289912769511002
 at com.bigdata.rwstore.RWStore.freeDeferrals(RWStore.java:5425)
 at com.bigdata.rwstore.RWStore.checkDeferredFrees(RWStore.java:3896)
 at com.bigdata.journal.RWStrategy.checkDeferredFrees(RWStrategy.java:780)
 at com.bigdata.journal.AbstractJournal$CommitState.writeCommitRecord(AbstractJournal.java:3500)
 at com.bigdata.journal.AbstractJournal$CommitState.access$2800(AbstractJournal.java:3301)
 at com.bigdata.journal.AbstractJournal.commitNow(AbstractJournal.java:4111)
 at com.bigdata.journal.AbstractJournal.commit(AbstractJournal.java:3132)
 ... 15 more
Caused by: java.lang.RuntimeException: addr=-151834637 : cause=java.lang.IllegalStateException: Error reading from WriteCache addr: 704140099584 length: 12, writeCacheDebug: No WriteCache debug info
 at com.bigdata.rwstore.RWStore.getData(RWStore.java:2399)
 at com.bigdata.rwstore.RWStore.getData(RWStore.java:2120)
 at com.bigdata.rwstore.RWStore.freeImmediateBlob(RWStore.java:2685)
 at com.bigdata.rwstore.RWStore.immediateFree(RWStore.java:2725)
 at com.bigdata.rwstore.RWStore.immediateFree(RWStore.java:2712)
 at com.bigdata.rwstore.RWStore.freeImmediateBlob(RWStore.java:2700)
 at com.bigdata.rwstore.RWStore.immediateFree(RWStore.java:2725)
 at com.bigdata.rwstore.RWStore.immediateFree(RWStore.java:2712)
 at com.bigdata.rwstore.RWStore.freeDeferrals(RWStore.java:5341)
 at com.bigdata.rwstore.RWStore.freeDeferrals(RWStore.java:5411)
 ... 21 more
Caused by: java.lang.IllegalStateException: Error reading from WriteCache addr: 704140099584 length: 12, writeCacheDebug: No WriteCache debug info
 at com.bigdata.rwstore.RWStore.getData(RWStore.java:2321)
 ... 30 more
Caused by: com.bigdata.util.ChecksumError: offset=704140099584,nbytes=16,expected=-154356869,actual=543360410
 at com.bigdata.io.writecache.WriteCacheService._readFromLocalDiskIntoNewHeapByteBuffer(WriteCacheService.java:3783)
 at com.bigdata.io.writecache.WriteCacheService._getRecord(WriteCacheService.java:3598)
 at com.bigdata.io.writecache.WriteCacheService.access$2500(WriteCacheService.java:200)
 at com.bigdata.io.writecache.WriteCacheService$1.compute(WriteCacheService.java:3435)
 at com.bigdata.io.writecache.WriteCacheService$1.compute(WriteCacheService.java:3419)
 at com.bigdata.util.concurrent.Memoizer$1.call(Memoizer.java:77)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at com.bigdata.util.concurrent.Memoizer.compute(Memoizer.java:92)
 at com.bigdata.io.writecache.WriteCacheService.loadRecord(WriteCacheService.java:3540)
 at com.bigdata.io.writecache.WriteCacheService.read(WriteCacheService.java:3259)
 at com.bigdata.rwstore.RWStore.getData(RWStore.java:2315)
 ... 30 more
SPARQL-UPDATE: updateStr=LOAD <file:///mnt/firehose/munge_on_later_data_set/wikidump-000001304.ttl.gz>
java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.openrdf.query.UpdateExecutionException: java.lang.RuntimeException: Problem with entry at -83289912769511002: lastRootBlock=rootBlock{ rootBlock=0, challisField=1302, version=3, nextOffset=47806576684846562, localTime=1712147044389 [Wednesday, April 3, 2024 7:24:04 AM CDT], firstCommitTime=1711737574896 [Friday, March 29, 2024 1:39:34 PM CDT], lastCommitTime=1712147041973 [Wednesday, April 3, 2024 7:24:01 AM CDT], commitCounter=1302, commitRecordAddr={off=NATIVE:-140859033,len=422}, commitRecordIndexAddr={off=NATIVE:-93467508,len=220}, blockSequence=34555, quorumToken=-1, metaBitsAddr=26754033649714513, metaStartAddr=11989126, storeType=RW, uuid=f993598d-497c-46a7-8434-d25c8859a0b8, offsetBits=42, checksum=1600335692, createTime=1711737574192 [Friday, March 29, 2024 1:39:34 PM CDT], closeTime=0}
 at java.util.concurrent.FutureTask.report(FutureTask.java:122)
 at java.util.concurrent.FutureTask.get(FutureTask.java:206)
 at com.bigdata.rdf.sail.webapp.BigdataServlet.submitApiTask(BigdataServlet.java:292)
 at com.bigdata.rdf.sail.webapp.QueryServlet.doSparqlUpdate(QueryServlet.java:460)
 at com.bigdata.rdf.sail.webapp.QueryServlet.doPost(QueryServlet.java:241)
 at com.bigdata.rdf.sail.webapp.RESTServlet.doPost(RESTServlet.java:269)
 at com.bigdata.rdf.sail.webapp.MultiTenancyServlet.doPost(MultiTenancyServlet.java:195)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
 at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1655)
 at org.wikidata.query.rdf.blazegraph.throttling.ThrottlingFilter.doFilter(ThrottlingFilter.java:320)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
 at org.wikidata.query.rdf.blazegraph.throttling.SystemOverloadFilter.doFilter(SystemOverloadFilter.java:82)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
 at ch.qos.logback.classic.helpers.MDCInsertingServletFilter.doFilter(MDCInsertingServletFilter.java:50)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
 at org.wikidata.query.rdf.blazegraph.filters.QueryEventSenderFilter.doFilter(QueryEventSenderFilter.java:110)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
 at org.wikidata.query.rdf.blazegraph.filters.ClientIPFilter.doFilter(ClientIPFilter.java:43)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
 at org.wikidata.query.rdf.blazegraph.filters.JWTIdentityFilter.doFilter(JWTIdentityFilter.java:66)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
 at org.wikidata.query.rdf.blazegraph.filters.RealAgentFilter.doFilter(RealAgentFilter.java:33)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
 at org.wikidata.query.rdf.blazegraph.filters.RequestConcurrencyFilter.doFilter(RequestConcurrencyFilter.java:50)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
 at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
 at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
 at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
 at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
 at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1340)
 at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
 at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
 at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
 at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1242)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
 at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
 at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
 at org.eclipse.jetty.server.Server.handle(Server.java:503)
 at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)
 at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
 at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
 at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
 at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
 at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
 at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
 at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
 at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
 at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
 at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
 at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
 at java.lang.Thread.run(Thread.java:750)
Caused by: java.util.concurrent.ExecutionException: org.openrdf.query.UpdateExecutionException: java.lang.RuntimeException: Problem with entry at -83289912769511002: lastRootBlock=rootBlock{ rootBlock=0, challisField=1302, version=3, nextOffset=47806576684846562, localTime=1712147044389 [Wednesday, April 3, 2024 7:24:04 AM CDT], firstCommitTime=1711737574896 [Friday, March 29, 2024 1:39:34 PM CDT], lastCommitTime=1712147041973 [Wednesday, April 3, 2024 7:24:01 AM CDT], commitCounter=1302, commitRecordAddr={off=NATIVE:-140859033,len=422}, commitRecordIndexAddr={off=NATIVE:-93467508,len=220}, blockSequence=34555, quorumToken=-1, metaBitsAddr=26754033649714513, metaStartAddr=11989126, storeType=RW, uuid=f993598d-497c-46a7-8434-d25c8859a0b8, offsetBits=42, checksum=1600335692, createTime=1711737574192 [Friday, March 29, 2024 1:39:34 PM CDT], closeTime=0}
 at java.util.concurrent.FutureTask.report(FutureTask.java:122)
 at java.util.concurrent.FutureTask.get(FutureTask.java:192)
 at com.bigdata.rdf.sail.webapp.QueryServlet$SparqlUpdateTask.call(QueryServlet.java:571)
 at com.bigdata.rdf.sail.webapp.QueryServlet$SparqlUpdateTask.call(QueryServlet.java:472)
 at com.bigdata.rdf.task.ApiTaskForIndexManager.call(ApiTaskForIndexManager.java:68)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 ... 1 more
Caused by: org.openrdf.query.UpdateExecutionException: java.lang.RuntimeException: Problem with entry at -83289912769511002: lastRootBlock=rootBlock{ rootBlock=0, challisField=1302, version=3, nextOffset=47806576684846562, localTime=1712147044389 [Wednesday, April 3, 2024 7:24:04 AM CDT], firstCommitTime=1711737574896 [Friday, March 29, 2024 1:39:34 PM CDT], lastCommitTime=1712147041973 [Wednesday, April 3, 2024 7:24:01 AM CDT], commitCounter=1302, commitRecordAddr={off=NATIVE:-140859033,len=422}, commitRecordIndexAddr={off=NATIVE:-93467508,len=220}, blockSequence=34555, quorumToken=-1, metaBitsAddr=26754033649714513, metaStartAddr=11989126, storeType=RW, uuid=f993598d-497c-46a7-8434-d25c8859a0b8, offsetBits=42, checksum=1600335692, createTime=1711737574192 [Friday, March 29, 2024 1:39:34 PM CDT], closeTime=0}
 at com.bigdata.rdf.sparql.ast.eval.ASTEvalHelper.executeUpdate(ASTEvalHelper.java:1080)
 at com.bigdata.rdf.sail.BigdataSailUpdate.execute2(BigdataSailUpdate.java:152)
133 at com.bigdata.rdf.sail.BigdataSailUpdate.execute2(BigdataSailUpdate.java:152)
134 at com.bigdata.rdf.sail.webapp.BigdataRDFContext$UpdateTask.doQuery(BigdataRDFContext.java:2003)
135 at com.bigdata.rdf.sail.webapp.BigdataRDFContext$AbstractQueryTask.innerCall(BigdataRDFContext.java:1579)
136 at com.bigdata.rdf.sail.webapp.BigdataRDFContext$AbstractQueryTask.call(BigdataRDFContext.java:1544)
137 at com.bigdata.rdf.sail.webapp.BigdataRDFContext$AbstractQueryTask.call(BigdataRDFContext.java:757)
138 ... 4 more
139Caused by: java.lang.RuntimeException: Problem with entry at -83289912769511002: lastRootBlock=rootBlock{ rootBlock=0, challisField=1302, version=3, nextOffset=47
140806576684846562, localTime=1712147044389 [Wednesday, April 3, 2024 7:24:04 AM CDT], firstCommitTime=1711737574896 [Friday, March 29, 2024 1:39:34 PM CDT], lastCom
141mitTime=1712147041973 [Wednesday, April 3, 2024 7:24:01 AM CDT], commitCounter=1302, commitRecordAddr={off=NATIVE:-140859033,len=422}, commitRecordIndexAddr={off=
142NATIVE:-93467508,len=220}, blockSequence=34555, quorumToken=-1, metaBitsAddr=26754033649714513, metaStartAddr=11989126, storeType=RW, uuid=f993598d-497c-46a7-8434
143-d25c8859a0b8, offsetBits=42, checksum=1600335692, createTime=1711737574192 [Friday, March 29, 2024 1:39:34 PM CDT], closeTime=0}
144 at com.bigdata.journal.AbstractJournal.commit(AbstractJournal.java:3134)
145 at com.bigdata.rdf.store.LocalTripleStore.commit(LocalTripleStore.java:99)
146 at com.bigdata.rdf.sail.BigdataSail$BigdataSailConnection.commit2(BigdataSail.java:3932)
147 at com.bigdata.rdf.sail.BigdataSailRepositoryConnection.commit2(BigdataSailRepositoryConnection.java:330)
148 at com.bigdata.rdf.sparql.ast.eval.AST2BOpUpdate.convertCommit(AST2BOpUpdate.java:375)
149 at com.bigdata.rdf.sparql.ast.eval.AST2BOpUpdate.convertUpdate(AST2BOpUpdate.java:321)
150 at com.bigdata.rdf.sparql.ast.eval.ASTEvalHelper.executeUpdate(ASTEvalHelper.java:1072)
151 ... 9 more
152Caused by: java.lang.RuntimeException: Problem with entry at -83289912769511002
153 at com.bigdata.rwstore.RWStore.freeDeferrals(RWStore.java:5425)
154 at com.bigdata.rwstore.RWStore.checkDeferredFrees(RWStore.java:3896)
155 at com.bigdata.journal.RWStrategy.checkDeferredFrees(RWStrategy.java:780)
156 at com.bigdata.journal.AbstractJournal$CommitState.writeCommitRecord(AbstractJournal.java:3500)
157 at com.bigdata.journal.AbstractJournal$CommitState.access$2800(AbstractJournal.java:3301)
158 at com.bigdata.journal.AbstractJournal.commitNow(AbstractJournal.java:4111)
159 at com.bigdata.journal.AbstractJournal.commit(AbstractJournal.java:3132)
160 ... 15 more
161Caused by: java.lang.RuntimeException: addr=-151834637 : cause=java.lang.IllegalStateException: Error reading from WriteCache addr: 704140099584 length: 12, write
162CacheDebug: No WriteCache debug info
163 at com.bigdata.rwstore.RWStore.getData(RWStore.java:2399)
164 at com.bigdata.rwstore.RWStore.getData(RWStore.java:2120)
165 at com.bigdata.rwstore.RWStore.freeImmediateBlob(RWStore.java:2685)
166 at com.bigdata.rwstore.RWStore.immediateFree(RWStore.java:2725)
167 at com.bigdata.rwstore.RWStore.immediateFree(RWStore.java:2712)
168 at com.bigdata.rwstore.RWStore.freeImmediateBlob(RWStore.java:2700)
169 at com.bigdata.rwstore.RWStore.immediateFree(RWStore.java:2725)
170 at com.bigdata.rwstore.RWStore.immediateFree(RWStore.java:2712)
171 at com.bigdata.rwstore.RWStore.freeDeferrals(RWStore.java:5341)
172 at com.bigdata.rwstore.RWStore.freeDeferrals(RWStore.java:5411)
173 ... 21 more
174Caused by: java.lang.IllegalStateException: Error reading from WriteCache addr: 704140099584 length: 12, writeCacheDebug: No WriteCache debug info
175 at com.bigdata.rwstore.RWStore.getData(RWStore.java:2321)
176 ... 30 more
177Caused by: com.bigdata.util.ChecksumError: offset=704140099584,nbytes=16,expected=-154356869,actual=543360410
178 at com.bigdata.io.writecache.WriteCacheService._readFromLocalDiskIntoNewHeapByteBuffer(WriteCacheService.java:3783)
179 at com.bigdata.io.writecache.WriteCacheService._getRecord(WriteCacheService.java:3598)
180 at com.bigdata.io.writecache.WriteCacheService.access$2500(WriteCacheService.java:200)
181 at com.bigdata.io.writecache.WriteCacheService$1.compute(WriteCacheService.java:3435)
182 at com.bigdata.io.writecache.WriteCacheService$1.compute(WriteCacheService.java:3419)
183 at com.bigdata.util.concurrent.Memoizer$1.call(Memoizer.java:77)
184 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
185 at com.bigdata.util.concurrent.Memoizer.compute(Memoizer.java:92)
186 at com.bigdata.io.writecache.WriteCacheService.loadRecord(WriteCacheService.java:3540)
187 at com.bigdata.io.writecache.WriteCacheService.read(WriteCacheService.java:3259)
188 at com.bigdata.rwstore.RWStore.getData(RWStore.java:2315)
189 ... 30 more
Wed Apr  3 02:05:26 PM CDT 2024
Processing wikidump-000001305.ttl.gz
SPARQL-UPDATE: updateStr=LOAD <file:///mnt/firehose/munge_on_later_data_set/wikidump-000001305.ttl.gz>
java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.openrdf.query.UpdateExecutionException: java.lang.RuntimeException: Problem with entry at -83289912769511002: lastRootBlock=rootBlock{ rootBlock=0, challisField=1302, version=3, nextOffset=47806576684846562, localTime=1712147044389 [Wednesday, April 3, 2024 7:24:04 AM CDT], firstCommitTime=1711737574896 [Friday, March 29, 2024 1:39:34 PM CDT], lastCommitTime=1712147041973 [Wednesday, April 3, 2024 7:24:01 AM CDT], commitCounter=1302, commitRecordAddr={off=NATIVE:-140859033,len=422}, commitRecordIndexAddr={off=NATIVE:-93467508,len=220}, blockSequence=34555, quorumToken=-1, metaBitsAddr=26754033649714513, metaStartAddr=11989126, storeType=RW, uuid=f993598d-497c-46a7-8434-d25c8859a0b8, offsetBits=42, checksum=1600335692, createTime=1711737574192 [Friday, March 29, 2024 1:39:34 PM CDT], closeTime=0}

Following roughly the procedure in P54284 to rename the Spark-produced graph files (updating loadData.sh with FORMAT=part-%05d-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz and keeping a date call after each curl in it), I kicked off an import of the scholarly article entity graph as follows, to see how it goes with a buffer capacity of 100000:

ubuntu22:~/rdf/dist/target/service-0.3.138-SNAPSHOT$ date; time ./loadData.sh -n wdq -d /mnt/firehose/split_0/nt_wd_schol -s 0 -e 0 2>&1 | tee loadData.log; time ./loadData.sh -n wdq -d /mnt/firehose/split_0/nt_wd_schol 2>&1 | tee -a loadData.log
Wed Apr  3 09:32:54 PM CDT 2024
Processing part-00000-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
></head
><body<p>totalElapsed=55629ms, elapsed=55584ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=61598ms, commitTime=1712198035155, mutationCount=7349689</p
></html
>Wed Apr  3 09:33:56 PM CDT 2024

real    1m1.702s
user    0m0.004s
sys     0m0.006s
Processing part-00001-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
></head
><body<p>totalElapsed=61251ms, elapsed=61251ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=71925ms, commitTime=1712198106800, mutationCount=7774048</p
></html
>Wed Apr  3 09:35:08 PM CDT 2024
Processing part-00002-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz

This is with the following values in RWStore.properties

com.bigdata.btree.writeRetentionQueue.capacity=4000
com.bigdata.rdf.sail.bufferCapacity=100000

and the following variable in loadData.sh

HEAP_SIZE=${HEAP_SIZE:-"31g"}
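
For anyone reproducing this, the essence of what loadData.sh does per file is roughly the following - a simplified sketch for illustration, not the actual script from the rdf repo. The endpoint URL is an assumption; the namespace, directory, and FORMAT values are the ones used above.

#!/bin/bash
# Simplified sketch of the loadData.sh loop, for illustration only.
NAMESPACE=wdq
DATA_DIR=/mnt/firehose/split_0/nt_wd_schol
FORMAT='part-%05d-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz'
# Assumed local Blazegraph SPARQL endpoint; adjust to the actual deployment.
ENDPOINT="http://localhost:9999/bigdata/namespace/${NAMESPACE}/sparql"

i=0
while true; do
  f=$(printf "$FORMAT" "$i")
  if [ ! -f "${DATA_DIR}/${f}" ]; then
    echo "File ${DATA_DIR}/${f} not found, terminating"
    break
  fi
  echo "Processing ${f}"
  # Each file is loaded via a SPARQL UPDATE pointing at the local gzipped Turtle file.
  curl -s -X POST "${ENDPOINT}" --data-urlencode "update=LOAD <file://${DATA_DIR}/${f}>"
  date   # the date call after each curl mentioned above
  i=$((i + 1))
done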

Just updating on how far along this run is: file 550 of the scholarly article entity side of the graph is being processed. There are files 0 through 1023 for this side of the graph. Note that I did think to tee the output this time around, so there's hopefully more info available for reviewing output, stack traces (although hopefully there are none), and so on, should it be needed.

Processing part-00549-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
></head
><body<p>totalElapsed=299675ms, elapsed=299675ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=392531ms, commitTime=1712329890306, mutationCount=7032172</p
></html
>Fri Apr  5 10:11:32 AM CDT 2024
Processing part-00550-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz
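
Since the output is tee'd to loadData.log, rough progress can be pulled out of the COMMIT lines. A couple of example one-liners for that, assuming the response format shown above (with mutationCount= and COMMIT: totalElapsed= fields):

# Number of files committed so far and the running total of mutations (roughly the triples loaded).
grep -o 'mutationCount=[0-9]\+' loadData.log | cut -d= -f2 \
  | awk '{ n++; total += $1 } END { printf "files committed: %d, mutations: %d\n", n, total }'

# Wall-clock commit time per file, in seconds.
grep -o 'COMMIT: totalElapsed=[0-9]\+' loadData.log | grep -o '[0-9]\+' \
  | awk '{ printf "file %d: %.1f s\n", NR - 1, $1 / 1000 }'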

Sidebar: the "non"-scholarly article entity graph also has files 0-1023 and is similarly sized in terms of triples, but the way its nodes are interconnected naturally differs because of the types of entities involved, the kind of data those entities carry, and so on.

Update. On the gaming-class machine it took about 3.25 days to import the scholarly article entity graph with a buffer capacity of 100000 (compare this with 5.875 days on wdqs1024). This resulted in 7_643_858_078 triples, as expected. Next up will be a run with a buffer capacity of 1000000 to see if there is any obvious difference in import time.

>Sun Apr  7 03:34:59 AM CDT 2024
Processing part-01023-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
></head
><body<p>totalElapsed=181901ms, elapsed=181901ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=226511ms, commitTime=1712479122009, mutationCount=7946575</p
></html
>Sun Apr  7 03:38:46 AM CDT 2024
File /mnt/firehose/split_0/nt_wd_schol/part-01024-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz not found, terminating

real    4684m49.905s
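
As a sanity check on counts like the 7_643_858_078 figure above, the triple total can also be queried directly from the local endpoint (endpoint URL assumed as in the sketch earlier; a full count like this is slow on a graph of this size):

curl -s "http://localhost:9999/bigdata/namespace/wdq/sparql" \
  --data-urlencode 'query=SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }' \
  -H 'Accept: text/csv'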

With bufferCapacity at 1000000, kicked it off again with the scholarly article entity graph files:

ubuntu22:~/rdf/dist/target/service-0.3.138-SNAPSHOT$ date | tee loadData.log; time ./loadData.sh -n wdq -d /mnt/firehose/split_0/nt_wd_schol -s 0 -e 0 2>&1 | tee -a loadData.log; time ./loadData.sh -n wdq -d /mnt/firehose/split_0/nt_wd_schol 2>&1 | tee -a loadData.log
Sun Apr  7 12:03:19 PM CDT 2024
Processing part-00000-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz

Update: with the buffer capacity at 1000000, file number 550 of the scholarly graph had been imported as of Mon Apr 8 03:22:08 PM CDT 2024. So, under 28 hours so far (with the buffer capacity at 100000, it was more than 36 hours).

Processing part-00550-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
></head
><body<p>totalElapsed=51018ms, elapsed=51018ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=245278ms, commitTime=1712607725882, mutationCount=7414497</p
></html
>Mon Apr  8 03:22:08 PM CDT 2024

Will update when it completes.

Good news. With the N-triples style scholarly entity graph files, a buffer capacity of 1000000, a write retention queue capacity of 4000, and a heap size of 31g, the import took about 2.40 days on the gaming-class desktop. Recall that with a buffer capacity of 100000 it took about 3.25 days on this desktop (and, again, 5.875 days on wdqs1024). So, there was about a 35% speed increase (3.25 / 2.40 = 1.35; 1.35 minus 1 = 0.35) with the higher buffer capacity here on this gaming-class desktop.

It appears, then, that the combination of a faster CPU, an NVMe, and a higher buffer capacity is somewhere around 144% faster (5.875 / 2.40 = 2.44; 2.44 minus 1 = 1.44) than what we observed on a target data center machine.
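
Spelling out that arithmetic (just a quick check of the two ratios, using the measured day counts above):

# 2.40 days (desktop, bufferCapacity=1000000), 3.25 days (desktop, bufferCapacity=100000), 5.875 days (wdqs1024).
awk 'BEGIN {
  printf "vs wdqs1024:                 %.1f%% faster (5.875 / 2.40 = %.2f)\n", (5.875 / 2.40 - 1) * 100, 5.875 / 2.40;
  printf "vs bufferCapacity of 100000: %.1f%% faster (3.25 / 2.40 = %.2f)\n",  (3.25  / 2.40 - 1) * 100, 3.25  / 2.40;
}'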

It will likely be somewhat less dramatic at 10B triples, if the previous munged-file runs are any clue. I'm going to think about how to check this notion - it could be done by loading the scholarly graph plus a portion of the main graph, which would probably be close enough for our purposes.

A high-speed NVMe is being acquired so that we can verify on wdqs2024 the level of speedup achievable on a server similar to the ones used for the graph split tests. At present, wdqs2024 has a hardware profile similar to wdqs1024.
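
For the record, one way to confirm whether a host is actually running on NVMe versus SATA SSD once the drive is in (standard lsblk and nvme-cli usage; nothing specific to our hosts is assumed):

# TRAN shows the transport (nvme vs sata); ROTA=0 means non-rotational (SSD/NVMe).
lsblk -d -o NAME,MODEL,SIZE,TRAN,ROTA

# NVMe devices also show up under the nvme CLI, if installed.
sudo nvme list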

Some stuff from the terminal from the import on the gaming-class desktop:

ubuntu22:~$ head -9 ~/rdf/dist/target/service-0.3.138-SNAPSHOT/loadData.log
Sun Apr  7 12:03:19 PM CDT 2024
Processing part-00000-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
></head
><body<p>totalElapsed=64069ms, elapsed=64024ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=71897ms, commitTime=1712509470732, mutationCount=7349689</p
></html
>Sun Apr  7 12:04:31 PM CDT 2024
Processing part-00001-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz

# screen output at the end:

Processing part-01023-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
></head
><body<p>totalElapsed=51703ms, elapsed=51703ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=181013ms, commitTime=1712716306763, mutationCount=7946575</p
></html
>Tue Apr  9 09:31:50 PM CDT 2024
File /mnt/firehose/split_0/nt_wd_schol/part-01024-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz not found, terminating

real    3447m18.542s

In T362920: Benchmark Blazegraph import with increased buffer capacity (and other factors) we saw that this took about 3702 minutes, or about 2.57 days, for the scholarly article entity graph with the CPU governor change (described in T336443#9726600) alone on wdqs2023.

And for the second run in T362920: Benchmark Blazegraph import with increased buffer capacity (and other factors) we saw that this took about 3089 minutes, or about 2.15 days, for the scholarly article entity graph with the CPU governor change (described in T336443#9726600) plus the bufferCapacity at 1000000 on wdqs2023.

On the gaming-class 2018 desktop, although the bufferCapacity value of 1000000 sped things up as described earlier in this ticket, applying the CPU governor change did not seem to provide any additional benefit (that run took 2.47 days, compared to the previous record of 2.44). It's possible that the desktop's existing BIOS configuration (which was already set to a high performance mode) was already squeezing out optimal performance, or that something about the processor architecture's interaction with the rest of the hardware and operating system is simply different from the data center server. In any case, it's nice to see that the data center server is faster!
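
For reference, the usual way to check and set the governor on these machines looks like the following (generic cpufreq usage; the exact change applied in T336443#9726600 is described there and may differ in detail):

# Show the current governor for each core.
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c

# Switch all cores to the performance governor for the current boot.
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor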

One of my theories is that the gaming-class desktop's 64GB of total RAM may play some role, but the hardware provider has indicated that although more memory can be physically installed, the machine will only run with 64GB and can't jump to 128GB. Another is that the default memory swappiness (60) on the gaming-class desktop could play a role. However, I find this less likely, as memory spikes haven't seemed to be a problem on this machine while loading data, and the drive is an NVMe, so paging is less likely to manifest problematically anyway. Maybe something to check another day, as we generally use a swappiness of 0 in the data center (including on the WDQS hosts).
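
Checking or pinning swappiness, should we want to rule it out on the desktop, is standard sysctl usage:

# Current value (Ubuntu defaults to 60; the data center hosts use 0).
cat /proc/sys/vm/swappiness

# Set it to 0 for the current boot.
sudo sysctl vm.swappiness=0

# Persist across reboots.
echo 'vm.swappiness=0' | sudo tee /etc/sysctl.d/99-swappiness.conf
sudo sysctl --system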

I remembered I should post this:

For the "main" graph load on the desktop gaming-class 2018 desktop with bufferCapacity at 1000000:

Start time: Tue May 28 04:36:30 PM CDT 2024
Finish time: Fri May 31 03:43:25 AM CDT 2024

The first file (ends with 0000) was reported with the following time:

totalElapsed=71424ms

The remaining files (from 00001 to the end - part-01023-c261bb68-4091-4613-ae52-88ce97d22c14-c000.ttl.gz) were timed as follows:

real 3545m44.021s

So, 2.463 days.

The command used was

date | tee loadDataMAIN.log; time ./loadData.sh -n wdq -d /mnt/firehose/split_0/nt_wd_main -s 0 -e 0 2>&1 | tee -a loadDataMAIN.log; time ./loadData.sh -n wdq -d /mnt/firehose/split_0/nt_wd_main 2>&1 | tee -a loadDataMAIN.log