Branching factors configuration for Blazegraph instances
Open, Needs Triage · Public

Description

Branching factors for the Blazegraph journal are configured per namespace and control the width and depth of the index B+trees.

There are default parameters for branching factors in RWStore.properties:

# Bump up the branching factor for the lexicon indices on the default kb.
com.bigdata.namespace.wdq.lex.com.bigdata.btree.BTree.branchingFactor=400
com.bigdata.namespace.wdq.lex.ID2TERM.com.bigdata.btree.BTree.branchingFactor=600
com.bigdata.namespace.wdq.lex.TERM2ID.com.bigdata.btree.BTree.branchingFactor=330
# Bump up the branching factor for the statement indices on the default kb.
com.bigdata.namespace.wdq.spo.com.bigdata.btree.BTree.branchingFactor=1024
com.bigdata.namespace.wdq.spo.OSP.com.bigdata.btree.BTree.branchingFactor=900
com.bigdata.namespace.wdq.spo.SPO.com.bigdata.btree.BTree.branchingFactor=900

In general these are fine, but depending on the actual data in the journal and the distribution of properties and literals, they can produce suboptimal B+trees. Adjusting them might improve both query and update performance and also reduce the journal size.

The following command prints RWStore properties, derived from a running instance, that can replace the default values when creating a NEW journal.

curl -v --silent "http://localhost:9999/bigdata/status?dumpJournal&dumpPages" --stderr - | grep -P "\tBTree" | grep -v "_" | awk -F '\t' '{print "com.bigdata.namespace."$1".com.bigdata.btree.BTree.branchingFactor="$33}'
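
The same filtering can be wrapped in a small function that reads a saved copy of the status page from standard input, which makes it easier to rerun or to keep the dump for later comparison. This is only a sketch: the function name `branching_props` and the file names in the usage comments are hypothetical, while the column positions (index name in field 1, recommended branching factor in field 33) are taken from the command above.

```shell
# Sketch: same filtering as the one-liner above, but reading the dump from stdin.
# Assumes tab-separated rows containing "<TAB>BTree", as in the original command.
branching_props() {
  grep "$(printf '\tBTree')" \
    | grep -v "_" \
    | awk -F '\t' '{print "com.bigdata.namespace."$1".com.bigdata.btree.BTree.branchingFactor="$33}'
}

# Hypothetical usage:
#   curl --silent "http://localhost:9999/bigdata/status?dumpJournal&dumpPages" > dump.txt
#   branching_props < dump.txt > branching.properties
```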

For WDQS we adjusted the defaults together with Stas, using recommendations from the production environment. We might need a similar adjustment for SDCQuery after loading the production Commons data.

At this time, the recommendation for SDCQuery is:
com.bigdata.namespace.wdq.lex.BLOBS.com.bigdata.btree.BTree.branchingFactor=400
com.bigdata.namespace.wdq.lex.ID2TERM.com.bigdata.btree.BTree.branchingFactor=599
com.bigdata.namespace.wdq.lex.TERM2ID.com.bigdata.btree.BTree.branchingFactor=300
com.bigdata.namespace.wdq.spo.JUST.com.bigdata.btree.BTree.branchingFactor=1024
com.bigdata.namespace.wdq.spo.OSP.com.bigdata.btree.BTree.branchingFactor=866
com.bigdata.namespace.wdq.spo.POS.com.bigdata.btree.BTree.branchingFactor=954
com.bigdata.namespace.wdq.spo.SPO.com.bigdata.btree.BTree.branchingFactor=934

This might change, since the data we get from Commons will differ over time.

These values could be rounded to a convenient step to look a bit nicer, for example 900 or 950 instead of 934.
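
As an illustration of the rounding, here is a minimal sketch that rounds a value to the nearest multiple of a step using integer arithmetic. The function name `round_to_step` and the step sizes are assumptions chosen only for this example; the task does not prescribe a particular step.

```shell
# Round a branching factor to the nearest multiple of a step (integer arithmetic).
# $1 = value, $2 = step
round_to_step() {
  echo $(( ($1 + $2 / 2) / $2 * $2 ))
}

round_to_step 934 50    # prints 950
round_to_step 934 100   # prints 900
```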

I believe that when preparing a new dump load on a new server (or recreating the journal on an existing one), we should consider adding a check for whether the parameters obtained from an existing Query Service instance are significantly higher or lower (by more than 10%) than the default ones. This applies to both WDQS and SDCQuery (and probably to any other relevant cases where Blazegraph is used).
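
A check like this could be sketched as a comparison of two properties files. The function name `compare_branching`, the file names, and the output format below are hypothetical. Only keys present in both files are compared, and the threshold defaults to 10 percent.

```shell
# Sketch: report keys whose recommended value deviates from the default
# by more than a threshold (in percent). All names here are hypothetical.
# $1 = defaults file, $2 = recommended file, $3 = threshold (default 10)
compare_branching() {
  defaults=$1
  recommended=$2
  threshold=${3:-10}
  awk -F '=' -v t="$threshold" '
    /^[[:space:]]*#/ || NF < 2 { next }   # skip comments and non key=value lines
    NR == FNR { def[$1] = $2; next }      # first file: remember the defaults
    ($1 in def) && (def[$1] + 0 > 0) {    # second file: compare shared keys only
      pct = ($2 - def[$1]) / def[$1] * 100
      if (pct > t || pct < -t)
        printf "%s default=%s recommended=%s (%+.1f%%)\n", $1, def[$1], $2, pct
    }
  ' "$defaults" "$recommended"
}

# Hypothetical usage:
#   compare_branching RWStore.properties branching.properties 10
```

With the values listed in this task, none of the keys present in both lists differs by more than 10%; the closest is TERM2ID (330 by default, 300 recommended, about 9% lower).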

Event Timeline

Restricted Application added a subscriber: Aklapper. · Sep 12 2019, 6:24 PM