
Branching factors configuration for Blazegraph instances
Open, Needs Triage, Public


Branching factors for the Blazegraph journal are configured per namespace and control the width and depth of the B+Tree indices.

There are default parameters for branching factors in

# Bump up the branching factor for the lexicon indices on the default kb.
# Bump up the branching factor for the statement indices on the default kb.

In general these are fine, but depending on the actual data in the journal and the distribution of properties and literals, they might result in suboptimal index trees, so adjusting them might improve both query and update performance and also reduce the journal size.

The following command prepares RWStore parameters that can be used in place of the default values for a NEW journal.

curl -v --silent "http://localhost:9999/bigdata/status?dumpJournal&dumpPages" --stderr - | grep -P "\tBTree" | grep -v "_" | awk -F '\t' '{print "com.bigdata.namespace."$1".com.bigdata.btree.BTree.branchingFactor="$33}'

For WDQS, we adjusted the defaults together with Stas, using the recommendation from the production environment. We might need a similar adjustment for SDCQuery after loading production Commons data.

At this time, the recommendation for SDCQuery is:

This might change, as we will be getting different data from Commons over time.

These values might be rounded to some step so they look a bit nicer, for example 900 or 950 instead of 934.
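As a minimal sketch of such rounding, the helper below rounds a value to the nearest multiple of a step. The function name `round_bf` and the step sizes are assumptions made for illustration; they are not part of Blazegraph or of any existing WDQS script.

```shell
# round_bf VALUE STEP
# Hypothetical helper: round VALUE to the nearest multiple of STEP
# using integer arithmetic (halves round up).
round_bf() {
    value=$1
    step=$2
    echo $(( (value + step / 2) / step * step ))
}

round_bf 934 50    # prints 950
round_bf 934 100   # prints 900
```

Whether to round to the nearest step, always up, or always down is a judgment call; the sketch rounds to the nearest multiple only because that reproduces both values from the example above.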

I believe that when preparing a new dump load on a new server (or recreating the journal on an existing one), we should consider adding a check of whether the parameters obtained from the existing Query Service instance are significantly higher or lower (by more than 10%) than the default ones, both for the WDQS and SDCQuery services (and probably any other relevant cases where Blazegraph is used).
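The proposed check could look roughly like the sketch below. The function name `bf_differs`, its argument order, and the numbers in the usage lines are all hypothetical; in particular, the "default" values shown are placeholders, not the actual configured defaults.

```shell
# bf_differs RECOMMENDED DEFAULT [THRESHOLD_PERCENT]
# Hypothetical helper: succeeds (exit status 0) when RECOMMENDED deviates
# from DEFAULT by more than THRESHOLD_PERCENT (10 if omitted).
bf_differs() {
    awk -v rec="$1" -v def="$2" -v pct="${3:-10}" 'BEGIN {
        diff = rec - def
        if (diff < 0) diff = -diff          # absolute difference
        exit !(diff * 100 > def * pct)      # exit 0 means "differs significantly"
    }'
}

# Placeholder numbers only: 934 is the example value from above,
# 400 and 1024 stand in for whatever the defaults are.
if bf_differs 934 400;  then echo "934 vs 400: review";  fi
if bf_differs 934 1024; then echo "934 vs 1024: review"; fi
```

With these placeholder inputs only the first comparison exceeds the 10% threshold. In practice the recommended values would come from the output of the `dumpJournal` command above and the defaults from the deployed properties file.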