Tue, Nov 19
iostat -x 1
Are there thread dumps from Blazegraph available?
What about the new logger UPDATED_ENTITY_IDS: does it track updated entity IDs? How many per minute/hour?
Mon, Nov 18
Thanks! Yes it is Wikidata-Query-Service
Wed, Nov 13
Wdqs1006 reports 574.6GiB are reserved for the journal and 544.3GiB are actually used (~5% of space unused).
Wdqs1005, by contrast, reports 1037.7GiB reserved and only 543.5GiB actually used (~47% of space unused).
Most of the %FileWaste is reserved for 8K allocators, but %SlotWaste is also higher than usual for the 4K (10 times higher than usual), 2K, 64 (3 times), 320 and 768 (2 times) allocators.
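The unused-space percentages quoted above follow directly from the reserved and used figures. A minimal check in Python, using only the numbers reported for the two hosts (the function name is illustrative, not part of any Blazegraph tooling):

```python
def unused_percent(reserved_gib: float, used_gib: float) -> float:
    """Share of the reserved journal space that is not actually used, in percent."""
    return (reserved_gib - used_gib) / reserved_gib * 100.0


# (reserved GiB, used GiB) exactly as reported in the comment above.
hosts = {
    "wdqs1006": (574.6, 544.3),
    "wdqs1005": (1037.7, 543.5),
}

for host, (reserved, used) in hosts.items():
    print(f"{host}: {unused_percent(reserved, used):.1f}% of reserved space unused")
```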
Wed, Oct 23
Added link to the task T236251: Add header returning time millis to first solution similar to TTFB measured in Blazegraph.
The corresponding header X-FIRST-SOLUTION-MILLIS might be very useful when analyzing long-running queries and when comparing query performance. If the time reported by Blazegraph is significantly less than the total query execution time, it might be caused by:
- The total result is very large, and much of the time was spent on serialization/deserialization (which is basically an OK situation if the number of results is large).
- Connectivity issues, over the network and/or inter-process. In this case X-FIRST-SOLUTION-MILLIS will be the same for subsequent calls, but the total query time will vary.
- The query might be very unselective, with additional constraints filtering out many potential solutions, so the first solution is computed fast but collecting all the requested results takes much longer. Such queries are subject to analysis and might need fixing in the Blazegraph code or data layout.
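A rough sketch of how the header could be compared with the client-side total time. Only the header name comes from the task; the helper names, the sample timings and the 50 ms tolerance are assumptions made for illustration.

```python
def tail_ms(headers: dict, total_ms: float) -> float:
    """Time spent after the first solution was produced, in milliseconds."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    first_ms = float(normalised["x-first-solution-millis"])
    return total_ms - first_ms


def first_solution_stable_but_total_varies(runs, tolerance_ms: float = 50.0) -> bool:
    """runs: (first_solution_ms, total_ms) pairs from repeated calls of one query.

    True when the first-solution time barely moves while the total time does,
    which is the pattern described above for connectivity issues.
    The tolerance is an arbitrary illustrative threshold.
    """
    firsts = [first for first, _ in runs]
    totals = [total for _, total in runs]
    first_spread = max(firsts) - min(firsts)
    total_spread = max(totals) - min(totals)
    return first_spread <= tolerance_ms and total_spread > tolerance_ms


# Invented sample values, not measurements.
print(tail_ms({"X-FIRST-SOLUTION-MILLIS": "120"}, total_ms=900.0))
print(first_solution_stable_but_total_varies([(100, 500), (110, 2500), (105, 900)]))
```

A large tail combined with a stable first-solution time does not by itself distinguish a large result set from an unselective query; the result count has to be checked as well.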
Oct 16 2019
The LabelService optimizer was fixed this August (so that it no longer throws NPEs) by reusing the Blazegraph core utility com.bigdata.rdf.sparql.ast.StaticAnalysis.getVarsFromArguments(BOp) to introspect the variables used in filters and other clauses, so that LabelService call placement could be adjusted properly. This introspection seems to go into an infinite loop over the AST tree. Reusing a variable for label aggregation after the original variable is a common practice, so yes, it should be fixed. Looking into a workaround to extract the referenced vars without getting caught in the infinite loop.
Oct 9 2019
Oct 7 2019
There is a context param queryTimeout set to 10 minutes in web.xml, which is applied to all Blazegraph servlets. Stas prepared a patch extending it tenfold: https://gerrit.wikimedia.org/r/#/c/wikidata/query/rdf/+/520948/ You might apply it locally (or just edit the web.xml file) to resolve your issue. The change has not been applied to the WDQS master because this timeout is system-wide, and extending it might result in unexpected resource consumption (the timeout also applies to queries, including very heavy ones, allowing them to run much longer before timing out).
Sep 30 2019
These characters are indeed mapped to the same term in the DB.
Sep 12 2019
Aug 29 2019
Differences in bnodes might be tolerated with an additional replacement. The cleanup stage could be merged with the initial sed+sort.
Aug 2 2019
Looking at the query execution plans, the ProjectionOp for the query with lang() for coDescription got arranged prior to the materialization of coDescription, so it (along with its lang) did not make its way to the projection. The reason for this behavior needs more research. Will update on that.
Jul 1 2019
Fixed optional support and added testcase for that code path.
Service projectedVars actually include both inbound and outbound variables (those which are params for the service and those which are produced by the labels lookup). But to check whether the service node can be reordered prior to any clauses placed at the bottom of the query, we need to consider only the inbound variables, so that they are available for the service call and all outbound vars are available for the later filters and other clauses.
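The check described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the Blazegraph AST API: the Clause type, its fields and can_place_service_at are hypothetical names, and variables are plain strings.

```python
from typing import NamedTuple


class Clause(NamedTuple):
    """Hypothetical stand-in for a query clause (not a Blazegraph class)."""
    name: str
    binds: frozenset  # variables this clause binds
    uses: frozenset   # variables this clause needs to be bound already


def can_place_service_at(clauses, position, inbound, outbound):
    """True if the service call may sit at `position` in the clause list.

    Inbound vars must be bound by clauses before the service, and no clause
    before the service may depend on vars the service itself produces.
    """
    before = clauses[:position]
    bound_before = set().union(*(clause.binds for clause in before))
    inbound_ready = inbound <= bound_before
    outbound_not_needed_yet = all(not (clause.uses & outbound) for clause in before)
    return inbound_ready and outbound_not_needed_yet


clauses = [
    Clause("statement pattern", frozenset({"item"}), frozenset()),
    Clause("filter on label", frozenset(), frozenset({"itemLabel"})),
]
inbound, outbound = {"item"}, {"itemLabel"}

for position in range(len(clauses) + 1):
    print(position, can_place_service_at(clauses, position, inbound, outbound))
```

Using the full projectedVars set as the inbound set would wrongly reject every position here, since itemLabel is never bound before the service runs.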
Jun 25 2019
The idea for the change is to replace the runLast hint with more complicated logic. So there are 3 steps:
- first, a 'most probable optimal' placement, to allow EmptyLabelServiceOptimizer to see the variables to process.
- then EmptyLabelServiceOptimizer adds statement patterns for resolutions.
- and then an additional optimizer step moves LabelService to the latest possible position before any clauses which might use the variables bound by LabelService.
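The third step can be sketched as follows. This is a toy model only: clauses are reduced to the sets of variables they use, and latest_service_position is a hypothetical helper, not the actual optimizer code.

```python
def latest_service_position(clause_uses, service_outbound):
    """Index at which to insert the service call.

    That is just before the first clause that uses a variable bound by the
    service, or the end of the list if no clause uses any of them.
    """
    outbound = set(service_outbound)
    for index, uses in enumerate(clause_uses):
        if set(uses) & outbound:
            return index
    return len(clause_uses)


# Variables used by each clause, in query order (invented example).
clause_uses = [{"item"}, {"item", "prop"}, {"itemLabel"}, {"other"}]
position = latest_service_position(clause_uses, {"itemLabel"})

reordered = [f"clause {i}" for i in range(len(clause_uses))]
reordered.insert(position, "LabelService")
print(reordered)
```

Placing the service as late as possible keeps the label lookup restricted to solutions that survived the earlier clauses, while still binding the labels before anything reads them.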
May 7 2019
The EmptyLabelServiceOptimizer, running optimizeJoinGroup(AST2BOpContext, StaticAnalysis, IBindingSet, JoinGroupNode), currently takes the projection from StaticAnalysis.getQueryRoot(), as the parent of the JoinGroupNode wrapping the statement pattern of the LabelService clause is unavailable.
May 6 2019
Additionally tested a configuration option with only raw records disabled, compared to the original baseline:
Configuration options are assigned in RWStore.properties. Particular options are:
This seems to be an optimizer ordering problem.
CompareBOp executes several times to check whether "Ada"@en equals ?langLabel, but ?langLabel is not bound on all occasions:
while running ASTDeferredIVResolution
while running com.bigdata.rdf.sparql.ast.optimizers.ASTSetValueExpressionsOptimizer
then while running ConditionalRoutingOp for ChunkedRunningQuery
Apr 29 2019
Complete test logs attached
Load performance for the tested configurations in an isolated environment (i7-7700HQ, 8 cores 2.8GHz, 32GB RAM, SSD Samsung 960 PRO)
Attached results of loading 100 ttl.gz files with different configurations
- original baseline (commit blazegraph 895a4f3bd003ddb4b1f31257f642ca3616bca79b, rdf 4245b2a5bc0c7d4b369a43ba512b5e537dac07a4)
- reference URIs inlining,
- reference URIs inlining, raw records disabled per T213210
- reference URIs inlining, raw records disabled, INLINE_TEXT_LITERALS for short strings per T213210