
Thompsonbry.systap (Bryan Thompson)
User

Projects

User does not belong to any projects.

User Details

User Since
Feb 23 2015, 2:31 PM (488 w, 6 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Thompsonbry.systap [ Global Accounts ]

Recent Activity

May 14 2015

Thompsonbry.systap added a comment to T92308: Open questions for Blazegraph data model research.

That is normal. You can choose to explicitly disable bloom filters in
advance. Otherwise they are disabled once their expected error rate would
be too high. Nothing to be concerned about.

May 14 2015, 10:13 AM · Wikidata, Discovery-ARCHIVED, Patch-For-Review, Wikidata-Query-Service
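A Bloom filter's expected false-positive rate has a standard estimate, p ≈ (1 − e^(−kn/m))^k, for m bits, k hash functions and n inserted keys, so it rises steadily as a fixed-size filter fills. A minimal sketch of the "disable once the expected error rate is too high" policy is below. The class name and the threshold are hypothetical; this is not Blazegraph's implementation.

```java
/**
 * Hypothetical sketch of a "disable when too inaccurate" Bloom filter policy.
 * Not Blazegraph's code; names and thresholds are illustrative only.
 */
final class BloomPolicy {
    private final long bits;            // m: size of the bit vector
    private final int hashes;           // k: number of hash functions
    private final double maxErrorRate;  // assumed acceptable false-positive bound
    private long inserted;              // n: keys added so far
    private boolean enabled = true;

    BloomPolicy(long bits, int hashes, double maxErrorRate) {
        this.bits = bits;
        this.hashes = hashes;
        this.maxErrorRate = maxErrorRate;
    }

    /** Standard estimate p = (1 - e^(-k*n/m))^k. */
    static double expectedErrorRate(long m, int k, long n) {
        return Math.pow(1.0 - Math.exp(-(double) k * n / m), k);
    }

    /** Records one insert and disables the filter once the estimate exceeds the bound. */
    void onInsert() {
        inserted++;
        if (enabled && expectedErrorRate(bits, hashes, inserted) > maxErrorRate) {
            enabled = false; // lookups would now bypass the filter and go to the index
        }
    }

    boolean isEnabled() { return enabled; }
    long inserted() { return inserted; }
}
```

Once disabled, the filter simply stops answering "definitely absent", so correctness is unaffected; only the shortcut is lost.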

May 13 2015

Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

We have migrated to JIRA. See http://jira.blazegraph.com/browse/BLZG-1236 for the ticket that corresponds to http://trac.bigdata.com/ticket/1228 (Recycler error in 1.5.1).

May 13 2015, 8:37 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED

May 4 2015

Thompsonbry.systap added a comment to T97538: Write integration tests that include creation/destruction of namespaces.

Does the integration test demonstrate the problem?

May 4 2015, 3:24 PM · Discovery-ARCHIVED, Wikidata-Query-Service

Apr 30 2015

Thompsonbry.systap added a comment to T92308: Open questions for Blazegraph data model research.

Yes, but there are global defaults and you can set them. For example, if
you list out the namespace properties you will see how the 128 default is
set. The issue is that we are using different branching factors for the
spo and lex relations, and even inside of those for the different indices.
But the overrides apply to a prefix.

Apr 30 2015, 11:52 PM · Wikidata, Discovery-ARCHIVED, Patch-For-Review, Wikidata-Query-Service
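The idea that an override "applies to a prefix" can be sketched as a lookup that tries the most specific prefix (the full index name) first, then progressively shorter prefixes, and finally the global key. This resolution order is an assumption for illustration, not a statement of Blazegraph's exact rules. The key names in the usage are modeled on the `com.bigdata.btree.BTree.branchingFactor` property and a namespace-prefixed override form; the index names are hypothetical, so check both against the properties your namespace actually lists.

```java
import java.util.Properties;

/**
 * Hypothetical sketch of prefix-scoped property overrides: the most specific
 * prefix wins, and the global key is the fallback.
 */
final class PrefixOverrides {
    static String resolve(Properties props, String indexName, String key) {
        String prefix = indexName;
        while (true) {
            String value = props.getProperty(prefix + "." + key);
            if (value != null) return value;        // most specific match wins
            int dot = prefix.lastIndexOf('.');
            if (dot < 0) break;                     // no shorter prefix left
            prefix = prefix.substring(0, dot);      // drop the last name component
        }
        return props.getProperty(key);              // global default (may be null)
    }
}
```

For example, with a global value of 128, a `...kb.spo` override of 512 and a `...kb.spo.OSP` override of 64, the OSP index resolves to 64, the other spo indices to 512, and the lex indices to 128.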
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

Your branching factors are all set to 128 in this namespace. Based on the results I posted on a different ticket, you should override them to target 8k pages with few blobs (pages that exceed 8k). Just FYI. This can be done when the namespace is created.

Apr 30 2015, 9:01 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
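Targeting 8k pages comes down to simple arithmetic once you know roughly how many bytes a serialized entry takes in a given index. The sketch below is a hypothetical back-of-the-envelope helper, not a Blazegraph tool: it assumes you already have a measured average bytes per entry and a sample of page sizes, and the numbers in any usage are up to you.

```java
/**
 * Hypothetical sizing helper: pick a branching factor so that a typical node or
 * leaf serializes to about 8 KiB, and measure how many sampled pages overflow
 * that target (the "blobs").
 */
final class BranchingFactorEstimate {
    static final int TARGET_PAGE_BYTES = 8 * 1024;

    /** floor(target / avgBytesPerEntry), never below the given minimum. */
    static int estimate(double avgBytesPerEntry, int minimum) {
        if (avgBytesPerEntry <= 0) {
            throw new IllegalArgumentException("avgBytesPerEntry must be > 0");
        }
        int bf = (int) Math.floor(TARGET_PAGE_BYTES / avgBytesPerEntry);
        return Math.max(minimum, bf);
    }

    /** Fraction of sampled page sizes that exceed the target page size. */
    static double overflowFraction(int[] sampledPageBytes) {
        if (sampledPageBytes.length == 0) return 0.0;
        int over = 0;
        for (int bytes : sampledPageBytes) {
            if (bytes > TARGET_PAGE_BYTES) over++;
        }
        return (double) over / sampledPageBytes.length;
    }
}
```

Because entry sizes differ per index (compact statement keys versus variable-length lexicon entries), the estimate naturally comes out different for each index, which is why per-index overrides matter.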
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

Can you also include the operation that was executed for this error?

Apr 30 2015, 6:06 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

Not sure. There is really no reliable way to carry the database forward
from an error like the one that triggered originally. This might be
something we didn't think through in the utility to unblock the journal. Or
it might be bad data in the indices from the original problem.

Apr 30 2015, 6:03 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

See http://trac.bigdata.com/ticket/1228#comment:11 for a utility that should unblock you on writes.

Apr 30 2015, 4:48 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED

Apr 29 2015

Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

See http://trac.bigdata.com/ticket/1228 for some further thoughts on a root cause, some thoughts on how to create a stress test to replicate the problem, and some thoughts on how to fix this.

Apr 29 2015, 9:27 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap updated subscribers of T97468: Blazegraph crash on updater.
Apr 29 2015, 9:06 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

I've pushed the change to Checkpoint.getHeight() to SF git as branch TICKET_1228.

Apr 29 2015, 7:31 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

Can you apply the patch to 1.5.1? The changes that I made are
committed against the 1.5.2 development branch. The change to get
past that UnsupportedOperationException thrown from
Checkpoint.getHeight() is just to return the height field. That is:

Apr 29 2015, 7:20 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

One of the questions I had is why would #1021 not have caught this problem.

Apr 29 2015, 5:06 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

Our plan is to implement a utility that overwrites the address of the deferred free list. So you would run this utility. It would overwrite the address as 0L so that the deferred free list is empty. You would then re-open the journal normally.

Apr 29 2015, 4:55 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
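The planned utility can be pictured with a toy record: read the stored address of the deferred free list, remember it for logging, and overwrite it with 0L so the list is treated as empty on the next open. The record size, the field offset and the class name are all hypothetical; the real RWStore root block has its own layout (and any checksum covering it would also have to be kept consistent), so this only illustrates the idea and is not a recovery tool.

```java
import java.nio.ByteBuffer;

/**
 * Toy illustration of "overwrite the deferred free list address with 0L".
 * The offset is hypothetical and does NOT describe the real RWStore root block.
 */
final class DeferredFreeListReset {
    static final int DEFERRED_FREE_ADDR_OFFSET = 16; // hypothetical field position

    static long readDeferredFreeAddr(ByteBuffer rootBlock) {
        return rootBlock.getLong(DEFERRED_FREE_ADDR_OFFSET); // absolute read
    }

    /** Clears the address and returns the previous value so it can be logged. */
    static long clearDeferredFreeAddr(ByteBuffer rootBlock) {
        long previous = rootBlock.getLong(DEFERRED_FREE_ADDR_OFFSET);
        rootBlock.putLong(DEFERRED_FREE_ADDR_OFFSET, 0L); // 0L = empty list
        return previous;
    }
}
```

The trade-off is that any slots on the discarded list are never recycled, so some space leaks, in exchange for being able to apply updates again.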
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

Martyn and I talked this through this morning. He is proceeding per http://trac.bigdata.com/ticket/1228 (Recycler error in 1.5.1). His first step will be to understand the problem in terms of the RWStore internal state so we can generate some theories about how the specific situation was able to manifest. We are also planning to create an RWStore specific utility for extracting information about the allocators, deferred free list, etc. This can then be hooked into the DumpJournal utility to provide more information.

Apr 29 2015, 3:39 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

We have a ticket for this. http://trac.bigdata.com/ticket/1229

Apr 29 2015, 1:44 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

OK, I see the same number of entries in the forward and reverse indices and
across the various statement indices.

Apr 29 2015, 12:12 AM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

It looks like it hit the same error. At least, it has not listed out the
rest of the indices and stops on that solution set stream. We will need to
redo this with the patch to DumpJournal.

Apr 29 2015, 12:00 AM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED

Apr 28 2015

Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

The problem might be an issue with DumpJournal and the SolutionSetStream class. The last thing that it visited was a SolutionSetStream, not a BTree.

Apr 28 2015, 10:10 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

See below for the code that threw the exception in Checkpoint.java

Apr 28 2015, 10:00 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

Can I get the full stack trace for that failure? It looks like it does not understand the type of index. I would like to figure out which index has this problem.

Apr 28 2015, 9:59 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

I have created a ticket for this. See http://trac.bigdata.com/ticket/1228 (Recycler error in 1.5.1). Please subscribe for updates.

Apr 28 2015, 9:57 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

Stas notes that they had recently deleted a large namespace (one of two) from the workbench GUI. We should check the deferred frees in depth and make sure that the deletes for the deleted namespace are no longer on the deferred deletes list. Stas also notes that some error message was reported in the GUI for this operation.

Apr 28 2015, 9:25 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

OK. Like I said, no known issues.

Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
bryan@systap.com
http://blazegraph.com
http://blog.bigdata.com
http://mapgraph.io

Apr 28 2015, 8:58 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

Ok. We have much less experience with OpenJDK in production. Normally
people deploy the Oracle JVM. I am not aware of any specific
problems. It is just that I do not have as much confidence around
OpenJDK deployments. But this looks like an internals bug, not a JVM bug.

Apr 28 2015, 8:53 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

What JVM (version and vendor) and OS?

Apr 28 2015, 8:50 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

You should be able to see the IOs as it pages through the indices if you are concerned that it might not be making progress. It is probably just busy scanning the disk.

Apr 28 2015, 8:38 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

It will take a long time with the -pages option. It is doing a
sequential scan of each index. You will see the output update each
time it finishes with one index and starts on the next. At the end it
will write out a table with statistics about the indices. It should
also write out the allocation information at the start of the dump.

Apr 28 2015, 8:37 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

We can probably force the RWStore to discard the list of deferred deletes. Assuming that the problem is purely in that deferred deletes list, this would let you continue to apply updates.

Apr 28 2015, 8:32 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

That log file is only since the journal restart. Do you have the historical log files?

Apr 28 2015, 8:22 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

Was group commit enabled?

Apr 28 2015, 8:20 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

I am only suggesting that you can probably open the database in a
read-only mode and access the data. The "problem" is in the list of
deferred allocation slot addresses for recycling. See the Options
file in the Journal package for how to force a read-only open mode.

Apr 28 2015, 8:08 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

OK. We will need access one way or another.

Apr 28 2015, 8:06 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

Let's boil this down to something that can be replicated by a known series of actions with known data sets. The last time we had something like this it was related to a failure to correctly rollback an UPDATE at the REST API layer. So we need to understand whether this is against a clean build, what release, and what properties are in effect.

Apr 28 2015, 8:05 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

The issue is an address in the free list that cannot be recycled. You should be able to open in a read-only mode based on that stack trace.

Apr 28 2015, 8:01 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T97468: Blazegraph crash on updater.

Noted. Can you replicate the situation leading up to this event? Are you doing anything that goes around the REST API? Can you publish a compressed copy of the journal file that we can download or expose the machine for ssh access?

Apr 28 2015, 7:59 PM · Wikidata, Upstream, Wikidata-Query-Service, Discovery-ARCHIVED

Apr 24 2015

Thompsonbry.systap added a comment to T97095: Add query runtime cap for blazegraph.

You can do this through web.xml. You can also mark the end point as read-only. See http://wiki.blazegraph.com/wiki/index.php/NanoSparqlServer#web.xml and also inline below for some of the more relevant options.

Apr 24 2015, 12:31 AM · Patch-For-Review, Wikidata-Query-Service, Discovery-ARCHIVED

Apr 19 2015

Thompsonbry.systap added a comment to T96490: [Story] Blazegraph support for owl:sameAs and redirects.

One approach is to update the forward dictionary to map the IRIs onto one of the existing IVs and then rewrite all of the statements. However, this sort of destructive rewrite does not help if you need the "redirects" to be reversible.

Apr 19 2015, 1:08 AM · Story, Wikidata, Wikidata-Query-Service, Discovery-ARCHIVED
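The destructive approach can be sketched with plain numeric ids standing in for IVs: every statement that uses a redirected term is rewritten onto the canonical term, and duplicates that appear after rewriting collapse. The class and method names are hypothetical, and the sketch only follows single-step redirects (no chains). Once applied, nothing records which IRI was the redirect, which is exactly why it is not reversible.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of rewriting statements onto canonical term ids.
 * Longs stand in for IVs; single-step redirects only.
 */
final class RedirectRewrite {
    record Triple(long s, long p, long o) {}

    /** Returns the canonical id for a term, or the id itself if not redirected. */
    static long canonical(Map<Long, Long> redirects, long id) {
        return redirects.getOrDefault(id, id);
    }

    static Set<Triple> rewrite(List<Triple> triples, Map<Long, Long> redirects) {
        Set<Triple> out = new LinkedHashSet<>(); // rewriting can create duplicates
        for (Triple t : triples) {
            out.add(new Triple(canonical(redirects, t.s()),
                               canonical(redirects, t.p()),
                               canonical(redirects, t.o())));
        }
        return out;
    }
}
```

A reversible alternative would keep the redirect as an ordinary statement and resolve it at query time, at the cost of an extra join.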

Apr 17 2015

Thompsonbry.systap added a comment to T96094: Bad query performance for FILTER NOT EXISTS.

Ok. At that number this will not matter. But it is not being as efficient
as you would hope. Something to work on.

Apr 17 2015, 8:21 PM · Wikidata-Query-Service, Discovery-ARCHIVED
Thompsonbry.systap added a comment to T96094: Bad query performance for FILTER NOT EXISTS.

This part could be expensive. Instead of having a prefix scan, it is
scanning all statements, materializing the Object position from the
dictionary, and then checking the string representation of that URI for a
match.

Apr 17 2015, 8:05 PM · Wikidata-Query-Service, Discovery-ARCHIVED
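The cost difference is easy to see with a sorted map of IRI strings standing in for an index (a hypothetical stand-in, not Blazegraph's data structures). A full scan touches every entry and tests its string form; a prefix scan seeks to the first key at or after the prefix and stops at the first key that no longer matches, because sorted order guarantees nothing later can match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical comparison of a full scan and a prefix scan over sorted keys. */
final class PrefixScanDemo {
    /** Matching keys plus how many index entries were touched to find them. */
    record ScanResult(List<String> hits, int visited) {}

    /** Visits every entry and tests its string form. */
    static ScanResult fullScan(NavigableMap<String, Long> index, String prefix) {
        List<String> hits = new ArrayList<>();
        int visited = 0;
        for (String iri : index.keySet()) {
            visited++;
            if (iri.startsWith(prefix)) hits.add(iri);
        }
        return new ScanResult(hits, visited);
    }

    /** Seeks to the first key >= prefix and stops at the first non-matching key. */
    static ScanResult prefixScan(NavigableMap<String, Long> index, String prefix) {
        List<String> hits = new ArrayList<>();
        int visited = 0;
        for (String iri : index.tailMap(prefix, true).keySet()) {
            visited++;
            if (!iri.startsWith(prefix)) break; // sorted order: no later key can match
            hits.add(iri);
        }
        return new ScanResult(hits, visited);
    }

    /** Builds a small sorted index for experimentation. */
    static NavigableMap<String, Long> index(String... iris) {
        NavigableMap<String, Long> m = new TreeMap<>();
        for (int i = 0; i < iris.length; i++) m.put(iris[i], (long) i);
        return m;
    }
}
```

Both return the same matches; the prefix scan's work grows with the number of matches rather than with the size of the index.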

Apr 16 2015

Thompsonbry.systap added a comment to T96100: Last update check query is too slow.

Ah. That is good. Copying Michael.

Apr 16 2015, 3:51 PM · Discovery-ARCHIVED, Wikidata-Query-Service

Apr 14 2015

Thompsonbry.systap added a comment to T96100: Last update check query is too slow.

There are two existing tickets for Blazegraph that might be related to this
ticket on your tracker and the other ticket that you filed today. They are:

Apr 14 2015, 11:47 PM · Discovery-ARCHIVED, Wikidata-Query-Service

Mar 31 2015

Thompsonbry.systap added a comment to T94539: BlazeGraph uses old xsd:dateTime standard.

Peter comments that he has also run into this just recently.

Mar 31 2015, 10:44 AM · Patch-For-Review, Wikidata, Discovery-ARCHIVED, Wikidata-Query-Service

Mar 24 2015

Thompsonbry.systap added a comment to T90115: BlazeGraph Security Review.

Another possibility is using CAS counters (striped atomic counters) to track resources associated with a query and use that to bound memory for queries running on the java managed heap (in addition to bounding the memory associated with the native heap, input queue capacity, etc.).

Mar 24 2015, 7:35 PM · Wikidata, Discovery-ARCHIVED, Security-Team, Wikidata-Query-Service
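A per-query memory bound of this kind can be sketched with a compare-and-set loop: an operator reserves bytes before buffering data on the Java heap and releases them when the data is dropped. Note the substitution: the comment mentions striped counters, and `java.util.concurrent.atomic.LongAdder` is the JDK's striped counter, but it cannot atomically check-and-reserve, so this sketch uses a single `AtomicLong`. The class name and the idea of per-operator reservations are assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-query memory budget enforced with a CAS loop. */
final class QueryMemoryBudget {
    private final long limitBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    QueryMemoryBudget(long limitBytes) { this.limitBytes = limitBytes; }

    /** Atomically reserves bytes, or returns false if the bound would be exceeded. */
    boolean tryReserve(long bytes) {
        while (true) {
            long current = usedBytes.get();
            long next = current + bytes;
            if (next > limitBytes) return false;               // over budget: refuse
            if (usedBytes.compareAndSet(current, next)) return true;
            // another thread changed the counter; retry with the fresh value
        }
    }

    void release(long bytes) { usedBytes.addAndGet(-bytes); }

    long used() { return usedBytes.get(); }
}
```

A refused reservation is the point where a query could spill, wait, or be cancelled, depending on policy.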
Thompsonbry.systap added a comment to T90115: BlazeGraph Security Review.

The analytic query mode does offer some ability to bound memory, but it is not 100% across all queries. For example, quads mode queries do not currently put the distinct triple pattern filter on the native heap. ORDER BY is not currently on the native heap, and property paths are not currently on the native heap either (though they now support incremental eviction and are executed much more efficiently). Also, the native DISTINCT solutions operator cannot be used with queries that must preserve order (e.g., where LIMIT or ORDER BY are in use). Finally, we do not buffer the input queues for the operators on the native heap in the analytic query mode. These things all limit our ability to strictly bound native memory usage.

Mar 24 2015, 1:20 PM · Wikidata, Discovery-ARCHIVED, Security-Team, Wikidata-Query-Service

Mar 11 2015

Thompsonbry.systap added a comment to T92308: Open questions for Blazegraph data model research.

This may not be the right ticket, but I did some experimentation with the data sets that I referenced above, looking at parameterization of the load. Using an Intel 2011 Mac Mini with 16GB of RAM and an SSD, the total load time across all datasets is 6 hours, which is basically 20k triples per second (tps) over 429M triples loaded. The best parameters are below. This configuration used slightly more space on the disk (66G vs 60G). It uses a much smaller branching factor for the OSP index and the small slot optimization on the RWStore to attempt to co-locate the scattered OSP index updates. (The updates for this index are always scattered because the inserts are always clustered on the source vertex; this is just how it works out for every application I have seen.)

Mar 11 2015, 12:23 PM · Wikidata, Discovery-ARCHIVED, Patch-For-Review, Wikidata-Query-Service
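The quoted rate follows directly from the totals in that comment (6 hours, 429M triples, 66G vs 60G on disk):

$$
\frac{429 \times 10^{6}\ \text{triples}}{6\ \text{h} \times 3600\ \text{s/h}} = \frac{429 \times 10^{6}}{21\,600\ \text{s}} \approx 19\,861\ \text{triples/s} \approx 20\text{k tps},
\qquad
\frac{66\ \text{G}}{60\ \text{G}} = 1.10
$$

so the tuned configuration used about 10% more disk space than the 60G comparison point.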

Mar 10 2015

Thompsonbry.systap added a comment to T92308: Open questions for Blazegraph data model research.

Great! Please put my name on ticket so I will see it.

Mar 10 2015, 10:59 PM · Wikidata, Discovery-ARCHIVED, Patch-For-Review, Wikidata-Query-Service
Thompsonbry.systap added a comment to T92308: Open questions for Blazegraph data model research.
  1. I am not sure. That's why I would like to see it in a test case.
Mar 10 2015, 9:11 PM · Wikidata, Discovery-ARCHIVED, Patch-For-Review, Wikidata-Query-Service
Thompsonbry.systap added a comment to T92308: Open questions for Blazegraph data model research.

Ok. That query

Mar 10 2015, 9:02 PM · Wikidata, Discovery-ARCHIVED, Patch-For-Review, Wikidata-Query-Service
Thompsonbry.systap added a comment to T92308: Open questions for Blazegraph data model research.

What is the correct prefix declaration for "wdt"?

Mar 10 2015, 8:51 PM · Wikidata, Discovery-ARCHIVED, Patch-For-Review, Wikidata-Query-Service
Thompsonbry.systap added a comment to T92308: Open questions for Blazegraph data model research.

It might be useful to take some of these questions to the bigdata-developers mailing list. Some of these questions already have answers on the wiki.

Mar 10 2015, 8:51 PM · Wikidata, Discovery-ARCHIVED, Patch-For-Review, Wikidata-Query-Service

Mar 3 2015

Thompsonbry.systap added a comment to T90128: BlazeGraph Finalization: Validate AST rewrite.

Ok. That makes sense.

Mar 3 2015, 8:28 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Thompsonbry.systap added a comment to T90128: BlazeGraph Finalization: Validate AST rewrite.

Do you have a pointer to your code? I'd like to understand where the pain is.

Mar 3 2015, 8:07 PM · Wikidata-Query-Service, MediaWiki-Core-Team

Mar 2 2015

Thompsonbry.systap added a comment to T90445: BlazeGraph Finalization: Plan for no downtime upgrade.

HA uses versioned RMI messages to avoid potential conflicts in rolling upgrades. To upgrade, simply shut down a given HAJournalServer process, redeploy the code, and restart the HAJournalServer. It will automatically resync and go live.

Mar 2 2015, 6:37 PM · Wikidata-Query-Service, MediaWiki-Core-Team

Feb 27 2015

Thompsonbry.systap added a comment to T90952: Figure out if we need/can use RDR.

That is a good point. This can (and should) be fixed.

Feb 27 2015, 3:40 PM · Discovery-ARCHIVED, Wikidata-Query-Service
Thompsonbry.systap added a comment to T90952: Figure out if we need/can use RDR.

I think that it is important for the data to have a certain "readiness to hand" to really promote reuse. I suggest that you try to model the data using a few different reification strategies and see what the queries look like and how the data can be made suitably accessible. For example, by storing the best rank version of the data as ground triples and then qualifying those triples with statements about statements for agents that are a little smarter. Agents that want to fully pierce the veil would need to be more "aware" of the data model (perhaps consulting predicates in a different namespace or consulting a different data set) and could then dynamically produce a version of "truth" that suits their needs through query time filtering.

Feb 27 2015, 12:28 AM · Discovery-ARCHIVED, Wikidata-Query-Service

Feb 26 2015

Thompsonbry.systap added a comment to T90952: Figure out if we need/can use RDR.

The performance gain from the indexing strategy is roughly 4x when performing a join that would otherwise require the use of a reified statement model join. This is because we actually eliminate the 4 joins required to match the reified statement model, replacing them with a single join matching the inlined version of the statement about the statement.

Feb 26 2015, 11:02 PM · Discovery-ARCHIVED, Wikidata-Query-Service
Thompsonbry.systap added a comment to T90952: Figure out if we need/can use RDR.

You do not need to use the RDR interchange syntax or the RDR syntax for SPARQL query to take advantage of the RDR support inside of BlazeGraph. All you need to do is serialize your data as RDF and use RDF reification to interchange statements modeling metadata about other statements. Any database that is intelligent about how it handles reification can then index your data efficiently, whether using the nesting trick that we use in BlazeGraph, a different approach based on column-wise storage that we use in MapGraph, or some entirely different physical schema. In fact, you should not care how the database is indexing the data; that is the whole point of having a physical/logical abstraction and a declarative query language. The database gets to index the data however it chooses, and you rely on it to process your queries efficiently. BlazeGraph delivers exactly this for link attributes and more generally for nested statements about statements. However, the "semantics" are precisely those of RDF. It is just that the indexing is more efficient, and that we provide nicer interchange syntax and query syntax for accessing link attributes and statements about statements.

Feb 26 2015, 10:08 PM · Discovery-ARCHIVED, Wikidata-Query-Service
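Standard RDF reification describes a statement (s, p, o) with a resource r and four triples using the `rdf:type`/`rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` vocabulary; metadata is then attached to r. Those four triples are what a query has to join on when no special indexing is used, which is the "4 joins" the RDR indexing replaces. The sketch below just generates the model; the class name and the reification node label are placeholders.

```java
import java.util.List;

/** Sketch of standard RDF reification of one statement. */
final class Reification {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    record Triple(String s, String p, String o) {}

    /** Returns the four triples that describe stmt through the node r. */
    static List<Triple> reify(String r, Triple stmt) {
        return List.of(
            new Triple(r, RDF + "type", RDF + "Statement"),
            new Triple(r, RDF + "subject", stmt.s()),
            new Triple(r, RDF + "predicate", stmt.p()),
            new Triple(r, RDF + "object", stmt.o()));
    }
}
```

A statement about the statement, such as a source or a rank, is then one more triple with r as its subject.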

Feb 25 2015

Thompsonbry.systap added a comment to T90119: BlazeGraph Finalization: RDF Issues.

There is support for inline UUIDs for blank nodes. See UUIDBNodeIV. You
could also define a fully inline URI with a well-known prefix and a UUID.
Bryan

Feb 25 2015, 2:31 PM · MediaWiki-Core-Team, Wikidata-Query-Service
Thompsonbry.systap added a comment to T90119: BlazeGraph Finalization: RDF Issues.

You can use URIs instead of blank nodes. Most of the time when people use
blank nodes they SHOULD be using URIs. Blank nodes are existential
variables. Coin URIs if you want to have a reference.

Feb 25 2015, 2:10 PM · MediaWiki-Core-Team, Wikidata-Query-Service
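Coining URIs instead of blank nodes can be as simple as a well-known prefix followed by a random UUID. Unlike a blank node, the result can be referenced again later and from other documents. The prefix below is a hypothetical example namespace, and whether a particular URI shape is stored inline depends on how the database is configured, so treat this only as the naming side of the idea.

```java
import java.util.UUID;

/** Sketch of coining stable, referenceable URIs from UUIDs. */
final class CoinedUris {
    static final String PREFIX = "http://example.org/node/"; // hypothetical namespace

    /** Returns a fresh URI: the prefix plus a random (version 4) UUID. */
    static String coin() {
        return PREFIX + UUID.randomUUID();
    }

    /** Recovers the UUID from a coined URI, or throws if the prefix does not match. */
    static UUID parse(String uri) {
        if (!uri.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a coined URI: " + uri);
        }
        return UUID.fromString(uri.substring(PREFIX.length()));
    }
}
```

The fixed prefix plus fixed-length UUID suffix is the kind of regular shape that makes a compact, inline representation possible.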
Thompsonbry.systap added a comment to T90119: BlazeGraph Finalization: RDF Issues.

The RDR inlining of reified statement models is handled by the StatementBuffer class. It is important to have a limited lexical scope in the dump for the different RDF triples involved in the reified statement model. The code needs to buffer incomplete statement models until they become complete statement models, at which point it can release the storage associated with the partial model and write it out. Also, if your output includes a lot of blank nodes, it is a Good Idea to have limited resolution scope for blank nodes since the parser must maintain them across the entire document. Thus, outputting an RDF dump as a series of files can reduce the parser overhead.

Feb 25 2015, 1:58 PM · MediaWiki-Core-Team, Wikidata-Query-Service
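The buffering described above can be sketched as a map from reification node to a partial model that is released as soon as its type, subject, predicate and object have all been seen. This is a hypothetical illustration, not Blazegraph's `StatementBuffer`. It also shows why a limited lexical scope matters: a model whose four triples are far apart in the input stays in the map until the last one arrives.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical buffer that holds partial reified statement models until complete. */
final class ReificationBuffer {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** The statement (s, p, o) described by one reification node. */
    record Statement(String s, String p, String o) {}

    private static final class Partial {
        boolean typed;
        String s, p, o;
        boolean complete() { return typed && s != null && p != null && o != null; }
    }

    private final Map<String, Partial> pending = new HashMap<>();
    private final List<Statement> completed = new ArrayList<>();

    /** Feeds one triple; returns true if it belonged to a reification model. */
    boolean add(String subj, String pred, String obj) {
        boolean isType = pred.equals(RDF + "type") && obj.equals(RDF + "Statement");
        boolean isPart = isType || pred.equals(RDF + "subject")
                || pred.equals(RDF + "predicate") || pred.equals(RDF + "object");
        if (!isPart) return false;                  // ordinary triple, handled elsewhere
        Partial part = pending.computeIfAbsent(subj, k -> new Partial());
        if (isType) part.typed = true;
        else if (pred.equals(RDF + "subject")) part.s = obj;
        else if (pred.equals(RDF + "predicate")) part.p = obj;
        else part.o = obj;
        if (part.complete()) {
            pending.remove(subj);                   // release the partial model's storage
            completed.add(new Statement(part.s, part.p, part.o));
        }
        return true;
    }

    int pendingCount() { return pending.size(); }
    List<Statement> completed() { return completed; }
}
```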
Thompsonbry.systap added a comment to T90116: BlazeGraph Finalization: Machine Sizing/Shaping.

We've assigned the property path optimization and will focus on it in our next sprint. In fact, we hope to get started on this late this week.

Feb 25 2015, 1:41 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Thompsonbry.systap added a comment to T90116: BlazeGraph Finalization: Machine Sizing/Shaping.

In terms of the machine shape, the general guidelines you give are appropriate. However, here is how it plays out in terms of GC. Large heaps => long GC pauses. So you want to keep the JVM heap fairly small (4G to 8G). Analytic queries can use the native C process heap for hash index joins and (in the future) for storing intermediate solutions. So the actual C process heap (for the JVM) can be bigger. If you are bulk loading data then you want more write cache buffers. Those are 1MB buffers, and you can have anywhere from 6 to 1000s of them. This also helps for bulk load onto disks that cannot reorder writes (SATA).

Feb 25 2015, 1:29 AM · Wikidata-Query-Service, MediaWiki-Core-Team
Thompsonbry.systap added a comment to T90116: BlazeGraph Finalization: Machine Sizing/Shaping.

Concerning thread.

Feb 25 2015, 1:17 AM · Wikidata-Query-Service, MediaWiki-Core-Team
Thompsonbry.systap added a comment to T90116: BlazeGraph Finalization: Machine Sizing/Shaping.

By label lookup I assume that you mean materializing (projecting out through a SELECT expression) the actual RDF Values for URIs or Literals that have become bound by the query. In general, join processing can proceed without dictionary materialization and that materialization step is deferred until the variable bindings are projected out. At that point they require scattered IOs against the reverse dictionary index (ID2TERM). This does incur overhead.

Feb 25 2015, 1:09 AM · Wikidata-Query-Service, MediaWiki-Core-Team
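Deferred materialization can be sketched as a join that runs entirely on numeric term ids, with only the projected bindings translated back to RDF values through a reverse dictionary (a plain map here, standing in for ID2TERM). The query shape, `?x knows ?y . ?y name ?n` projecting only `?n`, and all names are hypothetical; the lookup counter shows that only projected ids are ever resolved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: join on ids, materialize only at projection time. */
final class LateMaterialization {
    record Triple(long s, long p, long o) {}

    private final Map<Long, String> id2term; // stand-in for the reverse dictionary
    private int lookups;

    LateMaterialization(Map<Long, String> id2term) { this.id2term = id2term; }

    /** Evaluates: x knows ?y . ?y name ?n, projecting ?n. */
    List<String> namesOfFriends(List<Triple> data, long x, long knows, long name) {
        Map<Long, List<Long>> namesBySubject = new HashMap<>();
        for (Triple t : data) {
            if (t.p() == name) {
                namesBySubject.computeIfAbsent(t.s(), k -> new ArrayList<>()).add(t.o());
            }
        }
        List<Long> projectedIds = new ArrayList<>(); // join result, still just ids
        for (Triple t : data) {
            if (t.s() == x && t.p() == knows) {
                projectedIds.addAll(namesBySubject.getOrDefault(t.o(), List.of()));
            }
        }
        List<String> out = new ArrayList<>();
        for (long id : projectedIds) {                // the only dictionary lookups
            lookups++;
            out.add(id2term.get(id));
        }
        return out;
    }

    int lookups() { return lookups; }
}
```

In a real store each of those final lookups can be a scattered IO, which is the overhead the comment refers to.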
Thompsonbry.systap added a comment to T90116: BlazeGraph Finalization: Machine Sizing/Shaping.

There are two known issues with the property path operator that Nik and I discussed today. These are [1] and [2]. The first of these issues means that the operator is actually fully materializing the property paths *before* giving you any results. [1] is a ticket to change that and do incremental eviction of the results. That will fix the "liveness" issue you are seeing with that path expression. The other ticket deals with another problem where the property path operator can become CPU bound if the necessary access paths are in cache, since there is no IO Wait. However, I suspect that [2] will disappear when we address [1]. As for the timing on this, we can elevate this to a critical issue and get it done in our next sprint for the 1.5.1 release.

Feb 25 2015, 1:05 AM · Wikidata-Query-Service, MediaWiki-Core-Team
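Incremental eviction can be illustrated with a breadth-first evaluation of a path like `:start :p+ ?x` that hands each newly reached node downstream immediately, instead of collecting the whole closure first. Returning false from the consumer (for example, once a LIMIT is satisfied) stops the traversal early. This is a hypothetical sketch over an in-memory adjacency map, not Blazegraph's property path operator.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical incremental evaluation of a one-or-more (p+) path. */
final class IncrementalPath {
    /** Emits reachable nodes as they are found; returns how many nodes were expanded. */
    static int traverse(Map<Long, List<Long>> edges, long start, Predicate<Long> downstream) {
        Set<Long> visited = new HashSet<>();
        Deque<Long> frontier = new ArrayDeque<>();
        frontier.add(start);
        int expanded = 0;
        while (!frontier.isEmpty()) {
            long node = frontier.poll();
            expanded++;
            for (long next : edges.getOrDefault(node, List.of())) {
                if (visited.add(next)) {                     // first time reached: emit now
                    if (!downstream.test(next)) return expanded; // consumer asked to stop
                    frontier.add(next);
                }
            }
        }
        return expanded;
    }
}
```

The first results reach the consumer after expanding a single node, which is the "liveness" difference compared with materializing everything up front.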

Feb 24 2015

Thompsonbry.systap added a comment to T90128: BlazeGraph Finalization: Validate AST rewrite.

No. But this is very easy to add. I have created a ticket for this [1]. You can register with trac and subscribe to that ticket.

Feb 24 2015, 2:23 PM · Wikidata-Query-Service, MediaWiki-Core-Team

Feb 23 2015

Thompsonbry.systap added a comment to T90130: BlazeGraph Finalization: Geo.

I have no reason to believe that this would be a problem. The code to translate that into MGRS information could be pushed down into the server when the data are being written onto the text index so it sees a geo:wktLiteral and turns it into the appropriate MGRS coding. Also, the full text analyzers are already pluggable. Something could be done to make this pluggable as well.

Feb 23 2015, 7:59 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Thompsonbry.systap added a comment to T90117: BlazeGraph Finalization: Scale out plans.

BlazeGraph supports arbitrary nesting of statements on statements, so, yes,
that would be fine.

Feb 23 2015, 5:44 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Thompsonbry.systap added a comment to T90130: BlazeGraph Finalization: Geo.

It is very easy to write and register custom functions for things like
distance filtering. See [1]. This could be combined with the MGRS approach
and prefix scans on the text index to provide a fairly efficient spatial
distance capability.

Feb 23 2015, 5:43 PM · Wikidata-Query-Service, MediaWiki-Core-Team
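The exact distance test that such a custom function might perform is sketched below using the haversine great-circle formula on a spherical Earth with a mean radius of 6371 km. One plausible arrangement is to use the MGRS prefix scan as a coarse candidate filter and apply this check only to the candidates. Registering the function with Blazegraph is not shown, and the class name is hypothetical.

```java
/** Great-circle distance by the haversine formula (spherical Earth assumption). */
final class GeoDistance {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                 + Math.cos(phi1) * Math.cos(phi2)
                 * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        // clamp guards against rounding pushing sqrt(a) slightly above 1
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** True if the two points are within radiusKm of each other. */
    static boolean withinKm(double lat1, double lon1, double lat2, double lon2, double radiusKm) {
        return haversineKm(lat1, lon1, lat2, lon2) <= radiusKm;
    }
}
```

One degree of latitude along a meridian comes out at about 111.19 km under these assumptions.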
Thompsonbry.systap added a comment to T90114: BlazeGraph Finalization: Update performance.

The data storage on the disk is identical for the single machine and HA replication cluster (HAJournalServer) modes. In fact, you can take a compressed snapshot file from the HA replication cluster, decompress it, and open it as a standard Journal.

Feb 23 2015, 4:08 PM · Discovery-ARCHIVED, Wikidata, Wikidata-Query-Service, MediaWiki-Core-Team
Thompsonbry.systap added a comment to T90130: BlazeGraph Finalization: Geo.

We would be interested in both the elastic search and GeoSPARQL integrations.

Feb 23 2015, 3:33 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Thompsonbry.systap added a comment to T90131: BlazeGraph Finalization: Pluggable inline values.

I agree that this is related to how you choose to represent, index, and
query values with additional annotations (error bounds, uncertainty,
different values at different points in time, etc.). This is not a simple
issue. One idea that I have seen is that the preferred values could be
indexed as ground statements (much as in the original data set that Peter
loaded). Those could be searched quite efficiently. But this becomes
problematic I think if you have multiple preferred values unless you then
hit the database again to pull out the metadata about those values.

Feb 23 2015, 3:20 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Thompsonbry.systap added a comment to T88717: Investigate BlazeGraph aka BigData for WDQ.

I've received the signed CLA from corporate. Once I get Nik's SF account I will set him up as a developer.

Feb 23 2015, 3:17 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Thompsonbry.systap added a comment to T90117: BlazeGraph Finalization: Scale out plans.

This depends on how you model the reified RDF data. However, the inlined statements about statements are not in the same part of the statement indices as the ground statements. This is because the IVs all have a prefix byte that includes whether the IV is a Literal, URI, Blank Node or Statement (inlined statements about statements support). So the statement indices are partitioned on each component of the key in terms of whether that key component is a Literal, URI, Statement, etc.

Feb 23 2015, 3:14 PM · Wikidata-Query-Service, MediaWiki-Core-Team
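Why a leading type byte partitions the statement indices can be shown with a toy key encoding: if every key component starts with a flag saying what kind of term it is, unsigned lexicographic byte order (the order a B+Tree keeps) groups all keys of one kind together. The flag values and the 9-byte layout here are hypothetical and are not Blazegraph's actual IV encoding.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy keys with a leading kind byte; not the real IV encoding. */
final class TypedKeys {
    enum Kind { URI, LITERAL, BNODE, STATEMENT } // ordinal used as the flag byte

    /** One flag byte followed by a big-endian 8-byte id. */
    static byte[] key(Kind kind, long id) {
        return ByteBuffer.allocate(9).put((byte) kind.ordinal()).putLong(id).array();
    }

    static Kind kindOf(byte[] key) { return Kind.values()[key[0]]; }

    /** Sorts keys in unsigned lexicographic byte order, as an index would. */
    static List<byte[]> sorted(List<byte[]> keys) {
        List<byte[]> copy = new ArrayList<>(keys);
        copy.sort(Arrays::compareUnsigned);
        return copy;
    }
}
```

With this layout, all URI keys sort before all literal keys, which sort before all inlined statement keys, so ground statements and statements about statements land in different key ranges.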
Thompsonbry.systap added a comment to T90131: BlazeGraph Finalization: Pluggable inline values.

Inline values are not necessary. They represent a tradeoff between dictionary encoding values and their direct representation as inline values within the statement indices. There is a simple example ColorsEnumExtension in com.bigdata.rdf.internal that illustrates how to do this for enumerated values. Similar approaches can be used for other things. I would also suggest looking at the DateTimeExtension class.

Feb 23 2015, 3:05 PM · Wikidata-Query-Service, MediaWiki-Core-Team
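The trade-off can be made concrete for a small enumerated datatype. Dictionary encoding assigns an id and needs an index lookup to decode, while inline encoding stores the value itself (here, the enum ordinal) in the key, so decoding touches no index. This is a hypothetical illustration of the trade-off, not the API of `ColorsEnumExtension` or any other Blazegraph class.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical contrast between dictionary and inline encoding of an enum. */
final class InlineEnumDemo {
    enum Color { RED, GREEN, BLUE }

    // Stand-ins for forward (TERM2ID) and reverse (ID2TERM) dictionary indices.
    private final Map<String, Long> term2id = new HashMap<>();
    private final Map<Long, String> id2term = new HashMap<>();

    /** Assigns a new id on first sight; needs both maps to stay in sync. */
    long dictionaryEncode(String lexical) {
        Long id = term2id.get(lexical);
        if (id == null) {
            id = (long) term2id.size() + 1;
            term2id.put(lexical, id);
            id2term.put(id, lexical);
        }
        return id;
    }

    /** Decoding requires a lookup in the reverse dictionary. */
    String dictionaryDecode(long id) { return id2term.get(id); }

    /** Inline: the value itself is the key component; no dictionary needed. */
    static byte inlineEncode(Color c) { return (byte) c.ordinal(); }

    static Color inlineDecode(byte b) { return Color.values()[b]; }
}
```

Inlining pays off when the value space is small or has a compact fixed-width form; otherwise dictionary encoding keeps the statement indices smaller.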