I don't think we'll be able to complete a security review of BlazeGraph before we settle on it, but we should think about it. The relevant points are these:
- We'd like to expose SPARQL to users. It's very powerful. It might be *too* powerful because it supports:
- Federated queries. I imagine we'd have to turn that off, or else it'd let users make SPARQL-shaped HTTP requests to arbitrary endpoints from our servers. Maybe one day we'll actually have some federation to do, and then it could be controlled with a whitelist or something.
- ORDER BY. We _need_ ORDER BY, but sometimes it's implemented by materializing all the results in memory and then sorting them. We would have to prevent pure ORDER BY queries or use SPARQL AST rewriting (a Java plugin) to rewrite ORDER BY queries into a safer form. In SQL terms, the rewrite changes:
SELECT * FROM foo NATURAL JOIN bar NATURAL JOIN baz ORDER BY bar.name, baz.name
into
SELECT * FROM (SELECT * FROM foo NATURAL JOIN bar NATURAL JOIN baz LIMIT 10000) ORDER BY bar.name, baz.name
- In general, queries are killable and can be given timeouts (yay), but if a query is able to consume the entire JVM heap then this is no longer the case. BlazeGraph handles this with "analytic" queries, which run on native (off-heap) memory rather than on the JVM heap. This makes controlling their memory usage simpler. We really need to play with this, from both a performance and a stability perspective. It deserves both intentional attempts to break it and randomized attacks like the ones you could generate with randomized testing à la Lucene.
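
To make the first two concerns concrete, here is a rough sketch of a pre-flight screen that flags federated queries and unbounded ORDER BY queries before they reach the store. It is a text-level check, not the AST rewriting mentioned above, and everything in it is an assumption for illustration: the name `screen_query`, the regexes, and the reading of a "pure" ORDER BY query as one with no LIMIT anywhere. It is not part of BlazeGraph.

```python
import re
from typing import List

# Hypothetical helper names and patterns; nothing here is a BlazeGraph API.

# Comments, string literals and IRIs are blanked out first so that words
# inside them (e.g. <http://example.org/service>) do not trigger the checks.
_NOISE = re.compile(
    r"#[^\n]*"                      # comment to end of line
    r'|"(?:[^"\\\n]|\\.)*"'         # double-quoted literal
    r"|'(?:[^'\\\n]|\\.)*'"         # single-quoted literal
    r"|<[^<>\s]*>"                  # IRI reference
)

# SERVICE as a keyword: not part of a variable (?service), a prefixed
# name (ex:service, service:x) or a longer word.
_SERVICE = re.compile(r"(?<![\w:?$-])service(?![\w:-])", re.IGNORECASE)
_ORDER_BY = re.compile(r"\border\s+by\b", re.IGNORECASE)
_LIMIT = re.compile(r"\blimit\s+\d+", re.IGNORECASE)


def screen_query(query: str) -> List[str]:
    """Return a list of reasons to reject the query (empty if none found)."""
    stripped = _NOISE.sub(" ", query)
    problems = []
    if _SERVICE.search(stripped):
        problems.append("federated SERVICE clause")
    # Assumption: "pure" ORDER BY means ORDER BY with no LIMIT at all.
    if _ORDER_BY.search(stripped) and not _LIMIT.search(stripped):
        problems.append("ORDER BY without LIMIT")
    return problems


if __name__ == "__main__":
    print(screen_query("SELECT * WHERE { ?s ?p ?o } ORDER BY ?s"))
```

A screen like this is easy to fool or to trip by accident (multi-line literals, for one), so it would only be a stopgap. It also says nothing about where the LIMIT sits relative to the sort, which is the point of the subquery rewrite in the SQL example. A real fix would work on the parsed query.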