
Beebs.systap (Brad Bebee)
User

Projects

User does not belong to any projects.

User Details

User Since
Feb 11 2015, 3:44 AM (490 w, 4 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Beebs.systap [ Global Accounts ]

Recent Activity

May 5 2015

Beebs.systap added a comment to T90115: BlazeGraph Security Review.

So we can close this ticket, can we confirm / finish the following?

  • @Beebs.systap, is there a special mailing list we need to be on to get notified of security issues? Is someone from Ops subscribed?
May 5 2015, 4:19 AM · Discovery-ARCHIVED, Wikidata, Security-Team, Wikidata-Query-Service

Mar 24 2015

Beebs.systap added a comment to T90115: BlazeGraph Security Review.

Talked with Nik today about running this. We're planning to expose sending raw queries into our cluster.

The biggest threats are a malicious user causing data corruption or a resource-consumption DoS, or an attacker compromising the Blazegraph server and pivoting to the rest of our cluster. The data in Blazegraph is all public (assuming we work out removing deleted/suppressed items), so authorization within Blazegraph isn't a big concern.

Mitigating those threats:

  • We want to make sure we are aware of security patches to Blazegraph, and that ops applies them in an appropriate timeframe. @Beebs.systap, is there a special mailing list we need to be on to get notified? I haven't seen any CVEs issued for Blazegraph, so I want to make sure we're watching the right places.
Mar 24 2015, 3:48 AM · Discovery-ARCHIVED, Wikidata, Security-Team, Wikidata-Query-Service
Beebs.systap updated subscribers of T90115: BlazeGraph Security Review.
Mar 24 2015, 3:42 AM · Discovery-ARCHIVED, Wikidata, Security-Team, Wikidata-Query-Service

Mar 10 2015

Beebs.systap added a comment to T92308: Open questions for Blazegraph data model research.

Regarding the executable jar, you can pass the property file with -Dbigdata.propertyFile=<path>
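
As a general Java note, `-D` options are JVM arguments, so they go before `-jar` on the command line (for example `java -Dbigdata.propertyFile=<path> -jar <executable jar>`). The sketch below shows how a system property that names a properties file can be read in general. It is an illustration only, not Blazegraph's actual startup code; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Illustrative only: how a -D system property naming a properties file can be
// consumed in general. This is NOT Blazegraph's own startup code.
final class PropertyFileDemo {

    // Property name taken from the comment above.
    static final String KEY = "bigdata.propertyFile";

    // Loads the file named by the system property. Returns empty Properties
    // when the property is not set.
    static Properties loadFromSystemProperty() {
        Properties props = new Properties();
        String path = System.getProperty(KEY);
        if (path == null) {
            return props;
        }
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```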

Mar 10 2015, 8:49 PM · Wikidata, Discovery-ARCHIVED, Patch-For-Review, Wikidata-Query-Service

Feb 26 2015

Beebs.systap updated subscribers of T90952: Figure out if we need/can use RDR.
Feb 26 2015, 9:50 PM · Discovery-ARCHIVED, Wikidata-Query-Service

Feb 25 2015

Beebs.systap updated subscribers of T90116: BlazeGraph Finalization: Machine Sizing/Shaping.
Feb 25 2015, 12:24 AM · Wikidata-Query-Service, MediaWiki-Core-Team

Feb 23 2015

Beebs.systap updated subscribers of T90101: Confirm selection of BlazeGraph for wikidata query.
Feb 23 2015, 6:33 PM · Discovery-ARCHIVED, Wikidata, Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap added a comment to T90109: BlazeGraph Finalization: Zookeeper.
In T90109#1058981, @Joe wrote:

@Beebs.systap to be more explicit, it's highly probable we won't use ZK as our distributed, consistent KV store of choice internally, so maintaining a separate ZK cluster for Blazegraph HA only would be too much of a hassle, hence my desire to co-host it. I also thought that if this raises any concern, we can think of using containers to segregate the two programs and prevent one from interfering with the other.

Feb 23 2015, 3:26 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap updated subscribers of T90131: BlazeGraph Finalization: Pluggable inline values.
Feb 23 2015, 3:01 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap updated subscribers of T90121: BlazeGraph Finalization: Prefix vs Suffix.
Feb 23 2015, 3:00 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap updated subscribers of T90119: BlazeGraph Finalization: RDF Issues.
Feb 23 2015, 3:00 PM · MediaWiki-Core-Team, Wikidata-Query-Service
Beebs.systap updated subscribers of T90117: BlazeGraph Finalization: Scale out plans.
Feb 23 2015, 2:59 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap updated subscribers of T90123: BlazeGraph Finalization: Value representation.
Feb 23 2015, 2:58 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap updated subscribers of T90130: BlazeGraph Finalization: Geo.
Feb 23 2015, 2:58 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap updated subscribers of T88717: Investigate BlazeGraph aka BigData for WDQ.
Feb 23 2015, 2:57 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap updated subscribers of T88717: Investigate BlazeGraph aka BigData for WDQ.
Feb 23 2015, 2:56 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap added a comment to T90109: BlazeGraph Finalization: Zookeeper.
In T90109#1058712, @Joe wrote:

First of all, sorry I did not get back to you earlier.

I don't like the idea of having a complex tool like zookeeper running just to ensure HA. This is actually pretty bad for me, but still not a blocker per se.

We surely don't want to share a Zookeeper cluster with analytics; we have different needs and usage patterns, and we don't need to entangle our work with theirs.

We also don't want something as important as Zookeeper on VMs, IMO. At least not until our virtualization infrastructure is a bit more tested out (right now we're just starting to build it in codfw).

So I'd say that as long as we plan from the start to have 1 master and N slaves for BlazeGraph with N>1 (a wise choice anyways) we can co-host zookeeper on the same machines, if this doesn't starve system resources in some way. I'll have to investigate that but I guess that's a pretty common usage pattern.

Feb 23 2015, 1:28 PM · Wikidata-Query-Service, MediaWiki-Core-Team

Feb 18 2015

Beebs.systap added a comment to T88717: Investigate BlazeGraph aka BigData for WDQ.

@Haasepeter - Stas and I are in Berlin now and should be able to talk pretty much any time during the day there. Next week I'm pretty free as well, just send me/us an invite. If you don't have Stas' contact info I'll forward it to him as well.

@Beebs.systap - was reviewing code and saw some documentation typos. What is your process for submitting patches?

Feb 18 2015, 1:50 PM · Wikidata-Query-Service, MediaWiki-Core-Team

Feb 14 2015

Beebs.systap added a comment to T88717: Investigate BlazeGraph aka BigData for WDQ.

Is there a temporary issue with the LDAP authentication for mediawiki accounts? Bryan created an account, but the login isn't working. I checked and it wasn't working for me either (on a new session).

Feb 14 2015, 1:36 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap added a comment to T88717: Investigate BlazeGraph aka BigData for WDQ.

Can SPARQL take gzipped files? I tried giving a gzipped URL in the Bigdata workbench and it didn't work.

Feb 14 2015, 1:25 PM · Wikidata-Query-Service, MediaWiki-Core-Team

Feb 13 2015

Beebs.systap added a comment to T88717: Investigate BlazeGraph aka BigData for WDQ.

Update from SYSTAP/Metaphacts.

Feb 13 2015, 10:53 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap added a comment to T88717: Investigate BlazeGraph aka BigData for WDQ.

@Beebs.systap, is this still true (from old blog post):
For example, this can happen if your query has a large result set and uses ORDER BY or DISTINCT. Both ORDER BY and DISTINCT force total materialization of the query result set even if you use OFFSET/LIMIT.

It'd be wonderful if the optimizer had the option of walking an index to make materialization not required. Assuming that is actually more efficient. Is there a way to limit the number of results that are materialized before any actual order/limit/offset operation? In our case we'd probably want to just tell folks that their query isn't selective enough to allow order/limit/offset rather than keep working on a very slow query.
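
To make the trade-off concrete, here is a minimal Java sketch of the two ideas in this comment, under stated assumptions. It is not how Blazegraph evaluates ORDER BY; `topK`, `maxRows`, and `TooManyRowsException` are hypothetical names. An ordered LIMIT still has to read every row, but a bounded heap keeps only `k` of them in memory, and a row cap lets the caller reject a query that is not selective enough instead of letting it grind on.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class OrderedLimitDemo {

    // Thrown when a query yields more rows than the caller allows.
    static final class TooManyRowsException extends RuntimeException {
        TooManyRowsException(long cap) {
            super("query is not selective enough: more than " + cap + " rows");
        }
    }

    // Returns the first k rows in the given order. Every row is still read
    // (ordering has to see them all), but at most k + 1 rows are held at once.
    // Reading stops with an exception once more than maxRows have been seen.
    static <T> List<T> topK(Iterator<T> rows, int k, long maxRows, Comparator<T> order) {
        // Heap on the reversed order: its head is the worst row kept so far.
        PriorityQueue<T> heap = new PriorityQueue<>(Math.max(1, k), order.reversed());
        long seen = 0;
        while (rows.hasNext()) {
            T row = rows.next();
            if (++seen > maxRows) {
                throw new TooManyRowsException(maxRows);
            }
            heap.add(row);
            if (heap.size() > k) {
                heap.poll(); // drop the current worst row
            }
        }
        List<T> result = new ArrayList<>(heap);
        result.sort(order);
        return result;
    }
}
```

This only covers ORDER BY with a LIMIT and no large OFFSET; DISTINCT would need a different bounded structure.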

Feb 13 2015, 10:46 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap added a comment to T88717: Investigate BlazeGraph aka BigData for WDQ.

Looks like we're going to have trouble with some dates too. xsd:dateTime supports 13798 million years BCE but I think BigData will have trouble with it what with this comment from DateTimeExtension:

/**
 * This implementation of {@link IExtension} implements inlining for literals
 * that represent xsd:dateTime literals.  These literals will be stored as time 
 * in milliseconds since the epoch.  The milliseconds are encoded as an inline 
 * long.
 */

Not that I've had a chance to test it yet. It'd be one of the first things imported during a full import of the statements. I see in the RDF dump on labs it's actually an xsd:gYear type though.
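
A rough back-of-the-envelope check of the concern, as a Java sketch. It assumes an average Gregorian year of 365.2425 days and ignores calendar details, so it is only good for orders of magnitude; the class and method names are hypothetical. A signed 64-bit count of milliseconds reaches roughly 292 million years either side of the epoch, far short of 13798 million years.

```java
import java.math.BigInteger;

final class EpochMillisRange {

    // Milliseconds in an average Gregorian year (365.2425 days).
    static final BigInteger MS_PER_YEAR = BigInteger.valueOf(31_556_952_000L);

    // Approximate offset from the epoch in milliseconds for a signed number
    // of years (negative means before 1970).
    static BigInteger approxMillis(long yearsFromEpoch) {
        return BigInteger.valueOf(yearsFromEpoch).multiply(MS_PER_YEAR);
    }

    // True when the value can be stored in a signed 64-bit long.
    static boolean fitsInLong(BigInteger millis) {
        return millis.bitLength() <= 63;
    }

    public static void main(String[] args) {
        // 13798 million years before the epoch does not fit in a long.
        System.out.println(fitsInLong(approxMillis(-13_798_000_000L)));
        // A date a few thousand years back fits comfortably.
        System.out.println(fitsInLong(approxMillis(-4_000L)));
    }
}
```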

Feb 13 2015, 10:26 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap added a comment to T88717: Investigate BlazeGraph aka BigData for WDQ.

@Beebs.systap, I can't log in to your wiki with Google. It says "OpenID auth request contains an unregistered domain: http://wiki.blazegraph.com/wiki". I imagine that has something to do with the new domain name.

Feb 13 2015, 10:25 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap added a comment to T88717: Investigate BlazeGraph aka BigData for WDQ.

Is there a standard way to load a huge amount of RDF data into Bigdata? I tried the following (with a 3GB gzipped .nt file), but it very quickly blew the heap:

Repository repo = BigdataSailFactory.connect("localhost", 9999);
RepositoryConnection con = repo.getConnection();
File file = new File("/home/james/dumps/wikidata-statements.nt.gz");
FileInputStream fileInputStream = new FileInputStream(file);
GZIPInputStream gzipInputStream = new GZIPInputStream(fileInputStream);
con.add(gzipInputStream, null, RDFFormat.N3);

EDIT: I cranked up the heap, and ran into the max array length limitation:

Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
	at java.util.Arrays.copyOf(Arrays.java:2271)
	at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
	at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
	at info.aduna.io.IOUtil.transfer(IOUtil.java:494)
	at info.aduna.io.IOUtil.readBytes(IOUtil.java:210)
	at com.bigdata.rdf.sail.webapp.client.RemoteRepository$AddOp.prepareForWire(RemoteRepository.java:1492)
	at com.bigdata.rdf.sail.webapp.client.RemoteRepository$AddOp.access$000(RemoteRepository.java:1436)
	at com.bigdata.rdf.sail.webapp.client.RemoteRepository.add(RemoteRepository.java:890)
	at com.bigdata.rdf.sail.remote.BigdataSailRemoteRepositoryConnection.add(BigdataSailRemoteRepositoryConnection.java:663)
	at com.bigdata.rdf.sail.remote.BigdataSailRemoteRepositoryConnection.add(BigdataSailRemoteRepositoryConnection.java:648)
	at example.bigdata_client.App.update(App.java:33)
	at example.bigdata_client.App.main(App.java:23)

These errors are all client-side -- the servlet appears to be humming along. Is there a preferred streaming API to use?
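
One possible client-side workaround, sketched under assumptions: N-Triples is line-oriented (one triple per line), so the decompressed stream can be cut at line boundaries and sent in bounded chunks rather than buffered whole. `ChunkedNTriples` and its `sink` callback are hypothetical helpers, not part of the Bigdata client API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

final class ChunkedNTriples {

    // Reads line-oriented N-Triples from `in` and hands the text to `sink` in
    // chunks of at most `linesPerChunk` lines, so only one chunk is buffered
    // at a time. Returns the number of chunks emitted.
    static int forEachChunk(InputStream in, int linesPerChunk, Consumer<String> sink)
            throws IOException {
        if (linesPerChunk < 1) {
            throw new IllegalArgumentException("linesPerChunk must be >= 1");
        }
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        StringBuilder chunk = new StringBuilder();
        int lines = 0;
        int chunks = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            chunk.append(line).append('\n');
            if (++lines == linesPerChunk) {
                sink.accept(chunk.toString());
                chunk.setLength(0);
                lines = 0;
                chunks++;
            }
        }
        if (lines > 0) { // flush the final partial chunk
            sink.accept(chunk.toString());
            chunks++;
        }
        return chunks;
    }
}
```

In the snippet above, `in` would be the `GZIPInputStream`, and the sink would wrap each chunk in a `ByteArrayInputStream` and pass it to `con.add(...)`. Whether that is the preferred route, or whether a server-side bulk loader is better, is exactly the open question here.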

Feb 13 2015, 10:07 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap added a comment to T88717: Investigate BlazeGraph aka BigData for WDQ.

It's also worth checking out the RDF GAS API that Mike references. You can execute graph analytics within the SPARQL queries. It's bundled with BFS, SSSP, Connected Components, and PageRank, but they can also be extended.

Feb 13 2015, 8:14 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap added a comment to T88717: Investigate BlazeGraph aka BigData for WDQ.

Is there a way to do traversals and aggregation in SPARQL? E.g., these Cypher examples:

This is list of all professions. Note the '*' there:

MATCH (v:item {wikibaseId: 'Q28640'})<-[:claim|P279|P31*]-(v2:item) RETURN v2.wikibaseId, v2.labelEn;

This is list of countries by latest population data:

MATCH (v:item)-[:claim]->(c:claim:P31 {value: "Q6256"}) 
	MATCH (v)-[:claim]->(c2:claim:P1082) WHERE has(c2.value) 
	WITH v as v, max(c2.P585q) as latest
	MATCH (v)-[:claim]->(cv:claim:P1082)
	WHERE cv.P585q = latest
	RETURN v.wikibaseId, v.labelEn, cv.value, cv.P585q
	ORDER BY cv.value DESC

I wonder what these would look like in SPARQL.
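
For reference, SPARQL 1.1 property paths provide `|` (alternative), `+` (one or more hops), and `*` (zero or more hops), and SPARQL 1.1 also has aggregates such as MAX with GROUP BY. As an illustration of what the variable-length pattern in the first Cypher query computes, here is a small Java sketch of a one-or-more-hops traversal over incoming edges. The in-memory map and the item ids in the usage are toy assumptions, not the real data model.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TransitiveClosureDemo {

    // incoming.get(x) lists the items that have an edge (for example P31 or
    // P279) pointing at x. Returns every item that reaches `root` in one or
    // more hops, in breadth-first discovery order.
    static Set<String> reachableFrom(Map<String, List<String>> incoming, String root) {
        Set<String> found = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String next : incoming.getOrDefault(current, Collections.emptyList())) {
                if (found.add(next)) { // each item is expanded at most once
                    queue.add(next);
                }
            }
        }
        return found;
    }
}
```

The second query (latest population per country) is a group-wise maximum followed by a filter on that maximum, which is where the aggregates would come in.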

Feb 13 2015, 7:56 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap added a comment to T88717: Investigate BlazeGraph aka BigData for WDQ.

@Beebs.systap this looks pretty good. How is it done, i.e. what is used to create the triples, how are they imported, etc.? Is this code available?

Also, I assume we'd eventually want to support qualifiers/references, i.e. queries like "countries list by population, from largest to smallest", taking into account that the US has a number of population figures and we'd have to take the latest. Or for "female mayors", we may have to account for the fact that some mayorships could be in the past: Berlin (https://www.wikidata.org/wiki/Q64) has had a lot of mayors, but only one (Michael Müller) is the current mayor, so we'd have to be able to support that. Fortunately, this particular mayor is also marked with the "preferred" flag, which we may need to support too, but not all data has preferred flags, so we may need to rely on time qualifiers. The next step would be the same query at any point in time (i.e. "female mayors in the 20th century").

For references, the query may be "give me all data about Douglas Adams (Q42) that come from Encyclopædia Britannica Online (Q5375741)".

Feb 13 2015, 3:20 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap added a comment to T88717: Investigate BlazeGraph aka BigData for WDQ.

Actually, that GitHub repository was the first we'd heard of it! That was for another project.

Feb 13 2015, 3:19 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap added a comment to T88717: Investigate BlazeGraph aka BigData for WDQ.

Hey! Does BigData support GeoSpatial queries? I see https://github.com/varunshaji/bigdata-geosparql but I'm not sure how well supported it is.

Feb 13 2015, 3:08 PM · Wikidata-Query-Service, MediaWiki-Core-Team

Feb 12 2015

Beebs.systap added a comment to T88717: Investigate BlazeGraph aka BigData for WDQ.

Per Nik's email, here is the information on the scale-out architecture.

Feb 12 2015, 1:04 PM · Wikidata-Query-Service, MediaWiki-Core-Team
Beebs.systap added a comment to T88717: Investigate BlazeGraph aka BigData for WDQ.

Here is the wikidata RDF demo that Peter Haase shared.

Feb 12 2015, 1:03 PM · Wikidata-Query-Service, MediaWiki-Core-Team

Feb 11 2015

Beebs.systap added a comment to T88717: Investigate BlazeGraph aka BigData for WDQ.

Thanks to Nik for chatting today. Here are a few of the key items we discussed. The slides are also attached.

Feb 11 2015, 3:55 AM · Wikidata-Query-Service, MediaWiki-Core-Team