
Painters SPARQL query crashes the Wikidata Query Service Beta
Closed, Resolved, Public

Description

In WDQ I have this query: http://tools.wmflabs.org/autolist/autolist1.html?start=300&props=170&q=CLAIM%5B31%3A3305213%5D%20AND%20CLAIM%5B170%5D%20AND%20NOCLAIM%5B170%3A%28CLAIM%5B106%3A1028181%5D%20%29%5D%20AND%20NOCLAIM%5B170%3A4233718%5D%20AND%20NOCLAIM%5B170%3A4294967294%5D

I converted that to SPARQL using http://tools.wmflabs.org/wdq2sparql/w2s.php . This is what I got:

prefix wdt: <http://www.wikidata.org/prop/direct/>
prefix entity: <http://www.wikidata.org/entity/>
SELECT ?item WHERE {
  ?item wdt:P31 entity:Q3305213 .
  ?item wdt:P170 ?dummy0 .
  FILTER NOT EXISTS { ?item wdt:P170 ?sub1 }
{
    ?sub1 wdt:P650 ?dummy2 .
} UNION {
    ?sub1 wdt:P245 ?dummy3 .
}
  FILTER NOT EXISTS { ?item wdt:P170 entity:Q4233718 }
  FILTER NOT EXISTS { ?item wdt:P170 entity:Q4294967294 }
} LIMIT 10

If I run this, I get an error: "504 Gateway Time-out nginx/1.4.6 (Ubuntu)".

Event Timeline

Multichill raised the priority of this task to Needs Triage.
Multichill updated the task description.
Multichill added subscribers: Multichill, JanZerebecki.

The first thing I noticed is that Q4294967294 is not an item. I guess that's not where this error comes from, though.

Looks like this query gets an OOM error in Blazegraph. We don't yet have memory protections in place there (coming soon), but it looks like this query requires more resources than the beta machine has. entity:Q4294967294 is definitely not right; I imagine it's some kind of WDQ hack, so we may need to add handling for it to the translator.

no value is represented as item 4294967295
unknown value is represented as item 4294967294

Original WDQ query:

CLAIM[31:3305213] AND CLAIM[170] AND NOCLAIM[170:(CLAIM[106:1028181])] AND NOCLAIM[170:4233718] AND NOCLAIM[170:4294967294]

The resulting query looks weird, as it mentions P650 and doesn't mention P106 at all. This is what I get:

prefix wdt: <http://www.wikidata.org/prop/direct/>
prefix entity: <http://www.wikidata.org/entity/>
SELECT ?item WHERE {
  ?item wdt:P31 entity:Q3305213 .
  ?item wdt:P170 ?dummy0 .
  FILTER NOT EXISTS { ?item wdt:P170 ?sub1 }
  ?sub1 wdt:P106 entity:Q1028181 .
  FILTER NOT EXISTS { ?item wdt:P170 entity:Q4233718 }
  FILTER NOT EXISTS { ?item wdt:P170 entity:Q4294967294 }
}

I also think the original query says: produce paintings which have a creator, and also don't have any creators who are painters, and also don't have creators who are anonymous. I'm not sure whether that was the original intent, or maybe I misunderstand, but I'd say a query that says "produce paintings which have creators who are not anonymous and are not listed as painters" makes more sense. Such a query may look like this:

prefix ps: <http://www.wikidata.org/prop/statement/>
prefix p: <http://www.wikidata.org/prop/>
prefix wdt: <http://www.wikidata.org/prop/direct/>
prefix entity: <http://www.wikidata.org/entity/>
SELECT ?item WHERE {
  ?item wdt:P31 entity:Q3305213 .
  ?item wdt:P170 ?creator .
  FILTER NOT EXISTS {
    ?creator p:P106/ps:P106 entity:Q1028181 .
  }
  FILTER ( ?creator != entity:Q4233718 )
  FILTER ( !isBlank(?creator) )
} LIMIT 10

and seems to run fast. The difference is that the former form builds a Cartesian product of two sets of creators (since nothing states that we are talking about the same creator in each case), while the latter does not.
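
To make the size difference concrete, here is a minimal Python sketch on toy data. It is not how Blazegraph evaluates anything; it only assumes that in the slow form ?sub1 ranges over every painter independently of ?item, while in the fast form the painter check is tied to each item's own creator. The names creators, painters, uncorrelated_rows and correlated_rows are made up for illustration, and the filters for anonymous and unknown creators are left out.

```python
# Toy data (hypothetical): painting -> its creators (wdt:P170),
# plus the set of everyone whose occupation is painter (wdt:P106 Q1028181).
creators = {
    "painting_a": {"alice"},
    "painting_b": {"bob"},
    "painting_c": {"alice", "carol"},
}
painters = {"alice", "dave", "erin"}


def uncorrelated_rows(creators, painters):
    """Slow shape: every (item, creator) pair is combined with every painter,
    and a row survives whenever that painter is not a creator of the item."""
    return [
        (item, creator, painter)
        for item, item_creators in sorted(creators.items())
        for creator in sorted(item_creators)
        for painter in sorted(painters)
        if painter not in item_creators
    ]


def correlated_rows(creators, painters):
    """Fast shape: one check per (item, creator) pair, on that same creator."""
    return [
        (item, creator)
        for item, item_creators in sorted(creators.items())
        for creator in sorted(item_creators)
        if creator not in painters
    ]


slow = uncorrelated_rows(creators, painters)
fast = correlated_rows(creators, painters)
print(len(slow), "rows in the uncorrelated form")
print(len(fast), "rows in the correlated form:", fast)
```

In the uncorrelated form the row count grows with the number of creator statements multiplied by the number of painters, which is the kind of blow-up that could plausibly exhaust memory on a small machine.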

I understand that this is not a direct translation from WDQ, and I'll also add handling of Q4294967294/5 to wdq2sparql, but it looks like auto-translation may not be the optimal route for some queries.

Note also that in this particular case wdt: may not work as well, since for Michelangelo, for example, only one occupation (artist) is listed as preferred. Using wdt: is faster, but we need to understand whether this is really what we want (i.e. what "preferred" means for Michelangelo, whether it needs to be there, and if so how we write the queries). I know it's not as easy as I'd like, so thoughts are welcome on making it easier.
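
The effect of ranks can be sketched in a few lines of Python. This assumes, as I understand the Wikidata RDF mapping, that wdt: exposes only the best-ranked statements (preferred if any exist, otherwise normal, never deprecated), while p:/ps: reaches every statement. The occupation list below is invented toy data that only echoes the Michelangelo example, not his actual statements, and truthy_values and all_values are hypothetical helper names.

```python
# Toy statements (hypothetical): (occupation, rank) pairs for one creator.
occupations = [
    ("artist", "preferred"),
    ("painter", "normal"),
    ("sculptor", "normal"),
]


def truthy_values(statements):
    """Values a wdt: lookup would see: preferred-rank statements if there are
    any, otherwise normal-rank ones. Deprecated statements are never returned."""
    preferred = [value for value, rank in statements if rank == "preferred"]
    if preferred:
        return preferred
    return [value for value, rank in statements if rank == "normal"]


def all_values(statements):
    """Values a p:/ps: lookup would see: every statement, whatever its rank."""
    return [value for value, _rank in statements]


print("wdt:    ", truthy_values(occupations))
print("p:/ps:  ", all_values(occupations))
```

With this data a wdt:P106 check for "painter" finds nothing, so the creator would wrongly count as a non-painter, whereas the p:/ps: path still finds the statement.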

With fixes to W2S, the query looks like:

prefix wdt: <http://www.wikidata.org/prop/direct/>
prefix entity: <http://www.wikidata.org/entity/>
SELECT ?item WHERE {
  ?item wdt:P31 entity:Q3305213 .
  ?item wdt:P170 ?dummy0 .
  FILTER NOT EXISTS {
    ?item wdt:P170 ?sub1 .
    ?sub1 wdt:P106 entity:Q1028181 .
  }
  FILTER NOT EXISTS { ?item wdt:P170 entity:Q4233718 }
  FILTER NOT EXISTS {
    ?item wdt:P170 ?unk2 .
    FILTER (isBlank(?unk2))
  }
}

and runs in 2 s. I think it's resolved for now; if you disagree, please reopen or create a new issue. I'll keep an eye out for OOM situations and their prevention.
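
For reference, the special-value handling could be sketched roughly as below. This is not the actual wdq2sparql code; noclaim_to_sparql and its parameters are hypothetical, and it only reproduces the two NOCLAIM shapes visible in the fixed query above. The thread does not show how "no value" (4294967295) is translated, so the sketch refuses that case rather than guessing.

```python
# WDQ pseudo-item ids, as described earlier in this thread.
WDQ_UNKNOWN_VALUE = 4294967294
WDQ_NO_VALUE = 4294967295


def noclaim_to_sparql(prop: int, target: int, var_index: int = 0) -> str:
    """Translate a WDQ NOCLAIM[prop:target] into a SPARQL filter fragment
    (hypothetical helper, illustration only)."""
    if target == WDQ_UNKNOWN_VALUE:
        # Unknown values show up as blank nodes, so test with isBlank
        # instead of comparing against a non-existent entity.
        var = f"?unk{var_index}"
        return (
            f"FILTER NOT EXISTS {{ ?item wdt:P{prop} {var} . "
            f"FILTER (isBlank({var})) }}"
        )
    if target == WDQ_NO_VALUE:
        # Assumption: left unhandled because the thread gives no translation.
        raise NotImplementedError("no-value translation is not covered here")
    return f"FILTER NOT EXISTS {{ ?item wdt:P{prop} entity:Q{target} }}"


print(noclaim_to_sparql(170, 4233718))
print(noclaim_to_sparql(170, WDQ_UNKNOWN_VALUE, var_index=2))
```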