
Using DISTINCT with VALUES returns more results than expected
Open, Medium | Public | BUG REPORT

Description

The context

Before describing the issue, here is some context.

Query 1: This query returns the highest point (P610) of Earth (Q2) and Mars (Q111). It returned 4 results in 110 ms, and all returned values were unique. This is the expected behavior.

SELECT ?value {
  VALUES ?item {wd:Q2 wd:Q111}.
  ?item wdt:P610 ?value.
}

Query 2: This query returns the distinct values of the highest point of Earth. It returned 3 results in 123 ms. This is the expected behavior.

SELECT DISTINCT ?value {
  VALUES ?item {wd:Q2}.
  ?item wdt:P610 ?value.
}

Query 3: This query returns the distinct values of the highest point of Earth and Mars. It returned 11158 results in 197 ms. This is NOT the expected behavior.

SELECT DISTINCT ?value {
  VALUES ?item {wd:Q2 wd:Q111}.
  ?item wdt:P610 ?value.
}

Query 4: This query does the same as Query 3, but wrapped in a named subquery. It returned 4 results in 133 ms. This is the expected behavior.

SELECT *
WITH {
  SELECT DISTINCT ?value {
    VALUES ?item {wd:Q2 wd:Q111}.
    ?item wdt:P610 ?value.
  }
} AS %0
{
  INCLUDE %0.
}

I think Query 3 has a problem. I've described my reasoning below.

Steps to replicate the issue

  1. Execute the following query (previously called Query 3) in WDQS.
SELECT DISTINCT ?value {
  VALUES ?item {wd:Q2 wd:Q111}.
  ?item wdt:P610 ?value.
}

The query returned 11158 results (as of this writing), even though there are only 4 distinct values of highest point (P610) for Earth (Q2) and Mars (Q111), as shown by Query 1 and Query 4.

What should have happened instead?:

The query should have returned 4 results, since there are only 4 distinct values for Earth (Q2) and Mars (Q111). This can be confirmed by executing Query 1 or Query 4.
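
Since DISTINCT can only remove duplicate rows, the number of distinct values can never exceed the plain row count. As a minimal diagnostic sketch, a single query could report both counts side by side. The variable names ?rows and ?distinctValues are illustrative and not part of the original report, and no result for this query has been recorded here. If the engine evaluates it consistently with Query 1, both counts would be expected to be 4.

```sparql
# Diagnostic sketch (assumption: not run as part of the original report).
# ?rows counts every solution; ?distinctValues counts unique ?value bindings.
SELECT (COUNT(?value) AS ?rows) (COUNT(DISTINCT ?value) AS ?distinctValues) {
  VALUES ?item {wd:Q2 wd:Q111}.
  ?item wdt:P610 ?value.
}
```

A ?distinctValues figure larger than ?rows would point to how DISTINCT is evaluated rather than to the underlying data.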

Additional information

I think the bug occurs when DISTINCT is combined with VALUES.

The following query gets the values of parent taxon (P171) for hippopotamus (Q34505) and tiger (Q19939). It returned 2 results in 296 ms.

SELECT ?value {
  VALUES ?item {wd:Q34505 wd:Q19939}
  ?item wdt:P171 ?value.
}

The same query with DISTINCT (see below) timed out.

SELECT DISTINCT ?value {
  VALUES ?item {wd:Q34505 wd:Q19939}
  ?item wdt:P171 ?value.
}

Software version

WDQS as of the time of this writing.

Other information

Browser: Mozilla Firefox 106.0.1

Event Timeline

Oddly enough, the query below uses DISTINCT and VALUES but correctly returns 4 results. So there might also be a connection to wdt:.

SELECT DISTINCT ?value {
  VALUES ?item {wd:Q2 wd:Q111}.
  ?item p:P610 ?v.
  ?v ps:P610 ?value .
}
RKemper renamed this task from "Using DISTINCT with VALUES returns more results that expected" to "Using DISTINCT with VALUES returns more results than expected". Oct 31 2022, 4:47 PM
MPhamWMF moved this task from Incoming to Epics on the Wikidata-Query-Service board.
MPhamWMF moved this task from Epics to Blazegraph on the Wikidata-Query-Service board.