Bug: Duplicate Actor Results in SPARQL Query Due to Multiple Images belonging to one actor
Closed, Resolved, Public

Description

BE

Domain: Code

Difficulty: Intermediate

Summary: The findCoActors endpoint returns duplicate entries for actors who have multiple images associated with their Wikidata profile (Property P18). This breaks server-side pagination because the LIMIT clause counts rows (images) rather than unique actors.
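
The gap between rows and unique actors can be measured directly. The following is a minimal sketch, assuming it is run against the Wikidata Query Service (where the wd: and wdt: prefixes are predefined) and reusing the wd:Q2263 root from the test queries in this task. The variable names ?rows and ?uniqueActors are illustrative only.

```sparql
# Compare the number of result rows with the number of distinct actors.
# If ?rows is larger than ?uniqueActors, a LIMIT on the row-based query
# will return fewer unique actors than the requested page size.
SELECT (COUNT(*) AS ?rows) (COUNT(DISTINCT ?actor) AS ?uniqueActors)
WHERE {
  VALUES ?root { wd:Q2263 }

  ?movie wdt:P161 ?root .
  ?movie wdt:P161 ?actor .
  FILTER(?actor != ?root)

  ?actor wdt:P18 ?image .
}
```

Note that ?rows can also be inflated when an actor shares several films with ?root, since ?movie is part of every solution.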

Root Cause: The original SPARQL query included ?image in the GROUP BY clause:

SELECT ?actor ... ?image ...
...
GROUP BY ?actor ... ?image  # <-- PROBLEM

Because the image URL is part of the grouping key, every distinct (actor, image) pair becomes its own group, so an actor with N images produces N result rows instead of one.
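
To see which actors are affected, one option is to count P18 values per actor. This is a sketch under the same assumptions as the test queries in this task (Wikidata Query Service prefixes, root wd:Q2263). The alias ?imageCount is hypothetical.

```sparql
# List co-actors that have more than one image (P18) on Wikidata.
# DISTINCT is used because the same actor/image pair can match
# through several shared films.
SELECT ?actor (COUNT(DISTINCT ?image) AS ?imageCount)
WHERE {
  VALUES ?root { wd:Q2263 }

  ?movie wdt:P161 ?root .
  ?movie wdt:P161 ?actor .
  FILTER(?actor != ?root)

  ?actor wdt:P18 ?image .
}
GROUP BY ?actor
HAVING (COUNT(DISTINCT ?image) > 1)
ORDER BY DESC(?imageCount)
```

Every actor returned by this query appears more than once when ?image is part of the grouping key.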

Proposed Solution

Remove ?image from the GROUP BY clause. The query engine then groups results by actor alone, so each actor produces exactly one row.

Use SAMPLE(?image) in the SELECT clause. SAMPLE is an aggregate that returns one arbitrarily chosen image from the set available for that actor, so the selected image is not guaranteed to be the same on every run.

Tested bug link: here

bug

SELECT ?actorLabel ?image 
WHERE {
  VALUES ?root { wd:Q2263 }
  
  ?movie wdt:P161 ?root .
  ?movie wdt:P161 ?actor .
  FILTER(?actor != ?root)
  
  ?actor wdt:P18 ?image .
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?actorLabel
LIMIT 100

solution

SELECT ?actorLabel (SAMPLE(?image) AS ?image)
WHERE {
  VALUES ?root { wd:Q2263 }

  ?movie wdt:P161 ?root .
  ?movie wdt:P161 ?actor .
  FILTER(?actor != ?root)
  
  # OPTIONAL is used here because not all actors have an image
  OPTIONAL { ?actor wdt:P18 ?image . }
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?actor ?actorLabel
ORDER BY ?actorLabel
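
FIX: Add a secondary sort key (?actorYLabel) to make pagination stable. Sorting by ?actorLabel alone leaves actors with identical labels in an unspecified order, so they can shift between pages.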
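
A stable, paginated version of the solution query might look like the sketch below. Several parts are assumptions rather than the endpoint's actual code: the tie-breaker is the actor IRI (?actor), not the ?actorYLabel variable named above. The page size of 100 and the OFFSET value are example numbers, and the alias ?sampleImage is a hypothetical name, used because some engines reject reusing an in-scope variable as an alias.

```sparql
# Page 2 of the co-actor list, 100 actors per page (example values).
SELECT ?actor ?actorLabel (SAMPLE(?image) AS ?sampleImage)
WHERE {
  VALUES ?root { wd:Q2263 }

  ?movie wdt:P161 ?root .
  ?movie wdt:P161 ?actor .
  FILTER(?actor != ?root)

  # OPTIONAL keeps actors that have no image at all
  OPTIONAL { ?actor wdt:P18 ?image . }

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?actor ?actorLabel
# Primary sort by label; the actor IRI breaks ties between equal labels,
# so every row has a unique position in the ordering.
ORDER BY ?actorLabel ?actor
LIMIT 100
OFFSET 100
```

Because each group has a unique (?actorLabel, ?actor) pair, the ordering is total and successive OFFSET values should not repeat or skip actors as long as the underlying data does not change between requests.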

Related Objects

Mentioned Here
P18 my paste!

Event Timeline

Essa237 renamed this task from Bug: Duplicate Actor Results in SPARQL Query Due to Multiple Images belonging to one actor (P18) to Bug: Duplicate Actor Results in SPARQL Query Due to Multiple Images belonging to one actor. Dec 13 2025, 9:41 PM
Essa237 updated the task description.
Essa237 added a subscriber: Collins.