Create a tool that records and compares a set of sparql query results
Closed, Resolved, Public

Description

To evaluate the impact of splitting the Wikidata graph, we want to compare the results of a set of queries run against different endpoints.

For this we need a tool in the same vein as RelevanceForge that can:

  • record, for a given set of queries, the output of each query when executed against a particular SPARQL endpoint
  • compute various metrics by analyzing the differences between the outputs of the same set of queries run against two different endpoints
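
A minimal sketch of the recording step, assuming the endpoint speaks the standard SPARQL 1.1 Protocol over HTTP GET and returns SPARQL JSON results. The names `http_fetch` and `record_query` and the record layout are hypothetical and are not taken from the actual tool. The fetch function is injectable so the recording logic can be exercised without a network, and only SELECT-style results (`results.bindings`) are handled here.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def http_fetch(endpoint, query_text, timeout=60):
    """Run one query via HTTP GET; return (status_code, body_text)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query_text})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # Non-2xx answers are recorded too, so failures can be compared later.
        return error.code, error.read().decode("utf-8", errors="replace")


def record_query(endpoint, row, fetch=http_fetch):
    """Record the outcome of one input row against one endpoint."""
    status_code, body = fetch(endpoint, row["query_text"])
    bindings = None
    if status_code == 200:
        # SPARQL JSON results keep SELECT rows under results.bindings.
        bindings = json.loads(body).get("results", {}).get("bindings", [])
    return {
        "query_provenance": row["query_provenance"],
        "query_id": row["query_id"],
        "status_code": status_code,
        "bindings": bindings,
    }
```

Running `record_query` once per endpoint for every input row yields two recordings per query that can then be compared.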

Analyzing the differences might require extracting a few metrics, for example:

  • same results
  • same results but different ordering
  • % of identical lines
  • ...
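
The metrics above could be computed along these lines. This is a sketch under stated assumptions, not the tool's actual implementation: each result row is assumed to be a flat mapping from variable name to a string value, "same results but different ordering" is interpreted as multiset equality of rows, and the percentage of identical lines is defined here as shared rows divided by the size of the larger result set. The names `row_key` and `compare_results` are hypothetical.

```python
from collections import Counter


def row_key(binding):
    """Turn one result row (variable -> value mapping) into a hashable key."""
    return tuple(sorted(binding.items()))


def compare_results(left, right):
    """Compare two lists of result rows and return the three metrics."""
    left_keys = [row_key(b) for b in left]
    right_keys = [row_key(b) for b in right]
    # Strict equality: same rows in the same order.
    same = left_keys == right_keys
    left_counts, right_counts = Counter(left_keys), Counter(right_keys)
    # Order-insensitive equality: same rows with the same multiplicities.
    same_unordered = left_counts == right_counts
    # Multiset intersection: rows present on both sides, counting duplicates.
    shared = sum((left_counts & right_counts).values())
    largest = max(len(left_keys), len(right_keys))
    pct = 100.0 if largest == 0 else 100.0 * shared / largest
    return {
        "same": same,
        "same_unordered": same_unordered,
        "pct_identical_lines": pct,
    }
```

With this definition, `same` implies `same_unordered`, so a reordered result shows up as `same` false and `same_unordered` true.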

The input for the tool is a dataset with the following columns: query_provenance, query_id, query_text
The output is a dataset with the following columns: query_provenance, query_id, status_code_left, status_code_right, same, same_unordered, pct_identical_lines
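
To illustrate the input and output schemas, here is a hedged sketch of a CSV-to-CSV driver. Only the column names come from this task; `diff_dataset` and its callable parameters (`run_left`, `run_right`, `compare`) are hypothetical placeholders for the query execution and comparison steps.

```python
import csv
import io

INPUT_COLUMNS = ["query_provenance", "query_id", "query_text"]
OUTPUT_COLUMNS = [
    "query_provenance", "query_id", "status_code_left", "status_code_right",
    "same", "same_unordered", "pct_identical_lines",
]


def diff_dataset(input_csv, run_left, run_right, compare):
    """Read the input CSV text and return the output CSV text.

    run_left / run_right: callables taking query_text and returning
    (status_code, rows) for their endpoint.
    compare: callable taking (left_rows, right_rows) and returning a dict
    with the keys same, same_unordered and pct_identical_lines.
    """
    reader = csv.DictReader(io.StringIO(input_csv))
    missing = set(INPUT_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing input columns: {sorted(missing)}")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=OUTPUT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        status_left, rows_left = run_left(row["query_text"])
        status_right, rows_right = run_right(row["query_text"])
        metrics = compare(rows_left, rows_right)
        writer.writerow({
            "query_provenance": row["query_provenance"],
            "query_id": row["query_id"],
            "status_code_left": status_left,
            "status_code_right": status_right,
            **metrics,
        })
    return out.getvalue()
```

Keeping the endpoint calls and the comparison behind callables means the same driver shape could sit on top of a Spark DataFrame instead of a CSV file.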

AC:

  • A diff tool is available and can be run on top of a Spark DataFrame or a CSV file
  • It can produce a Spark DataFrame or a CSV file

Event Timeline

dcausse renamed this task from "Create a tool that records and compare a set of sparql results" to "Create a tool that records and compares a set of sparql query results". Nov 22 2023, 2:00 PM

Change 981550 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/rdf@master] Add a tool to execute SPARQL queries and record results

https://gerrit.wikimedia.org/r/981550

Change 981550 merged by jenkins-bot:

[wikidata/query/rdf@master] Add a tool to execute SPARQL queries and record results

https://gerrit.wikimedia.org/r/981550

Change 990717 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/rdf@master] QueryResultRecorder: add support for blazegraph default collation key

https://gerrit.wikimedia.org/r/990717

Change 990717 merged by jenkins-bot:

[wikidata/query/rdf@master] QueryResultRecorder: add support for blazegraph default collation key

https://gerrit.wikimedia.org/r/990717

Change 991621 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/rdf@master] QueryResultRecorder: bump the max reponse size to 16M

https://gerrit.wikimedia.org/r/991621

Change 991622 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/rdf@master] QueryResultRecorder: add support for ask, describe and construct

https://gerrit.wikimedia.org/r/991622

Change 991621 merged by jenkins-bot:

[wikidata/query/rdf@master] QueryResultRecorder: bump the max response size to 16M

https://gerrit.wikimedia.org/r/991621

Change 991622 merged by jenkins-bot:

[wikidata/query/rdf@master] QueryResultRecorder: add support for ask, describe and construct

https://gerrit.wikimedia.org/r/991622

Change 993803 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/rdf@master] QueryResultRecorder: add support for blank nodes

https://gerrit.wikimedia.org/r/993803

Change 993803 merged by jenkins-bot:

[wikidata/query/rdf@master] QueryResultRecorder: add support for blank nodes

https://gerrit.wikimedia.org/r/993803