Create a tool that records and compares a set of sparql query results
Closed, Resolved · Public

Assigned To

Authored By

	• dcausse
	Nov 22 2023, 2:00 PM

Description

To evaluate the impact of splitting the Wikidata graph, we want to compare the results of a set of queries run against different endpoints.

For this we need a tool, in the same vein as RelevanceForge, that can:

for a given set of queries, record the output of these queries when executed against a particular SPARQL endpoint
compute various metrics by analyzing the differences between the outputs of the same set of queries run against two different endpoints
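
The recording step could look roughly like the following. This is a minimal Python sketch under stated assumptions, not the code that was merged for this task: the names `fetch_sparql` and `record_results` are hypothetical, the endpoint is assumed to accept a URL-encoded POST with a `query` parameter and to return SPARQL JSON results, and only SELECT-style responses are parsed.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def fetch_sparql(endpoint, query_text, timeout=60):
    """POST a query to a SPARQL endpoint and return (status_code, body_text)."""
    data = urllib.parse.urlencode({"query": query_text}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # Keep the failure status so that it can be reported later.
        return error.code, error.read().decode("utf-8", errors="replace")


def record_results(queries, endpoint, fetch=fetch_sparql):
    """Run every query against one endpoint and record status and bindings.

    `queries` is a list of dicts with the keys query_provenance, query_id
    and query_text. `fetch` is injectable so the function can be exercised
    without network access (an assumption of this sketch).
    """
    records = []
    for query in queries:
        status, body = fetch(endpoint, query["query_text"])
        bindings = None
        if status == 200:
            # Assumes a SELECT response in the SPARQL JSON results format.
            bindings = json.loads(body)["results"]["bindings"]
        records.append({
            "query_provenance": query["query_provenance"],
            "query_id": query["query_id"],
            "status_code": status,
            "results": bindings,
        })
    return records
```

Running `record_results` once per endpoint over the same query set gives two recordings that can then be compared query by query.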

Analyzing the differences might require extracting a few metrics:

same results
same results but different ordering
% of identical lines
...
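
The metrics listed above could be computed along these lines. This is a minimal Python sketch, not the tool's actual implementation: the function name `compare_results` is hypothetical, each result set is assumed to be a list of rows (tuples of strings), and "% of identical lines" is assumed to mean the number of rows common to both sides divided by the size of the larger result set, since the task does not define it.

```python
from collections import Counter


def compare_results(left, right):
    """Compare two result sets given as lists of hashable rows."""
    left_counts = Counter(left)
    right_counts = Counter(right)
    # Rows present on both sides, counting duplicates (multiset intersection).
    shared = sum((left_counts & right_counts).values())
    largest = max(len(left), len(right))
    return {
        # Same rows in the same order.
        "same": left == right,
        # Same rows with the same multiplicities, in any order.
        "same_unordered": left_counts == right_counts,
        # Assumed definition: shared rows relative to the larger side.
        "pct_identical_lines": 100.0 if largest == 0 else 100.0 * shared / largest,
    }
```

Using a multiset rather than a set keeps duplicate rows significant, which matters for queries that do not use DISTINCT.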

The input for the tool is a dataset with the following columns: query_provenance, query_id, query_text
The output is a dataset with the following columns: query_provenance, query_id, status_code_left, status_code_right, same, same_unordered, pct_identical_lines
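
For the CSV variant, the two schemas above could be handled as sketched below. The column names come from this description; everything else (the helper names `read_queries` and `write_report`, and the use of Python's standard `csv` module) is an assumption made for illustration.

```python
import csv
import io

INPUT_COLUMNS = ["query_provenance", "query_id", "query_text"]
OUTPUT_COLUMNS = [
    "query_provenance", "query_id", "status_code_left", "status_code_right",
    "same", "same_unordered", "pct_identical_lines",
]


def read_queries(csv_text):
    """Parse the input dataset and keep only the expected columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = set(INPUT_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing input columns: {sorted(missing)}")
    return [{column: row[column] for column in INPUT_COLUMNS} for row in reader]


def write_report(rows):
    """Serialize comparison rows (dicts keyed by OUTPUT_COLUMNS) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=OUTPUT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Query text usually contains commas and newlines, so the input file needs proper CSV quoting; `csv.DictReader` handles quoted fields.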

AC:

A diff tool is available and can be run on top of a Spark dataframe or a CSV file
It can produce a Spark dataframe or a CSV file

Details

Subject	Repo	Branch	Lines +/-
QueryResultRecorder: add support for blank nodes	wikidata/query/rdf	master	+27 -9
QueryResultRecorder: add support for ask, describe and construct	wikidata/query/rdf	master	+229 -19
QueryResultRecorder: bump the max response size to 16M	wikidata/query/rdf	master	+1 -1
QueryResultRecorder: add support for blazegraph default collation key	wikidata/query/rdf	master	+44 -3
Add a tool to execute SPARQL queries and record results	wikidata/query/rdf	master	+734 -1

Related Objects

Status	Assigned	Task
Open	None	T335067 Epic: Wikidata Query Service stabilization
Open	None	T337013 [Epic] Splitting the graph in WDQS
Resolved	Gehel	T352538 [EPIC] Evaluate the impact of the graph split
Resolved	• dcausse	T351819 Create a tool that records and compares a set of sparql query results

Event Timeline

• dcausse created this task. Nov 22 2023, 2:00 PM

Restricted Application added a subscriber: Aklapper. Nov 22 2023, 2:00 PM

• dcausse renamed this task from "Create a tool that records and compare a set of sparql results" to "Create a tool that records and compares a set of sparql query results". Nov 22 2023, 2:00 PM

• dcausse added a parent task: T337013: [Epic] Splitting the graph in WDQS.

Maintenance_bot added a project: Wikidata. Nov 22 2023, 2:29 PM

• dcausse mentioned this in T349519: Determine if IGUANA and TFT would fit our query analysis needs. Nov 23 2023, 2:47 PM

• dcausse claimed this task. Nov 28 2023, 9:16 AM

• dcausse added a project: Discovery-Search (Current work).

• dcausse moved this task from Incoming to In Progress on the Discovery-Search (Current work) board.

Gehel moved this task from Incoming to Current work on the Wikidata-Query-Service board. Nov 28 2023, 1:48 PM

dr0ptp4kt subscribed. Nov 30 2023, 10:14 PM

Gehel edited parent tasks, added: T352538: [EPIC] Evaluate the impact of the graph split; removed: T337013: [Epic] Splitting the graph in WDQS. Dec 1 2023, 2:49 PM

Change 981550 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/rdf@master] Add a tool to execute SPARQL queries and record results

https://gerrit.wikimedia.org/r/981550

gerritbot added a project: Patch-For-Review. Dec 8 2023, 3:14 PM

Ladsgroup subscribed. Dec 11 2023, 12:04 PM

Change 981550 merged by jenkins-bot:

[wikidata/query/rdf@master] Add a tool to execute SPARQL queries and record results

https://gerrit.wikimedia.org/r/981550

Maintenance_bot removed a project: Patch-For-Review. Jan 11 2024, 10:30 PM

• dcausse mentioned this in T355040: Compare the results of sparql queries between the fullgraph and the subgraphs. Jan 15 2024, 10:08 AM

Change 990717 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/rdf@master] QueryResultRecorder: add support for blazegraph default collation key

https://gerrit.wikimedia.org/r/990717

gerritbot added a project: Patch-For-Review. Jan 15 2024, 3:53 PM

Change 990717 merged by jenkins-bot:

[wikidata/query/rdf@master] QueryResultRecorder: add support for blazegraph default collation key

https://gerrit.wikimedia.org/r/990717

Maintenance_bot removed a project: Patch-For-Review. Jan 16 2024, 5:30 PM

Change 991621 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/rdf@master] QueryResultRecorder: bump the max reponse size to 16M

https://gerrit.wikimedia.org/r/991621

Change 991622 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/rdf@master] QueryResultRecorder: add support for ask, describe and construct

https://gerrit.wikimedia.org/r/991622

Change 991621 merged by jenkins-bot:

[wikidata/query/rdf@master] QueryResultRecorder: bump the max response size to 16M

https://gerrit.wikimedia.org/r/991621

Change 991622 merged by jenkins-bot:

[wikidata/query/rdf@master] QueryResultRecorder: add support for ask, describe and construct

https://gerrit.wikimedia.org/r/991622

Maintenance_bot removed a project: Patch-For-Review. Jan 22 2024, 4:31 PM

Change 993803 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/rdf@master] QueryResultRecorder: add support for blank nodes

https://gerrit.wikimedia.org/r/993803

gerritbot added a project: Patch-For-Review. Jan 29 2024, 9:41 PM

Change 993803 merged by jenkins-bot:

[wikidata/query/rdf@master] QueryResultRecorder: add support for blank nodes

https://gerrit.wikimedia.org/r/993803

Maintenance_bot removed a project: Patch-For-Review. Jan 30 2024, 12:31 PM

• dcausse moved this task from In Progress to Needs Reporting on the Discovery-Search (Current work) board. Feb 2 2024, 1:42 PM

Gehel closed this task as Resolved. Feb 2 2024, 2:29 PM
