See 2 related tasks:
We need the capability to automatically compare the results returned by running the same query on different triplestores. In particular, we need to stress test query rewrites.
Input can be
- any successful Blazegraph query that failed during traffic replay
- examples on query UI and on wiki documentation
- bug reports.
- ...
We need the capability to:
- execute large batches of queries as well as spot analyses.
- summarize and report the output of diffs (how many queries return different results? which ones?). Reports should not contain PII and should be publicly available, outside of Superset (SPIKE: explore implementing a bespoke app on Toolforge).
- schedule validation runs on Airflow, as well as allow end users to run the tool locally.
- triage whether the issue lies with the query, the index serialization, a bug in our data ingestion, or a bug in the triplestore.
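A minimal sketch of what the diff summary could look like, for discussion only. The names `query_id` and `summarize` are hypothetical. It also assumes that publishing a truncated hash of the query text instead of the text itself is enough to keep PII out of the report, which still needs to be confirmed.

```python
import hashlib


def query_id(query_text):
    """Return a short, stable identifier that does not expose the query text."""
    return hashlib.sha256(query_text.encode("utf-8")).hexdigest()[:12]


def summarize(outcomes):
    """Summarize (query_text, results_match) pairs into a publishable report.

    Only counts and hashed identifiers are kept; raw query text is dropped.
    """
    total = 0
    differing = []
    for query_text, results_match in outcomes:
        total += 1
        if not results_match:
            differing.append(query_id(query_text))
    return {
        "total": total,
        "differing": len(differing),
        "differing_ids": sorted(differing),
    }
```

The resulting dict answers "how many" and "which ones" and can be serialized as JSON for whatever frontend ends up hosting the report.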
Needs to handle:
- Blank nodes
- Ordering
- Inference differences?
- Datatype normalization
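One possible way to approach the points above, as a sketch under stated assumptions rather than a proposed implementation. It assumes both triplestores return the standard SPARQL 1.1 JSON results format. The function names are hypothetical. Blank nodes are handled coarsely (labels are dropped, co-reference between blank nodes is not checked), only `xsd:integer` and `xsd:decimal` lexical forms are normalized, and inference differences are not addressed at all.

```python
from collections import Counter
from decimal import Decimal, InvalidOperation

XSD = "http://www.w3.org/2001/XMLSchema#"
# Assumption: only these two numeric datatypes are normalized in this sketch.
NUMERIC_TYPES = {XSD + "integer", XSD + "decimal"}


def normalize_term(term):
    """Map one SPARQL JSON term to a hashable, comparable tuple."""
    kind = term["type"]
    if kind == "bnode":
        # Blank node labels are store-specific, so the label is dropped.
        return ("bnode",)
    if kind == "literal":
        value = term["value"]
        datatype = term.get("datatype", "")
        if datatype in NUMERIC_TYPES:
            try:
                # "01" and "1", or "2.50" and "2.5", compare and hash equal.
                value = Decimal(value)
            except InvalidOperation:
                pass  # keep malformed lexical forms as plain strings
        return ("literal", value, datatype, term.get("xml:lang", "").lower())
    return (kind, term["value"])


def normalize_results(results):
    """Turn a SELECT result into a multiset of rows, so ordering is ignored."""
    return Counter(
        frozenset((var, normalize_term(term)) for var, term in row.items())
        for row in results["results"]["bindings"]
    )


def same_results(results_a, results_b):
    """True if both results contain the same rows after normalization."""
    return normalize_results(results_a) == normalize_results(results_b)
```

Using a multiset rather than a set keeps duplicate rows significant, which matters for queries without `DISTINCT`. Queries where row order is part of the expected result (`ORDER BY`) would need a separate, order-preserving comparison.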
Explore:
- how to automate regression tests in the data preparation step that splits and normalizes the entity dump into main and scholarly datasets.
- what makes for a good set of control queries to support regression tests.
- any learnings we can implement as a Data Quality step in the indexing pipeline.
Tooling
- Jena
- strong preference for map/reduce-style parallelization
- ...
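To illustrate the map/reduce preference, here is a small local sketch using only the Python standard library. It is not tied to Jena or Airflow. `run_a` and `run_b` are hypothetical callables that execute a query against each triplestore and return a comparable result, and plain equality stands in for whatever comparison logic is eventually chosen.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def compare_one(query, run_a, run_b):
    """Map step: run one query on both stores and record whether results agree."""
    return query, run_a(query) == run_b(query)


def fold(acc, outcome):
    """Reduce step: fold one outcome into the running summary."""
    query, match = outcome
    acc["total"] += 1
    if not match:
        acc["differing"].append(query)
    return acc


def compare_batch(queries, run_a, run_b, workers=4):
    """Compare a batch of queries in parallel and reduce to a single summary."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = pool.map(lambda q: compare_one(q, run_a, run_b), queries)
        return reduce(fold, outcomes, {"total": 0, "differing": []})
```

Because each query is compared independently and the summary is a simple fold, the same shape should carry over to a distributed runner for large batches, while a single call to `compare_one` covers spot analysis.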