Phabricator

Determine if IGUANA and TFT would fit our query analysis needs
Closed, ResolvedPublic8 Estimated Story Points

Description

TFT (compliance test) and IGUANA (perf testing) are two frameworks that were evaluated by Andrea (see https://wikitech.wikimedia.org/wiki/User:AndreaWest/WDQS_Testing).

We should evaluate if these two frameworks are suitable for our offline query evaluation.

At a glance we would need:

  • [TFT] a way to record the expected outcome of a query
  • [TFT] run a compliance test (evaluate the time it might require)
  • [IGUANA] run perf test on a set of queries

AC:

  • Determine if TFT & IGUANA are adapted to our needs
  • Estimate the adaptations needed to make these frameworks suitable for our use cases

Event Timeline

To clarify ... TFT needs to be updated to test the current RDF, RDF*, SPARQL definitions. The current definitions that were migrated are out of date.

IGUANA needs the tests defined that reflect the characteristics that were specified.

Without these changes, the infrastructures are usable, but not useful.

So, it is likely that the estimate of "8" points is low.

@AWesterinen thanks for the heads-up. The scope of this ticket is to determine whether these tools can be useful in the context of an offline analysis of the Wikidata vs. scholarly article split (adapting TFT to newer RDF standards is definitely out of scope for this ticket).
It is very possible that these tools are too specific for this need and we might have to write our own, but we thought that this graph split project could be a good opportunity for us to learn these tools.

@dcausse I have the work item to update TFT to the latest tests on my to-do list. I can try to prioritize it higher, if that is valuable to you.

It is probably 3-4 days of work.

Completing IGUANA would require more work to get basic functionality. Maybe 2-3 weeks? (top of head guess)

TFT
does not seem appropriate for the kind of tests we need within the scope of the graph split project: it provides nothing to ease the creation of test sets, so I created T351819 to build such tooling.
IGUANA
does seem more promising: after a couple of fixes I was able to run a simple test against our public endpoint. User:AndreaWest/WDQS_Testing/Running_Iguana#How_to_Change_or_Extend_the_Tests_and_Infrastructure gives a good overview of the extracted metrics. The test output in its current format (RDF) might not be directly actionable by a data analyst and might need to be transformed into some kind of tabular data.
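To give an idea of what that transformation could look like, here is a minimal stdlib-only sketch that flattens triples from an RDF result file into CSV. The subject/predicate IRIs and the metric names below are illustrative placeholders, not IGUANA's actual result vocabulary; a real script would parse the actual output file and its real predicates.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical N-Triples excerpt shaped like a per-query benchmark result;
# the IRIs are made up for illustration, not IGUANA's real vocabulary.
ntriples = """\
<http://example.org/run1> <http://example.org/queryID> "Q1" .
<http://example.org/run1> <http://example.org/meanQueryTime> "12.5" .
<http://example.org/run2> <http://example.org/queryID> "Q2" .
<http://example.org/run2> <http://example.org/meanQueryTime> "30.0" .
"""

# Matches the simple <s> <p> "o" . pattern used in the sample above.
triple_re = re.compile(r'<([^>]+)> <([^>]+)> "([^"]*)" \.')

# Group predicate/value pairs by subject, then emit one CSV row per subject.
by_subject = defaultdict(dict)
for line in ntriples.splitlines():
    m = triple_re.match(line)
    if m:
        subj, pred, obj = m.groups()
        by_subject[subj][pred.rsplit("/", 1)[-1]] = obj

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["queryID", "meanQueryTime"])
writer.writeheader()
for subj in sorted(by_subject):
    writer.writerow(by_subject[subj])
print(out.getvalue())
```

The resulting CSV can then be loaded into whatever tabular tooling the data analysts prefer.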
I used IGUANA version 3.3.3 and had to apply several fixes (my fork is at https://gitlab.wikimedia.org/repos/search-platform/IGUANA). We should consider using the new metrics Andrea added in QPSMetric (meanQueryTime, geometricMeanQueryTime, min, max, and their penalized versions).
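For reference, a quick sketch of how those aggregates relate to each other. The penalized variants here follow my reading of the approach: failed executions are counted at a fixed penalty time (e.g. the query timeout) instead of being dropped. The penalty value and the function name are assumptions for illustration.

```python
import math

PENALTY_MS = 60_000.0  # assumed penalty for a failed query, e.g. the timeout


def aggregate(times_ms, failures=0, penalty=PENALTY_MS):
    """times_ms: runtimes (ms) of the successful executions of one query.

    Returns plain and penalized aggregates; the penalized ones fold each
    failed execution in as one run at the penalty time.
    """
    penalized = list(times_ms) + [penalty] * failures
    geo = lambda ts: math.exp(sum(math.log(t) for t in ts) / len(ts))
    return {
        "meanQueryTime": sum(times_ms) / len(times_ms),
        "geometricMeanQueryTime": geo(times_ms),
        "min": min(times_ms),
        "max": max(times_ms),
        "penalizedMeanQueryTime": sum(penalized) / len(penalized),
        "penalizedGeometricMeanQueryTime": geo(penalized),
    }


stats = aggregate([100.0, 400.0, 250.0], failures=1)
print(stats["meanQueryTime"])       # 250.0
print(stats["penalizedMeanQueryTime"])  # 15187.5 with one failure at 60s
```

The gap between the plain and penalized means shows why the penalized versions matter for endpoints where some queries time out.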
I attempted to upstream my fixes, but it appears that the IGUANA maintainers no longer want to maintain the 3.x branch: they are working on v4, a major refactor, which I did not have time to look into deeply.
For the WDQS graph split needs it might be OK to use our v3.3.3 fork, but for the evaluation of Blazegraph alternatives we should consider rebasing Andrea's work onto the upcoming v4 branch, depending on its status.