
Compare the performance of sparql queries between the full graph and the subgraphs
Closed, Resolved · Public · 8 Estimated Story Points

Assigned To
Authored By
dcausse
Jan 15 2024, 8:45 AM
Referenced Files
F41740308: Screenshot 2024-02-01 at 10.17.28 AM.png
Feb 1 2024, 5:03 PM
F41722573: Screenshot 2024-01-27 at 6.38.45 AM.png
Jan 27 2024, 12:54 PM
F41722569: Screenshot 2024-01-27 at 6.36.58 AM.png
Jan 27 2024, 12:54 PM
Restricted File
Jan 26 2024, 7:36 PM
F41720006: Screenshot 2024-01-26 at 10.20.48 AM.png
Jan 26 2024, 4:25 PM
F41720004: Screenshot 2024-01-26 at 10.19.36 AM.png
Jan 26 2024, 4:25 PM
F41718197: Screenshot 2024-01-25 at 7.43.14 PM.png
Jan 26 2024, 1:58 AM

Description

Using IGUANA, we should compare how the queries perform when run against the full graph vs. the wikidata main graph and scholarly subgraphs.

Metrics to collect are based upon IGUANA's standard metrics as defined at https://iguana-benchmark.eu/docs/3.3/usage/metrics/ .

  • Query mixes per hour (expresses total queries contained within query set, considering for example complex nested queries)
  • Queries per second
  • Number of queries per hour
  • Average queries per second
  • Number of queries
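
As a rough sketch of how these metrics relate to one another (this is my reading of the linked IGUANA docs, for illustration only; the case class and helper below are hypothetical, not IGUANA's code):

// Illustrative sketch only: approximate relationships between the metrics listed above.
// Assumes per-query execution stats collected over a benchmark run of `hours` wall-clock hours
// with `queriesInMix` distinct queries in the query mix.
case class QueryStat(queryId: String, successfulRuns: Long, totalTimeSeconds: Double)

def sketchMetrics(stats: Seq[QueryStat], hours: Double, queriesInMix: Long) = {
  val noQ    = stats.map(_.successfulRuns).sum            // NoQ: number of successfully executed queries
  val noQPH  = noQ / hours                                 // NoQPH: successful queries per hour
  val qmph   = noQPH / queriesInMix                        // QMPH: full query mixes completed per hour
  val qps    = stats.map(s => s.queryId -> s.successfulRuns / s.totalTimeSeconds).toMap // QPS per query
  val avgQPS = qps.values.sum / qps.size                   // AvgQPS: mean of the per-query QPS values
  (noQ, noQPH, qmph, qps, avgQPS)
}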

Queries that succeeded against the "full" and "main" (non-scholarly) graphs on wdqs1022 and wdqs1023 will be used. There are two passes for these queries:

  1. Sampled from bots
  2. Sampled from full traffic.

AC:

  • An analysis report is available

Event Timeline

dcausse renamed this task from Com to Compare the performance of sparql queries between the full graph and the subgraphs. Jan 15 2024, 8:49 AM
dcausse updated the task description.
Gehel set the point value for this task to 8. Jan 15 2024, 4:31 PM

For the first pass, an hour-long test is being conducted from stat1006 with config file wdqs-split-test.yml as follows.

datasets:
  - name: "split"
connections:
  - name: "baseline"
    endpoint: "https://wdqs1022.eqiad.wmnet/sparql"
  - name: "wikidata_main_graph"
    endpoint: "https://wdqs1024.eqiad.wmnet/sparql"

tasks:
  - className: "org.aksw.iguana.cc.tasks.impl.Stresstest"
    configuration:
      timeLimit: 3600000 # 1 hour, in milliseconds
      warmup:
        timeLimit: 30000
        workers:
          - threads: 4
            className: "SPARQLWorker"
            queriesFile: "queries_for_performance.txt"
            timeOut: 5000
      queryHandler:
        className: "DelimInstancesQueryHandler"
        configuration:
          delim: "### BENCH DELIMITER ###"
      workers:
        - threads: 4
          className: "SPARQLWorker"
          queriesFile: "queries_for_performance.txt"
          timeOut: 60000 # per-query timeout of 60 s (ms)
          parameterName: "query"
          gaussianLatency: 100 # gaussian-distributed delay (ms) added between queries

metrics:
  - className: "QMPH"
  - className: "QPS"
  - className: "NoQPH"
  - className: "AvgQPS"
  - className: "NoQ"

storages:
  - className: "NTFileStorage"
    configuration:
      fileName: result.nt

queries_for_performance.txt is based on the following basic code, which selects queries known to succeed against both the full graph and the main (non-scholarly) graph while returning similar results, so as to reduce garbage input and better control the parameters of the test.

scala> val wikidata = spark.read.parquet("hdfs:///user/dcausse/T352538_wdqs_graph_split_eval/wikidata_classified.parquet")
scala> val full = spark.read.parquet("hdfs:///user/dcausse/T352538_wdqs_graph_split_eval/full_classified.parquet")
scala> val joined5 = wikidata.as("w").join(full.as("f")).where("w.id = f.id and w.success = true and  w.success = f.success and w.resultSize = f.resultSize and w.reorderedHash = f.reorderedHash").select(concat(col("w.query"), lit("\n### BENCH DELIMITER ###"))).distinct
scala> joined5.repartition(1).write.option("compression", "none").text("queries_for_performance_2024_01_25.txt")
scala> :quit

$ hdfs dfs -copyToLocal hdfs://analytics-hadoop/user/dr0ptp4kt/queries_for_performance_2024_01_25.txt/part-00000-6b8caed3-3a4d-4cb2-bf74-6bbcd7af0478-c000.txt ./queries_for_performance.txt
$ /usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java -jar iguana-3.3.3.jar wdqs-split-test.yml
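
As a quick sanity check on the exported file (a hypothetical snippet, not necessarily something that was run), the number of queries IGUANA will parse out of it can be counted by splitting on the delimiter; it should match joined5.count:

scala> val raw = scala.io.Source.fromFile("queries_for_performance.txt").mkString
scala> val numQueries = raw.split("### BENCH DELIMITER ###").map(_.trim).count(_.nonEmpty)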

The IGUANA build is based on https://gitlab.wikimedia.org/repos/search-platform/IGUANA/-/merge_requests/4 .

Dropping in a screenshot from Grafana for this first pass. I made a copy of result.nt to result.nt.001 and am re-running to see whether server behavior is similar.

Screenshot 2024-01-25 at 7.43.14 PM.png

Now, a screenshot showing the re-run. And then a screenshot showing them side-by-side. This is just for the visual, and the data produced from IGUANA (what is in the .nt output that we can convert to a handy CSV) should be more telling.

Next up, I'll randomize the order of the queries and do it again.

Screenshot 2024-01-26 at 10.19.36 AM.png

Screenshot 2024-01-26 at 10.20.48 AM.png

Now, the screenshot from the randomized-order queries. I'll run one more time to check that comparable output is achieved. Those were produced with the following commands; the latest output file has been moved to result.nt.003.

scala> val joined6 = wikidata.as("w").join(full.as("f")).where("w.id = f.id and w.success = true and  w.success = f.success and w.resultSize = f.resultSize and w.reorderedHash = f.reorderedHash").select(concat(col("w.query"), lit("\n### BENCH DELIMITER ###"))).distinct.sample(withReplacement=false, fraction=1.0, seed=42)
scala> joined6.count // matches same as joined5.count
scala> joined6.repartition(1).write.option("compression", "none").text("queries_for_performance_randomized_2024_01_26.txt")
scala> :quit
$ hdfs dfs -copyToLocal hdfs://analytics-hadoop/user/dr0ptp4kt/queries_for_performance_randomized_2024_01_26.txt/part-00000-131df78f-da7a-4ffc-aad4-9874342165ca-c000.txt ./queries_for_performance_randomized.txt 
$ sha1sum queries_for_performance.txt queries_for_performance_randomized.txt
$ # they're different
$ diff queries_for_performance.txt queries_for_performance_randomized.txt | wc -l
$ # they're very different
$ cp wdqs-split-test.yml wdqs-split-test-randomized.yml
$ # changed pointers to query file to be queries_for_performance_randomized.txt
$ bash start-iguana.sh wdqs-split-test-randomized.yml
$ mv result.nt result.nt.003
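
As an aside, an alternative way to get a deterministic shuffle of the query order (rather than relying on sample with fraction 1.0 plus the distinct/repartition behavior) would be an explicit orderBy over a seeded rand; this is just a sketch of an alternative, not what was run:

scala> import org.apache.spark.sql.functions.rand
scala> val shuffled = joined5.orderBy(rand(42))  // deterministic shuffle, seed 42
scala> shuffled.repartition(1).write.option("compression", "none").text("queries_shuffled_alternative.txt")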

{F41720432}

Now a screenshot from the re-run of the randomized order queries, followed by a screenshot showing the two runs on the randomized order queries side by side.

Screenshot 2024-01-27 at 6.36.58 AM.png

Screenshot 2024-01-27 at 6.38.45 AM.png

Here are the data produced by IGUANA once piped through the CSV utility introduced in https://gitlab.wikimedia.org/repos/search-platform/IGUANA/-/merge_requests/3/diffs with a command of the following form. (For the attentive reader: I had to rename the originally produced files to have an .nt extension so that the underlying Jena libraries would not throw an exception.)

/usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java -cp iguana-3.3.3.jar org.aksw.iguana.rp.analysis.TabularTransform -e result.003.nt > result.003.execution.csv

run | endpointLabel | taskStartDate | successfullQueries | successfullQueriesQPH | avgqps | queryMixesPH
non-randomized 1 | baseline | 2024-01-25T22:18:57.753Z | 15538 | 17512.446990539123 | 32.60590311357346 | 0.9895715087607575
non-randomized 1 | wikidata_main_graph | 2024-01-25T23:19:56.948Z | 16773 | 19125.484555828807 | 33.86191297163505 | 1.0807190233276154
non-randomized 2 | baseline | 2024-01-26T01:47:41.634Z | 15893 | 17955.609618256018 | 32.97284513274341 | 1.0146131897076351
non-randomized 2 | wikidata_main_graph | 2024-01-26T02:48:41.047Z | 16780 | 19145.810254441058 | 34.085209300591515 | 1.0818675625496446
randomized 1 | baseline | 2024-01-26T16:51:54.091Z | 15180 | 17068.107622599186 | 32.88786330044905 | 0.9644633340452725
randomized 1 | wikidata_main_graph | 2024-01-26T17:52:52.903Z | 15929 | 17969.809300477013 | 33.91560037068121 | 1.0154155676372838
randomized 2 | baseline | 2024-01-26T19:37:30.811Z | 15211 | 17054.882354485933 | 33.00710905229813 | 0.9637160170924978
randomized 2 | wikidata_main_graph | 2024-01-26T20:38:29.989Z | 16084 | 18210.142239149543 | 34.14020362715409 | 1.0289960015341326

Keep in mind that a delay between queries was introduced in the configuration for these "stress tests" (a "stress test" here means that execution of the queries goes continuously for the specified time interval at the configured concurrency and delay). This was to more closely model what a somewhat busy, but not completely saturated, WDQS node might experience, although we should be mindful that the server specs differ a bit between these test servers and the WDQS hosts used for serving end-user WDQS production requests. When interpreting a value like avgqps, remember that it is akin to what might happen if queries were executed serially without delay, assuming JVM performance could be held constant for such request patterns (that is generally not possible to guarantee, so caveats abound; in other words, it's entirely possible that avgqps would degrade in reality).

The successfullQueriesQPH metric is probably the most interesting one. It suggests roughly a 5%-6% speed advantage for the smaller "main" graph versus the fully populated "full" graph for this query mix when conditions model a somewhat busy WDQS node (again, remember that the server spec differs a bit between the SUT and production nodes, so there is a caveat). During the very first run the performance difference was more pronounced - closer to 10% - but subsequent runs were closer to the 5%-6% level for wikidata_main_graph versus baseline.
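
For reference, that ratio can be read straight off the execution-level CSV. A Spark sketch, assuming the CSV produced by the -e TabularTransform command above (with its header row and the endpointLabel / successfullQueriesQPH columns shown):

val exec = spark.read.option("header", "true").option("inferSchema", "true")
  .csv("result.003.execution.csv")
val qph = exec.select("endpointLabel", "successfullQueriesQPH").collect()
  .map(r => r.getString(0) -> r.get(1).toString.toDouble).toMap
// Ratio > 1 means the main graph completed more successful queries per hour than the full graph.
val speedup = qph("wikidata_main_graph") / qph("baseline")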

Additional basic summary statistics on the data from the per-query CSV exports (using the -q flag) against the .nt files are to come. Note that in Andrea's previous analysis these sorts of statistics (as well as some tweaks to get somewhat finer precision via BigDecimal instead of Double types) were incorporated directly into the Java source of IGUANA - see https://github.com/dice-group/IGUANA/compare/main...AndreaWesterinen:IGUANA:main for changes up to June 13, 2022 against the current main branch of IGUANA; n.b., future readers may need to re-correlate the code changes as IGUANA upstream changes. But I opted to make fewer changes to our fork (i.e., I didn't merge Andrea's fork into our fork, even if there is some dependency similarity in the POMs), as this data can be determined with Spark summary stat calls.
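
Since that's the plan, here is a rough sketch of the Spark summary stat calls that could reproduce the per-query table below from the -q CSV export (assuming result.query.csv has the per-query header shown a bit further down, with one row per query per endpoint):

import org.apache.spark.sql.functions._

val perQuery = spark.read.option("header", "true").option("inferSchema", "true")
  .csv("result.query.csv")

perQuery.groupBy("endpointLabel").agg(
  avg("qps").as("mean_qps"),
  expr("percentile_approx(qps, 0.5)").as("median_qps"),
  stddev("qps").as("stddev_qps"),
  max("qps").as("max_qps"),
  expr("percentile_approx(qps, array(0.99, 0.95, 0.75, 0.5, 0.25, 0.01))").as("qps_percentiles"),
  sum("success").as("total_with_success")
).show(false)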

We may be interested in how to take forward some of the enhancement opportunities for IGUANA upstream (Andrea's fork and David's CSV utilities in our fork, for example) should we see the need for more IGUANA work later, but then again we may not do that as our needs are narrower.

Following below are "per-query" summary stats. For now I just put this together by bringing the CSV data into Google Sheets - all of the columns are calculated over the "per-query" rows (note how the Mean corresponds basically with the avgqps value calculated up above). The underlying CSV data don't contain the actual queries (the .nt files from which they're generated do), but rather rows of this form:

endpointLabel,task,queryId,totalTime,success,failed,timeouts,resultSize,unknownException,wrongCodes,qps,penalizedQPS
baseline,http://iguana-benchmark.eu/resource/1706221131/1/1,http://iguana-benchmark.eu/resource/1989023647/sparql0,53.592,2,0,0,1,0,0,37.319002836244216,37.319002836244216

No big surprises here. The "per-query" behavior was similar between nodes. The "main" graph skewed somewhat faster over the full range of queries, with one exception: the absolute fastest single query for the "randomized 1" run was slightly faster on the "baseline" full graph. But generally, everything else skewed faster for the "main" graph.

Per-query theoretical throughput (queries per second for given query)

Run | Endpoint Label | Mean | Median | Standard Deviation | Max (fastest) | 99% (very fast) | 95% | 75% | 50% | 25% | 1% (pretty slow) | Total w/ success
non-randomized 1 | baseline | 32.6059031135735 | 34.3489164537969 | 19.4337414434464 | 120.235661897318 | 76.8554161439261 | 55.7841155056668 | 49.1222469015747 | 34.3489164537969 | 14.3159887843845 | 0.0564663314012815 | 15538
non-randomized 1 | wikidata_main_graph | 33.8619129716351 | 35.7193884840691 | 20.2056740376445 | 148.610491900728 | 81.2789060283893 | 57.5619081064873 | 50.4922999242615 | 35.7193884840691 | 15.098897780462 | 0.0625188498260728 | 16773
non-randomized 2 | baseline | 32.9728451327437 | 34.7318699638788 | 19.5908672232246 | 128.890893858348 | 74.7142465127726 | 56.1419267909274 | 49.7404172035179 | 34.7318699638788 | 14.6689672498449 | 0.0569581938930891 | 15893
non-randomized 2 | wikidata_main_graph | 34.0852093005914 | 36.106296938186 | 20.1931723422722 | 130.25921583952 | 82.0449565998977 | 57.6139754652666 | 50.625221485344 | 36.106296938186 | 15.378937007874 | 0.0622306059862422 | 16780
randomized 1 | baseline | 32.8878633004489 | 34.6404323125952 | 19.8757923913207 | 136.072935093209 | 79.2782608462366 | 56.2164107372478 | 49.4926998267755 | 34.6404323125952 | 14.1755500113404 | 0.0557895216707227 | 15180
randomized 1 | wikidata_main_graph | 33.9156003706814 | 35.7091844022282 | 20.2748013579631 | 132.082948091401 | 81.0501654381498 | 57.747392312958 | 50.5101525406606 | 35.7091844022282 | 15.079658294943 | 0.0574330048487202 | 15929
randomized 2 | baseline | 33.007109052298 | 34.5670904661201 | 19.8511760316909 | 133.904659882163 | 81.4017649973028 | 56.1335754963953 | 49.6176341128876 | 34.5670904661201 | 14.3154187934342 | 0.0538090222917457 | 15211
randomized 2 | wikidata_main_graph | 34.1402036271541 | 36.0958706323996 | 20.2577595936201 | 134.156157767641 | 83.0239310934627 | 57.7850982363029 | 50.5292946486809 | 36.0958706323996 | 15.7122834258313 | 0.0589775584599512 | 16084

A run is in progress for 78K+ queries from a set of 100,000 random queries. It should be done in under 10 hours from now.

scala> val full_random = spark.read.parquet("hdfs:///user/dcausse/T352538_wdqs_graph_split_eval/full_random_classified.parquet")

scala> val wikidata_random = spark.read.parquet("hdfs:///user/dcausse/T352538_wdqs_graph_split_eval/wikidata_random_classified.parquet")

scala> full_random.count
res0: Long = 100000 

scala> wikidata_random.count
res6: Long = 100000  

scala> val joined11 = wikidata_random.as("w").join(full_random.as("f")).where("w.id = f.id and w.success = true and  w.success = f.success and w.resultSize = f.resultSize and w.reorderedHash = f.reorderedHash").select(concat(col("w.query"), lit("\n### BENCH DELIMITER ###"))).distinct.sample(withReplacement=false, fraction=1.0, seed=42)

scala> joined11.count
res0: Long = 78862

scala> joined11.repartition(1).write.option("compression", "none").text("queries_for_performance_2024_01_31.txt")

scala> :quit

$ hdfs dfs -copyToLocal hdfs://analytics-hadoop/user/dr0ptp4kt/queries_for_performance_2024_01_31.txt/part-00000-29c4e72d-800d-4148-b804-8e428ee71e9e-c000.txt ./queries_for_performance_file_renamed_randomized_2024_01_31.txt

$ bash start-iguana.sh wdqs-split-test-randomized-2024-01-31.yml

start-iguana.sh previously ran from stat1006, but this time around it's running from stat1008 in order to use more RAM for the larger query mix.

Here's the output from the latest run based upon a larger set of queries from a random sample of WDQS queries.

$ /usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java -cp iguana-3.3.3.jar org.aksw.iguana.rp.analysis.TabularTransform -e result.nt > result.execution.csv
$ cut -f1,3,5,6,7,9 -d"," result.execution.csv | sed 's/,/|/g'
endpointLabel | taskStartDate | successfullQueries | successfullQueriesQPH | avgqps | queryMixesPH
baseline | 2024-01-31T23:20:44.567Z | 319857 | 136612.71246575614 | 18.83670491311007 | 1.732300885924224
wikidata_main_graph | 2024-02-01T04:23:01.613Z | 331473 | 147674.12233239523 | 19.55930142298825 | 1.8725637484770261

Here's the screen capture from Grafana.

Screenshot 2024-02-01 at 10.17.28 AM.png

The non-scholarly "main" wikidata_main_graph (wdqs1024) window completed more queries. Toward the end of its window, a larger garbage collection pass occurred on the baseline ("full") graph node (wdqs1022), whose benchmark window had closed much earlier. This happened to coincide with a small window of temporarily failing queries, which is aggregated across all three nodes: "main", "full", and "scholarly". Depending on how one looks at it, this corresponds to maybe 1.5%-3.0% of the wikidata_main_graph's run time, so I wouldn't read too much into it.

To repeat the verbiage from the earlier runs...

Following below are "per-query" summary stats. For now I just put this together by bringing the CSV data into Google Sheets - all of the columns are calculated over the "per-query" rows (note how the Mean corresponds basically with the avgqps value calculated up above). The underlying CSV data don't contain the actual queries (the .nt files from which they're generated do), ...

The CSV data were generated with the following command:
/usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java -cp iguana-3.3.3.jar org.aksw.iguana.rp.analysis.TabularTransform -q result.nt > result.query.csv

Run | Endpoint Label | Mean | Median | Standard Deviation | Max (fastest) | 99% (very fast) | 95% | 75% | 50% | 25% | 1% (pretty slow) | Total w/ success
randomized 1 | baseline | 18.8367049131101 | 14.6999663404689 | 16.3589173757083 | 127.433177227691 | 59.009472115968 | 50.5734395961334 | 30.3470335487675 | 14.6999663404689 | 4.97164300568995 | 0 | 319857
randomized 1 | wikidata_main_graph | 19.5593014229883 | 16.0982853987134 | 16.5098295290687 | 121.141149629509 | 58.9613256488317 | 51.0426872548935 | 31.751311031492 | 16.0982853987134 | 5.37249826361878 | 0 | 331473

Although the max and 99th percentile queries were just ever so slightly faster on the baseline "full" graph, more generally things were faster on the non-scholarly "main" graph. The performance difference is obvious but not dramatic.

Here's the content of wdqs-split-test-randomized-2024-01-31.yml, comments removed for brevity. The main differences in this configuration file from the one presented earlier are the five hours allowed per graph (timeLimit of 18000000 ms), to accommodate the larger query mix, and the updated filename pointing to the larger query mix based on the set of queries from the random sample.

datasets:
  - name: "split"
connections:
  - name: "baseline"
    endpoint: "https://wdqs1022.eqiad.wmnet/sparql"
  - name: "wikidata_main_graph"
    endpoint: "https://wdqs1024.eqiad.wmnet/sparql"

tasks:
  - className: "org.aksw.iguana.cc.tasks.impl.Stresstest"
    configuration:
      timeLimit: 18000000
      warmup:
        timeLimit: 30000
        workers:
          - threads: 4
            className: "SPARQLWorker"
            queriesFile: "queries_for_performance_file_renamed_randomized_2024_01_31.txt"
            timeOut: 5000
      queryHandler:
        className: "DelimInstancesQueryHandler"
        configuration:
          delim: "### BENCH DELIMITER ###"
      workers:
        - threads: 4
          className: "SPARQLWorker"
          queriesFile: "queries_for_performance_file_renamed_randomized_2024_01_31.txt"
          timeOut: 60000
          parameterName: "query"
          gaussianLatency: 100

metrics:
  - className: "QMPH"
  - className: "QPS"
  - className: "NoQPH"
  - className: "AvgQPS"
  - className: "NoQ"

storages:
  - className: "NTFileStorage"
    configuration:
      fileName: result.nt

@dr0ptp4kt thanks! is the difference in the number of successful queries only explained by the improvement in query time or are there some improvements in the number of queries that timeout as well?


Good question! It appears to be related to query time.

Looking at this latest run, for example, there were no recorded timeouts according to the CSV of the IGUANA .nt (🤔).

Taking things on a head-to-head basis for identical queries between the endpoints, here's what we see for the difference in speed, computed as wikidata_main_graph QPS minus baseline QPS. It's unsurprising given the distribution shown in the prior Phabricator comment, but it is another way of seeing that, under the parameters of this test anyway (a controlled flow rather than a maximum-load test), about 70% of the queries noted as successful were faster when run against wikidata_main_graph. Note that about 16% of the queries hit wrongCodes / failed; these are discussed after the table.

Per-query wikidata_main_graph QPS minus baseline QPS | descriptor
0.722596509877809 | average
0.244672300065055 | median
79.4339558877256 | 100% max (i.e., wikidata_main_graph's biggest winner)
21.0654641024791 | 99%
6.88080533343067 | 95%
1.38414473312972 | 75%
0.244672300065055 | 50%
0.0139828813684474 | 42%
0 | 41%
0 | 26%
-0.00701117502390231 | 25%
-0.215374628998983 | 20%
-0.598658931613195 | 15%
-1.41867399989265 | 10%
-4.16152316076897 | 5%
-18.0068429593504 | 1%
-80.2800161266253 | 0% min (i.e., baseline's biggest winner)

About 58% of queries tilted toward wikidata_main_graph and about 25% tilted toward baseline, and 58/(58+25) is about 0.7. The queries where the difference is negligible probably don't matter that much. Yet there's a bit more detail to consider in IGUANA's conception here...
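
A sketch of how this head-to-head comparison can be computed from the -q per-query CSV (same assumptions as the earlier Spark sketch, plus the assumption that queryId is stable across the two endpoints' rows, so a self-join pairs each query with itself):

import org.apache.spark.sql.functions._

val perQuery = spark.read.option("header", "true").option("inferSchema", "true")
  .csv("result.query.csv")

val main = perQuery.filter(col("endpointLabel") === "wikidata_main_graph")
  .select(col("queryId"), col("qps").as("qps_main"))
val base = perQuery.filter(col("endpointLabel") === "baseline")
  .select(col("queryId"), col("qps").as("qps_base"))

// Per-query delta: positive means the query ran faster against the main graph.
val deltas = main.join(base, "queryId").withColumn("delta", col("qps_main") - col("qps_base"))

deltas.agg(
  avg(when(col("delta") > 0, 1.0).otherwise(0.0)).as("fraction_faster_on_main"),
  expr("percentile_approx(delta, array(1.0, 0.99, 0.95, 0.75, 0.5, 0.25, 0.1, 0.05, 0.01, 0.0))").as("delta_percentiles")
).show(false)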

For the sake of completeness, and because this may be interesting to consider later on or to contextualize the QPS distributions in the prior Phabricator comment: looking at a different class of issues, let's suppose we use wrongCodes as a proxy for things that could have gone wrong. wrongCodes and failed map to each other in the CSV, and qps lands as 0 for these records (penalizedQPS, not included in the tables above, lands by default as 0.01666666667 for these records, but that is close enough to 0 if we wanted to look at it that way). These records thus drive down the summary mean, median, and so on. As an aside, in terms of actual time (totalTime), these wrongCodes records occupy very little time.

Endpoint Label | count wrongCodes | sum wrongCodes | count failed | sum failed | count timeout | count QPS < 1.0 | count QPS < 5.0 | count QPS < 20.0 | count QPS < 80.0 | count QPS < 200.0
baseline | 12784 | 61863 | 12784 | 61863 | 0 | 12973 | 19750 | 46667 | 78649 | 78710
wikidata_main_graph | 12786 | 64129 | 12786 | 64129 | 0 | 12934 | 18909 | 44848 | 78642 | 78710

The wrongCodes resulted for basically the same queries across both endpoints (and each row of the CSVs is a per-query row, as we know).

The ratio of wikidata_main_graph to baseline for sum wrongCodes (1.036629326091525) is very close to the ratio up above for successfullQueries (1.036316228814752).
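
For reference, a sketch of how the wrongCodes aggregates and that ratio can be pulled from the same per-query CSV (same column assumptions as before):

import org.apache.spark.sql.functions._

val perQuery = spark.read.option("header", "true").option("inferSchema", "true")
  .csv("result.query.csv")

val byEndpoint = perQuery.groupBy("endpointLabel").agg(
  count(when(col("wrongCodes") > 0, true)).as("count_wrongCodes"),
  sum("wrongCodes").as("sum_wrongCodes"),
  count(when(col("failed") > 0, true)).as("count_failed"),
  sum("failed").as("sum_failed"),
  count(when(col("timeouts") > 0, true)).as("count_timeout")
)
byEndpoint.show(false)

// Compare this with the successfullQueries ratio noted above.
val sums = byEndpoint.collect()
  .map(r => r.getAs[String]("endpointLabel") -> r.getAs[Long]("sum_wrongCodes")).toMap
val wrongCodesRatio = sums("wikidata_main_graph").toDouble / sums("baseline")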

I summarized at https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Graph_split_IGUANA_performance . When we have a mailing list post during the next week or so, we'll want to move this to be a subpage of the target page of the post.

I actually just added a link to https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_backend_update#See_also . Marking this here ticket as resolved after noticing it was still open.