
OpenSearch on K8s: Benchmark cluster
Open, Needs Triage, Public

Description

Before we can offer OpenSearch on K8s instances in production, we need to have some idea of the level of performance to expect (see also T396501, which is scoped to existing, non-K8s OpenSearch clusters).

Creating this ticket to:

Event Timeline

It looks like OpenSearch publishes its own benchmarking tool: https://docs.opensearch.org/latest/benchmark/quickstart/

We should be able to install it in a conda environment on a stat server, then run it against https://opensearch-test.discovery.wmnet:30443
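If it isn't packaged for conda, a plain virtualenv sketch along these lines should work (the environment path is arbitrary, and this assumes PyPI access from the stat host):

```shell
# Create an isolated virtualenv and install the benchmark tool from PyPI
python3 -m venv ~/osb-venv
. ~/osb-venv/bin/activate
pip install opensearch-benchmark

# Sanity check the install
opensearch-benchmark --version
```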

Updated endpoint is https://opensearch-ipoid.discovery.wmnet:30443/opensearch_ipoid/_search

Although we should be careful not to modify data in that index, so maybe opensearch-test is better.

Anecdotally, I see "took" times between 20 ms and 835 ms when querying for IPs and IP ranges.
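For context, these ad-hoc queries are plain read-only _search requests. A sketch of what the request body might look like (the field name "ip" is an assumption about the opensearch_ipoid mapping, not taken from this ticket):

```python
import json

# Hypothetical sketch: build a read-only _search body for an IP lookup.
# The "ip" field name is an assumed mapping; adjust to the real index schema.
def ip_search_body(ip: str) -> str:
    query = {
        "size": 10,
        "query": {"term": {"ip": ip}},
    }
    return json.dumps(query)

print(ip_search_body("192.0.2.1"))
```

A body like this would be POSTed to https://opensearch-ipoid.discovery.wmnet:30443/opensearch_ipoid/_search; the "took" field in the response is the server-side query time in milliseconds.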

@RKemper and I attempted to use opensearch-benchmark today. It wasn't available in conda, so I installed it with pip in a virtualenv on stat1011.

We tried to run a few benchmarks against our opensearch-test instance, but we kept getting generic Python errors:

opensearch-benchmark run --pipeline=benchmark-only --workload=http_logs --target-host=$HOST --client-options=basic_auth_user:opensearch,basic_auth_password:$PW

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/

[INFO] [Test Run ID]: 4206edd7-82cb-47fc-affd-258c2f71fbee
[INFO] You did not provide an explicit timeout in the client options. Assuming default of 10 seconds.
[WARNING] Could not determine distribution version from endpoint, use --distribution-version to specify
[ERROR] ❌  Cannot run. Error in test run orchestrator (expected string or bytes-like object)

Getting further help:
*********************
* Check the log files in /home/bking/.osb/logs for errors.
* Read the documentation at https://opensearch.org/docs.
* Ask a question on the forum at https://forum.opensearch.org/.
* Raise an issue at https://github.com/opensearch-project/OpenSearch-Benchmark/issues and include the log files in /home/bking/.osb/logs.

We're out of time for today, but will take a look tomorrow.

You might need to ignore SSL certificate errors and/or specify the exact OpenSearch version (via --distribution-version).
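For the record, the adjusted invocation would look roughly like this (verify_certs is a standard OSB client option; the version number below is a placeholder assumption and should match the cluster's actual version):

```shell
# verify_certs:false skips TLS certificate verification;
# --distribution-version pins the server version OSB could not auto-detect.
# 2.11.0 is a placeholder, not the cluster's confirmed version.
opensearch-benchmark run \
  --pipeline=benchmark-only \
  --workload=http_logs \
  --target-host=$HOST \
  --distribution-version=2.11.0 \
  --client-options=basic_auth_user:opensearch,basic_auth_password:$PW,verify_certs:false
```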

@kostajh Thanks! I did indeed need to ignore certificate errors. I'm running a benchmark with the http_logs workload and will update the ticket when I have more info. Looking at the dashboard, I'm not seeing any signs of stress on the cluster so far.