To find the limits of the current production service, we will run load tests to determine the number of requests per second we can safely serve on LiftWing.
We are doing this because the service went through a couple of implementation changes in the past months. Additionally, previous load tests were performed against staging, and findings suggest that those results might have been unreliable due to the load test configuration.
Current state and assumptions for load tests:
- I'm testing against the internal production endpoint https://inference.svc.eqiad.wmnet:30443/v1/models/outlink-topic-model:predict.
- I'm using the page_id and lang parameters, which offer the best performance.
- The current production deployment scales up to 5 replicas, so the tests will reflect the performance of that 5-replica setup.
- I'm using a set of ~6000 unique page_id values for enwiki during testing.
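
As a rough illustration of the setup above, here is a minimal Python sketch of how the request bodies could be generated and sent. It is not the actual load test script. The helper names (`build_payload`, `payload_stream`, `send_request`), the `"en"` value for lang, the JSON content type, and the timeout are assumptions for illustration; any extra headers or TLS settings the endpoint may require are not shown.

```python
import itertools
import json
import random
import urllib.request

# Endpoint taken from the list above.
ENDPOINT = (
    "https://inference.svc.eqiad.wmnet:30443"
    "/v1/models/outlink-topic-model:predict"
)


def build_payload(page_id, lang="en"):
    """Return the JSON body for one request (lang="en" is an assumed value for enwiki)."""
    return json.dumps({"page_id": page_id, "lang": lang}).encode("utf-8")


def payload_stream(page_ids, lang="en", seed=0):
    """Yield request bodies endlessly, cycling through a shuffled copy of page_ids."""
    ids = list(page_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps runs comparable
    for page_id in itertools.cycle(ids):
        yield build_payload(page_id, lang)


def send_request(body, timeout=10.0):
    """POST one body to the endpoint and return the HTTP status code."""
    request = urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A load generator would pull bodies from `payload_stream(...)` with the ~6000 page_id values and call `send_request` from as many concurrent workers as the target request rate needs.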