
[EPIC] Performance testing environment
Open, Low, Public

Description

The beta cluster is currently unsuitable for testing code for many performance problems, because it is all on VMs. It generally won't tell you about cachebusting either, for several of our caching layers.

Several people are working on Vagrant roles that will let developers test in more realistic dev environments, with million-row databases and warm caches. But CPU and other constraints still won't be realistic.

So, as one can see in https://www.mediawiki.org/wiki/Performance_profiling_for_Wikimedia_code, right now we can only roughly predict many interactions until the code hits production.

It is not possible to have a testing cluster that exactly mirrors production. And of course it is the developer's responsibility to know the constraints of the production system and how her code exercises those systems. And with heterogeneous deployment, it's *possible* to notice some problems while they're only affecting less-trafficked wikis. But some people have expressed interest in creating a more realistic environment to test/predict how efficient code will be when deployed to very-high-traffic wikis, especially those with unique configurations.


As a service developer I want to be more confident that my service will stand up to the load I expect to have placed upon it.

Currently this seems to be done in an ad-hoc manner and in some cases simply skipped. For example, we recently tested the termbox service with some custom Locust scripts. Without a place to run these from, it ended up being done from a developer laptop, making the test hard to reproduce and raising the barrier to entry.

It would be awesome to have a service that:

  • Can generate traffic to my service
  • Can be configured with a range of request content/parameters to fully exercise it
  • Can be run near the service in question to make latency realistic
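As a rough illustration of the first two points, here is a minimal sketch using only the Python standard library. It is not the Locust setup used for termbox and not a proposal for the real tool; the function names (`build_requests`, `run_load`, `summarize`) and the `language`/`entity` parameters are hypothetical, and `send` stands in for whatever actually issues a request to the service under test.

```python
import math
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def build_requests(param_ranges, count, seed=0):
    """Draw `count` parameter combinations from the configured value lists.

    `param_ranges` maps a (hypothetical) parameter name to the values it may take.
    A fixed seed keeps a test run reproducible.
    """
    rng = random.Random(seed)
    keys = sorted(param_ranges)
    return [{k: rng.choice(param_ranges[k]) for k in keys} for _ in range(count)]


def run_load(send, requests, concurrency=4):
    """Call send(params) for every request; return per-request latencies in seconds."""
    def timed(params):
        start = time.perf_counter()
        send(params)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed, requests))


def summarize(latencies):
    """Return basic latency statistics (nearest-rank 95th percentile)."""
    if not latencies:
        raise ValueError("no latencies recorded")
    ordered = sorted(latencies)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Stand-in for a real HTTP call to the service under test (assumption).
    def fake_send(params):
        time.sleep(0.001)

    reqs = build_requests({"language": ["en", "de"], "entity": ["Q1", "Q2"]}, count=20)
    print(summarize(run_load(fake_send, reqs, concurrency=4)))
```

Running something like this from a developer laptop reproduces the problem described above: the measured latency includes the laptop's network path, which is why the third point (running near the service) matters.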

It would be even cooler but non-essential if it could:

  • have a feature for recording and replaying prod traffic at various speeds
  • show pretty statistics for the testing
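To make "replaying at various speeds" concrete, here is a minimal sketch assuming recorded traffic is available as a list of `(timestamp_seconds, request)` pairs sorted by time. That format, and the names `replay_schedule` and `replay`, are assumptions for illustration; the task does not specify how traffic would be captured. A `speed` of 2.0 replays the same requests in half the original time.

```python
import time


def replay_schedule(recorded, speed=1.0):
    """Turn recorded (timestamp, request) pairs into (offset, request) pairs.

    Offsets are seconds since the start of the replay, compressed or
    stretched by `speed` (2.0 = twice as fast, 0.5 = half speed).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not recorded:
        return []
    first = recorded[0][0]
    return [((ts - first) / speed, request) for ts, request in recorded]


def replay(recorded, send, speed=1.0, sleep=time.sleep, clock=time.monotonic):
    """Send each recorded request at its scaled offset.

    `send` is a stand-in for whatever issues the request (assumption);
    `sleep` and `clock` are injectable so the timing logic can be tested
    without waiting in real time.
    """
    start = clock()
    for offset, request in replay_schedule(recorded, speed):
        delay = offset - (clock() - start)
        if delay > 0:
            sleep(delay)
        send(request)


if __name__ == "__main__":
    # Hypothetical recording: three requests over 0.06 seconds.
    recording = [(100.00, "GET /a"), (100.02, "GET /b"), (100.06, "GET /c")]
    replay(recording, send=print, speed=2.0)
```

This version sends requests sequentially, so a slow `send` delays everything after it; a real tool would need to issue requests concurrently to hold the recorded rate.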

Perhaps this could be done by containerising an off-the-shelf load-testing solution and providing a chart for running it.

Details

Reference
bz65394

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 3:16 AM
bzimport set Reference to bz65394.
bzimport added a subscriber: Unknown Object (MLST).

CCing Aaron and Ori, who are the performance engineers. I know that at least Ori raised the subject in a previous bug report.

Maybe we can investigate that after the HHVM migration, which is taking a good share of our productivity this quarter and probably the next as well.

does it make sense to bring this up in the "scrum of scrums"?

(In reply to Daniel Zahn from comment #3)

does it make sense to bring this up in the "scrum of scrums"?

We can, but I think a larger conversation needs to happen between Platform/RelEng/Ops re support/team capacity. It won't be on our (RelEng's) todo list for this quarter, at least.

hashar changed the task status from Open to Stalled. May 29 2015, 10:32 AM
hashar subscribed.
In T282#1262238, @greg wrote:

Setting to Stalled, it's probably something that will come up again, but you're right, not on the plan for now.

greg renamed this task from performance testing environment to [EPIC] Performance testing environment. Sep 24 2015, 1:36 AM
greg updated the task description.
Aklapper changed the task status from Stalled to Open. May 25 2020, 3:10 PM
Aklapper subscribed.

The previous comments don't explain what/who exactly this task is stalled on ("If a report is waiting for further input (e.g. from its reporter or a third party) and can currently not be acted on"). Hence resetting task status.

(Smallprint, as general orientation for task management: If you wanted to express that nobody is currently working on this task, then the assignee should be removed and/or priority could be lowered instead. If work on this task is blocked by another task, then that other task should be added via Edit Related Tasks...Edit Subtasks. If this task is stalled on an upstream project, then the Upstream tag should be added. If this task requires info from the task reporter, then there should be instructions which info is needed. If this task is out of scope and nobody should ever work on this, then task status should have the "Declined" status.)

You are free to set this task to stalled status again, however tasks should not remain stalled for five years, hence I boldly reopened for the time being.