[EPIC] Performance testing environment
Open, Low, Public

Assigned To

None

Authored By

	• bzimport
	May 16 2014, 1:38 PM

Description

The beta cluster is currently unsuitable for testing code for many kinds of performance problems, because it runs entirely on VMs. It generally won't reveal cache-busting problems either, for several of our caching layers.

Several people are working on Vagrant roles that will help developers test in more realistic dev environments, with million-row databases and warm caches. But CPU and other constraints still won't be realistic.

So, as one can see in https://www.mediawiki.org/wiki/Performance_profiling_for_Wikimedia_code, right now there are many interactions whose performance we can only predict roughly until the code hits production.

It is not possible to have a testing cluster that exactly mirrors production. And of course it is the developer's responsibility to know the constraints of the production system and know how her code exercises those systems. And with heterogeneous deployment, it's *possible* to notice some problems while they're only affecting less-trafficked wikis. But some people have expressed interest in creating a more realistic environment to test/predict how efficient code will be when deployed to very-high-traffic wikis, especially those with unique configurations.

As a service developer I want to be more confident that my service will stand up to the load I expect to have placed upon it.

Currently this seems to be done in an ad-hoc manner and in some cases is simply skipped. For example, we recently tested the termbox service with some custom Locust scripts. Without a place to run these from, it ended up being done from a developer laptop, making the test hard to reproduce and raising the barrier to entry.

It would be awesome to have a service that:

can generate traffic to my service
can be configured with a range of request content/parameters to fully exercise it
can be run near the service in question to make latency realistic
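
As a rough illustration of the first two points, here is a minimal Python sketch of a traffic generator driven by a configurable range of request parameters. It is not the Locust setup used for termbox. The function names, the `send` callback, and the example parameter names (`language`, `entity`) are all hypothetical; in a real run `send` would issue the actual request to the service.

```python
import itertools
import random
import time
from typing import Callable, Dict, List


def request_variants(param_ranges: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Expand {parameter: allowed values} into every combination of values."""
    keys = sorted(param_ranges)
    value_lists = [param_ranges[key] for key in keys]
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


def run_load(
    send: Callable[[Dict[str, str]], None],
    variants: List[Dict[str, str]],
    total_requests: int,
    seed: int = 0,
) -> List[float]:
    """Call send() total_requests times with randomly chosen variants.

    Returns the observed latency of each call in seconds. The seed makes
    the sequence of chosen variants reproducible between runs.
    """
    rng = random.Random(seed)
    latencies = []
    for _ in range(total_requests):
        params = rng.choice(variants)
        start = time.perf_counter()
        send(params)  # hypothetical callback; would perform the real request
        latencies.append(time.perf_counter() - start)
    return latencies


# Example with a no-op sender; parameter names are made up for illustration.
variants = request_variants({"language": ["en", "de"], "entity": ["Q1", "Q2", "Q3"]})
latencies = run_load(lambda params: None, variants, total_requests=10)
```

Keeping the sender as a callback means the same parameter configuration can be exercised from a laptop or from a job running next to the service, which is where realistic latency figures would come from.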

It would be even cooler but non-essential if it could:

have a feature for recording and replaying prod traffic at various speeds
show pretty statistics for the testing
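
The two optional features above could look roughly like the following Python sketch. It assumes recorded traffic is available as a list of request timestamps in seconds, and that latencies are collected as floats; both helper names are hypothetical and no particular off-the-shelf tool works exactly this way.

```python
import math
import statistics
from typing import Dict, List


def replay_offsets(recorded_timestamps: List[float], speed: float) -> List[float]:
    """Return when to send each recorded request, in seconds from replay start.

    speed=2.0 replays twice as fast as recorded, speed=0.5 at half speed.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not recorded_timestamps:
        return []
    ordered = sorted(recorded_timestamps)
    first = ordered[0]
    return [(timestamp - first) / speed for timestamp in ordered]


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Summarize latencies with count, mean, p50, p95 (nearest rank) and max."""
    if not latencies:
        raise ValueError("no latencies recorded")
    ordered = sorted(latencies)

    def percentile(p: float) -> float:
        # Nearest-rank method: smallest value with at least p% of samples at or below it.
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    return {
        "count": len(ordered),
        "mean": statistics.fmean(ordered),
        "p50": percentile(50),
        "p95": percentile(95),
        "max": ordered[-1],
    }


offsets = replay_offsets([10.0, 11.0, 14.0], speed=2.0)
summary = summarize([0.1, 0.2, 0.3, 0.4])
```

The offsets only describe the schedule; actually sleeping until each offset and sending the request is left to whatever tool runs the replay.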

Perhaps this could be done by containerising an off-the-shelf load-testing solution and providing a chart for running it.

Details

Reference: bz65394

Related Objects

		Status	Subtype	Assigned	Task
		Declined		Krinkle	T59137 Add a voting YSlow job to Jenkins
		Open		None	T67394 [EPIC] Performance testing environment

Event Timeline

• bzimport raised the priority of this task from to Low. Nov 22 2014, 3:16 AM

• bzimport added projects: Deployments, acl*sre-team.

• bzimport set Reference to bz65394.

• bzimport added a subscriber: Unknown Object (MLST).

• bzimport created this task. May 16 2014, 1:38 PM

CCing Aaron and Ori, who are the performance engineers. I know that at least Ori raised the subject in a previous bug report.

Maybe we can investigate that after the HHVM migration, which is taking a good share of our productivity this quarter and probably the next as well.

Added to https://www.mediawiki.org/wiki/Wikimedia_Release_and_QA_Team/Wishlist

does it make sense to bring this up in the "scrum of scrums"?

(In reply to Daniel Zahn from comment #3)

does it make sense to bring this up in the "scrum of scrums"?

We can, but I think a larger conversation needs to happen between Platform/RelEng/Ops re support/team capacity. It won't be on our (RelEng's) todo list for this quarter, at least.

hashar added a project: Performance Issue. Nov 24 2014, 11:00 AM

hashar set Security to None.

hashar unsubscribed.

greg edited projects, added Release-Engineering-Team; removed Deployments. Jan 8 2015, 6:03 PM

In T282#1262238, @greg wrote:

Setting to Stalled, it's probably something that will come up again, but you're right, not on the plan for now.

greg renamed this task from performance testing environment to [EPIC] Performance testing environment. Sep 24 2015, 1:36 AM

greg updated the task description.

greg edited projects, added Release-Engineering-Epics; removed Release-Engineering-Team.

Restricted Application added a subscriber: Matanya. Sep 24 2015, 1:36 AM

greg added a subscriber: Jdlrobson. Sep 24 2015, 1:36 AM

greg mentioned this in T112587: [GOAL] Have a way to detect performance regressions to mobile site. Sep 24 2015, 1:39 AM

Peter subscribed. Sep 24 2015, 9:50 AM

greg added a project: Epic. Mar 11 2016, 10:07 PM

greg edited projects, added Release-Engineering-Team; removed Release-Engineering-Epics. Mar 11 2016, 10:08 PM

greg moved this task from INBOX to Epics (ARCHIVED) on the Release-Engineering-Team board. Mar 11 2016, 10:09 PM

Dzahn unsubscribed. Mar 11 2016, 11:10 PM

• Phabricator_maintenance moved this task from Backlog to Acknowledged on the SRE board. Jan 26 2019, 7:50 PM

• Phabricator_maintenance edited projects, added Release-Engineering-Team-TODO; removed Release-Engineering-Team. Jun 12 2019, 11:41 PM

• Phabricator_maintenance moved this task from Should be empty (use Release-Engineering-Team) to Epics on the Release-Engineering-Team-TODO board. Jun 12 2019, 11:41 PM

greg added a project: Release-Engineering-Team. Jun 21 2019, 10:35 PM

The previous comments don't explain what/who exactly this task is stalled on ("If a report is waiting for further input (e.g. from its reporter or a third party) and can currently not be acted on"). Hence resetting task status.

(Smallprint, as general orientation for task management: If you wanted to express that nobody is currently working on this task, then the assignee should be removed and/or priority could be lowered instead. If work on this task is blocked by another task, then that other task should be added via Edit Related Tasks... → Edit Subtasks. If this task is stalled on an upstream project, then the Upstream tag should be added. If this task requires info from the task reporter, then there should be instructions which info is needed. If this task is out of scope and nobody should ever work on this, then task status should have the "Declined" status.)

You are free to set this task to stalled status again, however tasks should not remain stalled for five years, hence I boldly reopened for the time being.

thcipriani edited projects, added Release-Engineering-Team (thcipriani-workboard-fiddling); removed Release-Engineering-Team, Release-Engineering-Team-TODO. Apr 20 2021, 3:44 AM

thcipriani moved this task from thcipriani-workboard-fiddling to Seen (ARCHIVE) on the Release-Engineering-Team board. Apr 20 2021, 4:05 AM

thcipriani edited projects, added Release-Engineering-Team; removed Release-Engineering-Team (thcipriani-workboard-fiddling).

thcipriani edited projects, added Release-Engineering-Team (Seen); removed Release-Engineering-Team. Apr 20 2021, 3:24 PM

hashar merged a task: T230530: Offer Loadtesting as a Service. Jun 8 2021, 3:20 PM

hashar updated the task description.

hashar mentioned this in T230530: Offer Loadtesting as a Service.

hashar added subscribers: Tarrow, CDanis, WMDE-leszek, Jakob_WMDE.

jbond edited projects, added Beta-Cluster-Infrastructure; removed SRE. May 23 2023, 3:52 PM

hashar unsubscribed. May 24 2023, 8:30 AM