Make it possible to quickly and programmatically pool and depool application servers
Closed, Resolved · Public

Description

HHVM takes more time to initialize than PHP5, because it needs to translate PHP to byte code, analyze the types flowing through the code, and then compile the byte code down to machine code. This warm-up takes a few minutes, and during that time the application server is too slow to serve user requests.

Because this is the case, it is essential to avoid a cold restart of the entire production cluster. We need to have a staggered deployment process that allows us to take servers offline while they are warming up and then bring them back online once they're ready.
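A minimal sketch of the staggered process described above, assuming four hypothetical callables (`depool`, `restart`, `wait_until_warm`, `pool`) that wrap whatever tooling actually performs those steps. None of these names come from an existing tool; the point is only that no more than one batch of servers is out of rotation at any time.

```python
from typing import Callable, Iterator, List


def batches(servers: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` servers."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(servers), size):
        yield servers[start:start + size]


def staggered_restart(
    servers: List[str],
    batch_size: int,
    depool: Callable[[str], None],           # hypothetical: remove host from the pool
    restart: Callable[[str], None],          # hypothetical: restart the app server
    wait_until_warm: Callable[[str], None],  # hypothetical: block until the host is fast again
    pool: Callable[[str], None],             # hypothetical: put host back in the pool
) -> None:
    """Restart servers one batch at a time so the rest keep serving traffic."""
    for group in batches(servers, batch_size):
        # Take the whole batch out of rotation before touching it.
        for host in group:
            depool(host)
        for host in group:
            restart(host)
        # Repool each host only after it has finished warming up.
        for host in group:
            wait_until_warm(host)
            pool(host)
```

The batch size sets the trade-off: larger batches finish the deployment sooner but remove more capacity while they warm up.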

To meet these requirements, we need better automation. The current process for adding and removing servers from the pool serving user requests is too slow and too manual.

Details

Reference
bz71212

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 3:51 AM
bzimport set Reference to bz71212.
bzimport added a subscriber: Unknown Object (MLST).
  • Bug 71211 has been marked as a duplicate of this bug.

Adding hhvm keyword. This is not directly related to the HHVM code base but it is an important component for full cluster HHVM deployment.

This is a long-standing issue that we run into for a number of reasons: we need an orchestration tool other than "vi".

There are two ways to achieve this specific goal: the quick one and the correct one.

The quick one:

  • Create a file on the local disk of each machine with predefined content ("OK"), and make Apache serve it via a URL such as "/is-this-pooled"; then modify this file whenever we actually want to depool a server
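A rough sketch of the quick approach, assuming the state file sits in a directory that Apache already serves. The "DEPOOLED" marker and the function names are made up for illustration; only the "OK" content comes from the proposal above. The file is replaced atomically so a health check never reads a half-written state.

```python
import os
from pathlib import Path

POOLED_MARKER = "OK"          # content proposed above for a pooled server
DEPOOLED_MARKER = "DEPOOLED"  # assumption: any content other than "OK" would do


def set_pooled(state_file: Path, pooled: bool) -> None:
    """Write the desired state, replacing the file in a single atomic step."""
    tmp = state_file.with_name(state_file.name + ".tmp")
    tmp.write_text(POOLED_MARKER if pooled else DEPOOLED_MARKER)
    os.replace(tmp, state_file)


def is_pooled(state_file: Path) -> bool:
    """Return True only if the file exists and contains the pooled marker."""
    try:
        return state_file.read_text().strip() == POOLED_MARKER
    except FileNotFoundError:
        # Assumption: a missing file is treated as depooled.
        return False
```

A load balancer health check would then fetch the URL and apply the same comparison to the response body.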

The correct one:

  • Set up a distributed, consistent configuration database like etcd or ZooKeeper (I strongly prefer the former, and I think there is some consensus in ops about this).
  • Make PyBal fetch its config from this database, and watch the database for changes
  • Make scap/Puppet/whatever work with it (for Puppet, the best option is to use etcd as a secondary Hiera backend just for orchestrated functionality)
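To illustrate the "watch the database for changes" step, here is a sketch of the reconciliation logic only. It assumes the pool state is a plain mapping from hostname to a pooled flag, and it stands in for the watch with an iterable of snapshots; it does not use any real etcd, ZooKeeper or PyBal API, and all names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Action = Tuple[str, str]  # ("pool" | "depool", hostname)


def diff_pool_state(current: Dict[str, bool], desired: Dict[str, bool]) -> List[Action]:
    """Return the actions needed to move from `current` to `desired`.

    Hosts missing from either mapping are treated as depooled (an assumption).
    """
    actions: List[Action] = []
    for host in sorted(set(current) | set(desired)):
        was = current.get(host, False)
        now = desired.get(host, False)
        if now and not was:
            actions.append(("pool", host))
        elif was and not now:
            actions.append(("depool", host))
    return actions


def follow_changes(
    snapshots: Iterable[Dict[str, bool]],
    initial: Optional[Dict[str, bool]] = None,
) -> Iterator[List[Action]]:
    """Yield the actions for each new snapshot, as a watch callback might."""
    current = dict(initial or {})
    for desired in snapshots:
        yield diff_pool_state(current, desired)
        current = dict(desired)
```

With this shape, a deployment tool only has to write the desired state to the store; the load balancer side works out what to change.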

The effort required for these two options is very different, and while the latter solution will be implemented eventually, I'm not sure it's advisable to do it now if it's going to be an HHVM blocker.

I'll add to this bug people that I think are relevant.

Anyone willing to mentor this for the upcoming GSoC/Outreachy round?

Qgil renamed this task from Make it possible to quickly and programmaticly pool and depool application servers to Make it possible to quickly and programmatically pool and depool application servers. Mar 12 2015, 10:49 PM

It is time to promote Wikimedia-Hackathon-2015 activities in the program (training sessions and meetings) and main wiki page (hacking projects and other ongoing activities). Follow the instructions, please. If you have questions about this message, ask here.

Did someone work on this project during Wikimedia-Hackathon-2015? If so, please update the task with the results. If not, please remove the label.

This is a message posted to all tasks under "Need Discussion" at Possible-Tech-Projects. Outreachy-Round-11 is around the corner. If you want to propose this task as a featured project idea, we need a clear plan with community support, and two mentors willing to support it.

This is a message sent to all Possible-Tech-Projects. The new round of Wikimedia Individual Engagement Grants is open until 29 Sep. For the first time, technical projects are within scope, thanks to the feedback received at Wikimania 2015, before, and after (T105414). If someone is interested in obtaining funds to push this task, this might be a good way.

Much of the infrastructure that's needed for this is in place. I'm not sure it's really a good Outreachy-type project, though. It requires quite a bit of involvement from SRE and Release-Engineering-Team, which is currently ongoing (see Scap).

Indeed, Ops and RelEng intend to work together on this in the upcoming quarter, using scap3.

Actually, I think that depends on this. It asks for an API to use, and this task proposes one (which as you said, is mostly in place already w/ conftool)