
Make it possible to quickly and programmatically pool and depool application servers
Closed, ResolvedPublic

Description

HHVM takes more time to initialize than PHP5, because it needs to translate PHP to byte code, analyze the types flowing through the code, and then compile the byte code down to machine code. This process takes a few minutes, and during that time the application server is too slow to serve user requests.

For this reason, it is essential to avoid a cold restart of the entire production cluster. We need a staggered deployment process that allows us to take servers offline while they are warming up and then bring them back online once they're ready.

To meet these requirements, we need better automation. The current process for adding and removing servers from the pool serving user requests is too slow and too manual.
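
The staggered process described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `depool`, `restart`, `is_warm` and `repool` are hypothetical callables standing in for whatever tooling performs each step, and the batch size and polling limits are arbitrary.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    servers: Iterable[str],
    depool: Callable[[str], None],    # hypothetical: remove host from the pool
    restart: Callable[[str], None],   # hypothetical: restart the app server
    is_warm: Callable[[str], bool],   # hypothetical: True once the host is fast enough
    repool: Callable[[str], None],    # hypothetical: put host back in the pool
    batch_size: int = 2,
    poll_interval: float = 0.0,
    max_polls: int = 100,
) -> List[str]:
    """Restart servers one batch at a time; return the hosts that were repooled."""
    hosts = list(servers)
    repooled: List[str] = []
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        # Take the whole batch out of rotation before restarting it,
        # so cold servers never receive user traffic.
        for host in batch:
            depool(host)
            restart(host)
        for host in batch:
            for _ in range(max_polls):
                if is_warm(host):
                    break
                time.sleep(poll_interval)
            else:
                # Never warmed up within the limit: leave it depooled.
                continue
            repool(host)
            repooled.append(host)
    return repooled
```

Only one batch is ever out of rotation at a time, so the rest of the cluster keeps serving requests while the batch warms up.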

Details

Reference
bz71212

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:51 AM
bzimport set Reference to bz71212.
bzimport added a subscriber: Unknown Object (MLST).
ori created this task. Sep 24 2014, 12:18 AM
ori added a comment. Sep 24 2014, 12:20 AM
  • Bug 71211 has been marked as a duplicate of this bug.

Adding hhvm keyword. This is not directly related to the HHVM code base, but it is an important component for full-cluster HHVM deployment.

Joe added a comment. Sep 24 2014, 6:00 AM

This is a long-standing issue we encounter for a number of reasons: we need an orchestration tool different from "vi".

To achieve this specific goal, we have two options: the quick one and the correct one.

The quick one:

  • Create a file on the local disk of each machine with predefined content ("OK"), and make Apache serve it at a URL such as "/is-this-pooled"; then modify this file when we want to depool a server
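
A minimal sketch of the quick option, assuming the state file is whatever path Apache maps to "/is-this-pooled" and that the health checker treats any body other than "OK" as depooled. The function names and the "DEPOOLED" marker are hypothetical, not part of any existing tool.

```python
from pathlib import Path

POOLED_BODY = "OK"  # the predefined content mentioned above


def set_pooled(state_file: Path, pooled: bool) -> None:
    """Rewrite the state file; any body other than "OK" means depooled."""
    # Write to a temporary file and rename it into place, so the web
    # server never serves a half-written file.
    tmp = state_file.with_name(state_file.name + ".tmp")
    tmp.write_text(POOLED_BODY if pooled else "DEPOOLED")  # hypothetical marker
    tmp.replace(state_file)


def is_pooled(body: str) -> bool:
    """What a health checker could do with the body it fetched."""
    return body.strip() == POOLED_BODY
```

A deployment script would call `set_pooled(path, False)` before restarting the server and `set_pooled(path, True)` once it has warmed up.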

The correct one:

  • Set up a distributed, consistent configuration database like etcd or ZooKeeper (I strongly prefer the former, and I think there is some consensus in ops about this).
  • Make pybal fetch its config from this database and watch it for changes
  • Make scap/puppet/whatever work with it (for Puppet, the best option is to use etcd as a secondary Hiera backend just for orchestrated functionality)
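
The watch-based design in the list above can be illustrated with a small in-memory model. This is a sketch only: `ConfigStore` is a hypothetical stand-in for the configuration database and `LoadBalancer` for the consumer; neither reflects the real etcd or pybal APIs, and the "pooled"/"depooled" values are assumptions.

```python
from typing import Callable, Dict, List, Set

Watcher = Callable[[str, str], None]


class ConfigStore:
    """Hypothetical in-memory stand-in for a watchable key/value store."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: List[Watcher] = []

    def watch(self, callback: Watcher) -> None:
        """Register a callback invoked on every change."""
        self._watchers.append(callback)

    def put(self, key: str, value: str) -> None:
        """Store a value and notify all watchers."""
        self._data[key] = value
        for callback in self._watchers:
            callback(key, value)


class LoadBalancer:
    """Hypothetical consumer that keeps its pool in sync with the store."""

    def __init__(self, store: ConfigStore) -> None:
        self.pooled: Set[str] = set()
        store.watch(self._on_change)

    def _on_change(self, host: str, state: str) -> None:
        if state == "pooled":
            self.pooled.add(host)
        else:
            self.pooled.discard(host)


store = ConfigStore()
balancer = LoadBalancer(store)
store.put("mw1", "pooled")
store.put("mw1", "depooled")  # a deploy tool depools the host with one write
```

The point of the design is that deployment tools only write to the store, and the load balancer reacts to the change without anyone editing its configuration by hand.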

I think the effort involved in the two is very different, and while the latter solution will be implemented, I'm not sure it's advisable to do it now if it's going to be an HHVM blocker.

I'll add the people I think are relevant to this bug.

Qgil added a subscriber: Qgil.
ori set Security to None.

Anyone willing to mentor this for the upcoming GSoC/Outreachy round?

greg updated the task description. Mar 5 2015, 6:44 PM
greg added subscribers: thcipriani, demon.
Qgil renamed this task from Make it possible to quickly and programmaticly pool and depool application servers to Make it possible to quickly and programmatically pool and depool application servers. Mar 12 2015, 10:49 PM
Qgil added a comment. May 18 2015, 11:12 AM

It is time to promote Wikimedia-Hackathon-2015 activities in the program (training sessions and meetings) and main wiki page (hacking projects and other ongoing activities). Follow the instructions, please. If you have questions about this message, ask here.

Qgil added a comment. May 27 2015, 10:23 PM

Did someone work on this project during Wikimedia-Hackathon-2015? If so, please update the task with the results. If not, please remove the label.

Qgil added a comment. Sep 23 2015, 9:07 AM

This is a message posted to all tasks under "Need Discussion" at Possible-Tech-Projects. Outreachy-Round-11 is around the corner. If you want to propose this task as a featured project idea, we need a clear plan with community support, and two mentors willing to support it.

Restricted Application added subscribers: Matanya, Aklapper. Sep 23 2015, 9:07 AM
Qgil added a comment. Sep 23 2015, 9:35 AM

This is a message sent to all Possible-Tech-Projects. The new round of Wikimedia Individual Engagement Grants is open until 29 Sep. For the first time, technical projects are within scope, thanks to the feedback received at Wikimania 2015, before, and after (T105414). If someone is interested in obtaining funds to push this task, this might be a good way.

Much of the infrastructure that's needed for this is in place. I'm not sure it's really a good Outreachy-type project, though. It requires quite a bit of involvement from Operations and Release-Engineering-Team, which is currently ongoing (see Scap).

mark added a comment. Sep 23 2015, 3:08 PM

Indeed, Ops and RelEng intend to work together on this in the upcoming quarter, using scap3.

Probably a duplicate of T104352.

demon added a comment. Sep 23 2015, 3:16 PM

Actually, I think that task depends on this one. It asks for an API to use, and this task proposes one (which, as you said, is mostly in place already with conftool).

Niharika removed a subscriber: Niharika. Dec 11 2015, 2:36 AM
Joe claimed this task. Jan 13 2016, 7:44 AM
Joe closed this task as Resolved. Feb 2 2016, 9:51 AM