Estimate ORES CapEx for FY21
Closed, Resolved · Public

Description

  1. Estimate how much memory we needed to expand models this year and extrapolate to next year
  2. Estimate how much memory we need for each new topic model and estimate how much memory we'll need to serve 15 new wiki/models (for Growth)

Event Timeline

Running a test today with uwsgi. I'm running 96 uwsgi workers on ores-misc-01 to mimic what we have in production.

When running all 96 workers (no celery), we have 30.4GB of available memory. Because of our uwsgi bug, that falls to 13.4GB when we start the shutdown process, and it goes back up to 31.1GB once uwsgi is not running. So uwsgi with 96 workers only consumes about 0.7GB of memory.
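
For reference, here's a minimal sketch of how these before/after readings can be taken programmatically (assuming psutil is installed; this isn't part of ORES, just a convenience for repeating the measurement):

```python
# Minimal sketch: measure available memory before and after starting the
# 96 uwsgi workers. Assumes psutil is installed; not part of ORES itself.
import psutil

GB = 1024 ** 3

def available_gb() -> float:
    """Memory available to new processes, as reported by the kernel, in GB."""
    return psutil.virtual_memory().available / GB

baseline = available_gb()      # e.g. ~31.1 GB with nothing running
# ... start uwsgi with 96 workers here ...
with_uwsgi = available_gb()    # e.g. ~30.4 GB with all workers up
print(f"uwsgi footprint: {baseline - with_uwsgi:.1f} GB")
```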

Now running a test with celery. I'm running 90 celery workers on ores-misc-01 to mimic what we have in production.

When running all 90 workers (no uwsgi), we have 16.8GB available memory. Top reports about 2.2GB of RES for the main process. It looks like we're using 14.3GB to load all of the models and the worker processes we need.


Now the test for how much memory we consume while running both: 16.1GB free. That means we're using about 15GB of memory to run all of our workers for both uwsgi and celery.
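
Putting the three measurements together (using the 31.1GB "nothing running" figure as the baseline), the deltas work out as:

```python
# Back-of-the-envelope check of the three measurements on ores-misc-01.
baseline = 31.1       # GB available with neither uwsgi nor celery running

uwsgi_only = 30.4     # GB available with 96 uwsgi workers
celery_only = 16.8    # GB available with 90 celery workers
both = 16.1           # GB available with both running

print(f"uwsgi workers:   {baseline - uwsgi_only:.1f} GB")   # ~0.7 GB
print(f"celery workers:  {baseline - celery_only:.1f} GB")  # ~14.3 GB
print(f"uwsgi + celery:  {baseline - both:.1f} GB")         # ~15.0 GB
```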

See https://grafana.wikimedia.org/d/HIRrxQ6mk/ores?panelId=24&fullscreen&orgId=1&from=1581002497683&to=1581016730349

Now, in production, when we're actually serving traffic, things are a bit different. When we do a restart of our workers, available memory goes up to 47-58GB, and after running for a few hours it goes back down to 20GB. I think this is largely because we consume memory to gather data for feature extraction: each uwsgi worker pulls a bunch of data from the MediaWiki APIs, and eventually each worker builds up the buffer of memory it needs to process requests. This means that at full tilt we are using 27-38GB of memory. I do not expect that adding new models affects this overhead. So if my estimates are right, we need at most 23GB of overhead beyond what it takes to start the workers.
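
To spell out that overhead arithmetic (these are just the figures from the paragraph above, not new measurements):

```python
# Production overhead estimate: how much memory is in use at full tilt,
# beyond the ~15 GB it takes just to start all the workers.
available_after_restart = (47, 58)   # GB free right after restarting workers
available_steady_state = 20          # GB free after a few hours of traffic
startup_footprint = 15               # GB measured on ores-misc-01 above

for avail in available_after_restart:
    in_use = avail - available_steady_state   # 27 or 38 GB at full tilt
    overhead = in_use - startup_footprint     # 12 or 23 GB of request buffers
    print(f"{avail} GB at restart -> {in_use} GB in use, {overhead} GB overhead")
```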


OK, so how much memory do we need to host a single topic model? I'm going to experiment with removing 4 of our 5 topic models and see how the memory usage changes. Then we can do some math to figure out what the average topic model requires.

Before loading the models, we have 31.1GB free. After loading the models (with 4 of the 5 topic models removed), we have 22.0GB free. That means we used about 9.1GB, which is a 4.9GB difference from the ~14GB we use when all of the topic models are in place. 4.9GB/4 = 1.2GB per topic model! That's a bit more than I expected given that the models and vectors serialize out to about 100MB!
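
As a quick check of that per-model figure (the ~14GB all-models number is read off the earlier measurements):

```python
# Per-topic-model estimate: compare model-loading memory with all five topic
# models vs. with only one of them loaded.
used_with_all_topic_models = 14.0   # GB (roughly, from the earlier measurements)
used_with_one_topic_model = 9.1     # GB (31.1 GB free - 22.0 GB free)
models_removed = 4

per_topic_model = (used_with_all_topic_models - used_with_one_topic_model) / models_removed
print(f"{per_topic_model:.1f} GB per topic model")   # ~1.2 GB
```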

Right now, we need 15GB to run all of our models + ~23GB of overhead for buffers for the workers. Product wants us to host ~15 more models. 15 * 1.2 = 18GB more model memory needed.

15 + 18 + 23 = 56GB. We need 56GB to run our models. And our production machines have 58GB free when we are not running anything (mid restart).
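
Putting the projection in one place (same numbers as above):

```python
# Projected memory need for hosting ~15 more topic models for Growth.
current_models = 15        # GB to run all current models across the workers
new_models = 15 * 1.2      # GB for ~15 new topic models at ~1.2 GB each
overhead = 23              # GB of feature-extraction/request buffers

total = current_models + new_models + overhead
print(f"{total:.0f} GB needed")   # ~56 GB, vs. ~58 GB free mid-restart
```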

So now, if we were to increase capacity, what would that look like? Well, we want at least 15% breathing room. Under the tightest of conditions, we need 56GB / 0.85 ≈ 66GB available. Right now we have 64GB, but only 58GB of it is available. Let's take the worst case and say that 64 - 58 = 6GB is used by other stuff on the machine. That means we need 6 + 66 = 72GB total.

Let's say we actually want twice that many topic models. That would require another 18GB of memory. We'd use 15 + 18 + 18 + 23 = 74GB, and 74 + 6 = 80GB of total memory usage. If we aim for 85% utilization, that means we'd need 94GB of memory on the servers.
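
The headroom math for both scenarios, computed the same way as above (keeping 15% of RAM free means dividing by 0.85):

```python
# Total RAM needed for the two capacity scenarios, with 15% breathing room.
other_usage = 6                     # GB used by other stuff (64 GB total - 58 GB free)

# Scenario 1: ~15 new topic models for Growth.
workload = 15 + 18 + 23             # 56 GB
print(f"{workload / 0.85 + other_usage:.0f} GB")    # ~72 GB of total RAM

# Scenario 2: twice that many new topic models.
workload = 15 + 18 + 18 + 23        # 74 GB
print(f"{(workload + other_usage) / 0.85:.0f} GB")  # ~94 GB of total RAM
```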

I see two proposals that are interesting here:

  1. Let's just buy bigger servers. If we get servers that have ~128GB of RAM, we'd be able to host a heck of a lot of topic models for product and that would give us room to grow in interesting ways.
  2. Let's add more servers and split the web workers away from the celery workers. We can host *a lot* more web workers than celery workers on a smaller number of nodes.

For (2), we'd need to do a bit more work to confirm where the 23GB of overhead we observe is coming from. Is it celery workers or web workers? We'd need to experiment with selectively shutting down celery/uwsgi and seeing what happens to memory usage under normal load. @akosiaris, what do you think of trying this out on a production ores node?
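
For that experiment, something as simple as the following (a sketch, not an existing ORES tool) running on the node would make the before/after memory difference easy to read off afterwards:

```python
# Log available memory once a minute while celery or uwsgi is selectively
# stopped on a production node. Reads MemAvailable straight from the kernel.
import time

def mem_available_gb() -> float:
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / (1024 ** 2)   # kB -> GB
    raise RuntimeError("MemAvailable not found in /proc/meminfo")

while True:
    print(f"{time.strftime('%H:%M:%S')} available={mem_available_gb():.1f} GB", flush=True)
    time.sleep(60)
```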

Sorry for the late answer. Shutting down uwsgi or celery for a period of time? @Halfak, yeah, we can do that.

@Halfak I do like the idea of adding more servers to split the workers, though I'm wondering if that would limit us in terms of growth/hosting more types of models/etc. with the smaller number of nodes. The bigger-servers option would definitely help in that regard; would it make sense to maybe do both to some extent?

I'm not quite sure how much headspace we would get by moving the uwsgi workers off the servers that celery is running on. I think it'll gain us much less than doubling the RAM would; it would really be a stop-gap. I think 3 uwsgi servers could serve 9 celery servers easily. That's what we have seen in wmflabs.

Just based on rough estimates in wmflabs, I expect to get 5GB of memory back from moving uwsgi. That would get us 2-3 more topic models. I think this approach only really makes sense if we can't get new hardware.

Oh! Of course, we could also do some routing with celery. E.g. we can probably load up a limited set of models on some nodes. I'm not sure how complicated that would be. It's definitely more engineering and debugging than adding RAM.
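
For what that routing could look like, here's a hypothetical sketch using Celery's task_routes setting (the task and queue names are invented for illustration; ORES's real task layout may differ):

```python
# Hypothetical sketch of per-queue routing so only some nodes need the
# topic models loaded. Task and queue names are illustrative, not ORES's.
from celery import Celery

app = Celery("ores")

app.conf.task_routes = {
    "ores.scoring.score_topic":       {"queue": "topic_models"},
    "ores.scoring.score_editquality": {"queue": "default"},
}

# Nodes that host the topic models would run:
#   celery -A ores worker -Q topic_models
# and the rest would run:
#   celery -A ores worker -Q default
```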

@akosiaris, anything else you need from us to consider option #1 (buy bigger servers) as part of CapEx proposals for the next cycle?

I'll have to ask; I am not sure we can put that option in the CapEx budget. We may need some other way of achieving the same thing.

@Halfak, the ORES machines are under warranty, so there's no way they are being replaced, but it looks like a possible solution would be to add another 64GB of RAM to each box. We'll try to add that to the budget.