
Projections of cost and scaling for pageview API. {hawk} [8 pts]
Closed, ResolvedPublic

Description

Projections of cost and scaling for pageview API.

We need to know how much hardware we need to run the Pageview API for the foreseeable future to know whether we need to tap into the analytics team budget for the upcoming quarters.

Event Timeline

Nuria created this task.Oct 20 2015, 10:50 PM
Nuria raised the priority of this task from to Needs Triage.
Nuria updated the task description.
Nuria added a subscriber: Nuria.
Restricted Application added a subscriber: Aklapper.Oct 20 2015, 10:50 PM
Nuria set Security to None.Oct 20 2015, 10:50 PM
Nuria added a subscriber: ori.
kevinator added a subscriber: kevinator.EditedOct 21 2015, 1:33 AM

I think we should look at a 5 year horizon. How much will we need to spend when the hardware goes out of warranty and needs to be replaced? Keep in mind that the Cassandra DBs are running on old hardware, so they may need to be replaced sooner.

Milimetric triaged this task as Normal priority.Oct 21 2015, 4:49 PM
Milimetric moved this task from Incoming to Prioritized on the Analytics-Backlog board.
Milimetric renamed this task from Projections of cost and scaling for pageview API. to Projections of cost and scaling for pageview API. {hawk} [8 pts].Oct 21 2015, 4:55 PM
Milimetric moved this task from Prioritized to Tasked on the Analytics-Backlog board.
Nuria raised the priority of this task from Normal to High.Oct 22 2015, 7:24 PM

Had a talk with @GWicke about the Cassandra read latency problem (disk seeks take a long time on rotating disks).
He suggests we replace the rotating disks with SSDs (that's what the Services team has done already).
They use "Samsung 850 Pro 1TB" SSDs that cost about $425 per unit (http://www.amazon.com/dp/B00LF10KTE/ref=asc_df_B00LF10KTE31394483/?tag=googshopfr-21&creative=22686&creativeASIN=B00LF10KTE&linkCode=df0&hvdev=c&hvnetw=g&hvqmt=).

In terms of storage, the compaction strategy currently in use does a good job of keeping disk usage reasonably small without taking over all the CPUs.
We currently have data from the beginning of August to mid-January (5.5 months), and we use a bit less than 1.6 TB per machine (keeping in mind that we want to backfill July).
Using a linear projection (assuming no other big dataset gets added to the API), we'll use about 3 TB of storage per machine by the beginning of next July.
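The linear projection above can be sketched as follows. This is an illustrative back-of-the-envelope calculation, not an official estimate; the month counts are assumptions read off the dates mentioned in this comment, and it assumes perfectly uniform growth with no new datasets.

```python
# Hypothetical sketch of the linear storage projection discussed above.
months_observed = 5.5        # beginning of August to mid-January (assumption)
storage_used_tb = 1.6        # current usage per Cassandra machine

rate_tb_per_month = storage_used_tb / months_observed  # ~0.29 TB/month

months_until_july = 5.5      # mid-January to beginning of July (assumption)
projected_tb = storage_used_tb + rate_tb_per_month * months_until_july

print(round(projected_tb, 1))  # roughly the "about 3 TB" figure above
```

Note that this deliberately ignores compaction overhead and the planned July backfill, both of which would push the real number somewhat higher.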

With an investment of 6 disks * 3 machines * $425 = $7,650, we would mitigate the read latency issue for the coming year and a half.

Let's talk about that!

Interesting. I suppose we could fit those SSDs into the nodes we currently use, but as Kevin notes, they are soon going to be out of warranty.

I think it'll be very difficult to project this out for 5 years. We have no idea about other types of datasets that we will serve via AQS. Projecting for Pageviews + a little more over the next year or so might be the best we can do.

So if we need 3 TB per year, we'll naively need 15 TB for 5 years. But we shouldn't keep daily per-article resolution for that long. We could cut storage dramatically by going down to weekly resolution after one year. I'm not sure what the best approach would be, but I'm thinking it might be a good idea to explain the cost to the different people who might need daily or hourly per-article resolution for longer periods of time. They want it, but if it costs $20,000 of donor money, would they still want it?
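To make the retention trade-off concrete, here is a rough sketch comparing the two policies mentioned above. All figures are assumptions derived from the ~3 TB/year growth estimate in this thread; the 1/7 factor assumes weekly rollups store roughly one row for every seven daily rows.

```python
# Illustrative comparison of retention policies (figures are assumptions).
tb_per_year_daily = 3.0   # per-machine growth at daily resolution
years = 5

# Policy A: keep daily per-article resolution for the full 5 years.
full_retention_tb = tb_per_year_daily * years  # the naive 15 TB above

# Policy B: keep daily data for 1 year, roll older data up to weekly
# (weekly rows ~ 1/7 of daily rows, so ~1/7 of the storage).
rolled_up_tb = tb_per_year_daily + tb_per_year_daily * (years - 1) / 7

print(full_retention_tb)        # 15.0
print(round(rolled_up_tb, 1))   # ~4.7, a roughly 3x reduction
```

This ignores Cassandra-level overheads (replication, compaction headroom), which apply proportionally to both policies, so the ratio between them is the interesting number.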

Milimetric moved this task from In Progress to Done on the Analytics-Kanban board.
Nuria closed this task as Resolved.Feb 8 2016, 5:45 PM