
Hardware for cloud db replicas for analytics usage
Closed, Resolved (Public)

Description

Given recent issues with performance and MediaWiki refactors on the cloud replicas, it has been decided that we will move analytics jobs to a new set of labs replicas.
@Nuria wrongly thought analytics had spare hosts ready to be racked that we could use, but we do not. We do, however, have some budget to buy these hosts if needed. This ticket is to decide where the hardware will come from:

  1. Does Analytics need to order the new hardware for the replicas? If so, please let us know.
  2. Does the SRE/DB team have any hardware we can use? I imagine this is a question for @mark.

Event Timeline

as far as I know we don't have any spare hardware, but I am curious too

We have some hardware orders in progress for db-like hosts; we might be able to use those if nothing else is available.

We don't have any spare host similar to the labsdb ones.
Those are very specific hardware: since they need to contain all the wikis on the same host, they have lots of disks (16x1.6TB in a RAID 10) and 512GB of memory (T135529).

How many labsdb-like hosts would we need? I checked the cost of them in T135529 and we'd be way out of our budget :(

That is up to you guys to decide :-)
For context: 1 single host can have all the wikis - we have 3 in labs for redundancy and because we separate the service for "quick" and "slow" queries.

@Marostegui Let's keep in mind that these boxes are going to be used to sqoop data once a month, so they likely do not need to be as beefy as boxes used every day in labs by hundreds of users, right? This means that while the box might be of the same "type", it can probably have much less beefy requirements.

Sure - as I said, it is for you guys to decide; I was just giving some context on what we have on labs. :-)
You still need the same amount of disk space on a single server - as you need to have all the wikis on one single host:

root@labsdb1009:~# df -hT /srv
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs    12T  8.8T  2.9T  76% /srv

The amount of memory is for you guys to decide - whether you want to go for maybe half, or the full 512GB (if performance is key, I guess you want 512GB).
Whether you want more than one host (for redundancy) is also for you guys to decide.

A possible solution, instead of ordering new hardware, would be to reuse one or two of the new Hadoop nodes racked in T207192 for this use case: they have 12x3.6TB disks and 128GB of RAM, so I'd say that they could do the job (the only thing I don't know is whether 128GB of RAM would be enough for our use case, but I'll defer to Manuel/Balazs/Jaime's judgement).
In case this option is viable, we'll also need to get the green light for the repurpose from Faidon or Mark.

bd808 moved this task from Backlog to Wiki replicas on the Data-Services board.

Are those SSDs?
I am not really sure 128GB would be enough if performance is already an issue with 512GB (although with different views maybe performance would improve a bit).
If this process runs once a month, is it a concern if the query takes a lot longer?

They are not SSDs, forgot to mention :(

I was under the impression that part of the 512GB requirement was due to the large number of users hitting the labsdb replicas; if the queries take a lot longer than now, it might be a problem for us.

128GB + non-SSD disks will definitely not work for heavy queries :-(

@Marostegui, I think these nodes will not have any 'heavy queries'. There won't be any huge joins etc. (unless I don't fully understand the materialized view stuff). These will be used for sqoop, which is really just a glorified mysqldump.

I don't think they could even cope with replication as they would need to replicate all 8 sections from production (same as labsdbs)

The queries that we'll be running on here will be of the form:

select [subset of fields] from [one of the tables listed below] where [timestamp < the first day of this month (if there's a timestamp)]

We import the following tables:

  • archive
  • ipblocks
  • logging
  • page
  • pagelinks
  • redirect
  • revision
  • user
  • user_groups

And from any materialized views created to make it faster to grab actor/comment joins to archive/revision/logging.
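
To make that concrete, a single import would be something along the lines of the sketch below (purely illustrative: the field subset and the cutoff timestamp are examples, not the exact sqoop configuration, and it runs once per wiki database):

-- Illustrative monthly import query against one of the tables above (revision);
-- the field list and the cutoff value are examples only.
SELECT rev_id,
       rev_page,
       rev_timestamp,
       rev_minor_edit,
       rev_len
FROM revision
WHERE rev_timestamp < '20181201000000';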

Obviously SSDs would be better, but these should all be contiguous reads on disk, requiring not too much memory, because no joins are being done in real time. I think the biggest memory user will be refreshing materialized views, which will probably join actor/comment to archive/revision/logging (a rough sketch of such a refresh is included below). From a speed point of view, here's what we need:

  • 1st of the month: kick off any materialized view refreshes
  • when refreshes are done, import data
  • when import is done, kick off jobs which take at most 1 day

Before the performance problems, this process started on the 5th of the month and it generally finished by the 10th of the month. So if we had the same performance, we'd finish around the 5th of the month. If these boxes are half as fast at performing these tasks, we'd finish on the 10th. I'd say if they're close to half as fast or worse, we should get faster boxes. If not, we're good for now as a temporary solution.
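
For reference, the "materialized view" refresh I mention above would presumably be something like the (hypothetical) sketch below - the table name revision_compat and the join columns are made up for illustration, the real definitions are being worked out in T210693, and on the replicas the join may well go through the comment/actor migration tables instead:

-- Hypothetical monthly rebuild of a denormalized revision table, joining
-- comment and actor once so that the later sqoop reads are contiguous scans.
-- Table and column names are illustrative only.
CREATE TABLE revision_compat AS
SELECT r.rev_id,
       r.rev_page,
       r.rev_timestamp,
       a.actor_name   AS rev_user_text,
       c.comment_text AS rev_comment
FROM revision r
JOIN actor a   ON a.actor_id   = r.rev_actor
JOIN comment c ON c.comment_id = r.rev_comment_id;

This one-shot join is the part I'd expect to be heaviest on memory and I/O, which is why the SSD/RAM question matters most for the refresh step rather than for the imports themselves.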

@Milimetric I think T210749#4795213 should go to T210693, as you are explaining what you need and kind of drafting the SQL.

The problem with the currently proposed servers is that they have a very small amount of memory and no SSDs; that probably makes it impossible for them to catch up with replication for all the wikis at the same time.
To give you an example, the current backup hosts we have are similar to those:

  • 128GB RAM
  • 12x2TB SAS disks

We had to spread the wikis across two hosts, using a per-instance configuration so we can tune the memory per section based on its size, as some are very small. They cannot replicate all the sections at the same time; there are not enough IOPS to do so.

I checked the hosts @elukey mentioned at T210749#4789691, and those even have SATA disks - not even SAS - so they are even slower.

Got it. Luca asked me to comment here describing exactly what we need to do on the boxes. But if basic replication can't even run, I of course defer to you all. It does raise questions about this approach, though: it seems wasteful to spend so many resources replicating all operations for hundreds of tables in real time, when ultimately we need a snapshot of a small minority of tables.

Is there any other way to take a backup of just the tables we need, and restore that on these less powerful hosts, not in real-time?

I guess that is to be solved on T210693 - come up with some "summary" tables that can be populated out of the queries you need for that subset of tables across all the wikis (I assume). Let's try not to mix things on tasks as otherwise information gets spread across many tasks.
Let's talk about tables, queries and materialized tables on T210693. I would encourage you to copy T210749#4795213 over to the other task, as it describes very well and clearly what the process is and what is needed.

So to sum up the hardware approach so far: the proposed hardware wouldn't be able to catch up with the current replication stream for all sections.

Ok, done and agreed. But instead of trying to find hardware that will keep up with replication, I'm asking whether replication is necessary - could we do it some other way, given the relatively simple requirements?

I am all for finding the right hardware here that minimizes the DBAs' effort; it seems that a single configuration for all DB hosts is much easier for @Marostegui's team to deal with, which makes total sense.
Let's finalize the conversation in our upcoming meeting this week. Thus far it is looking like we can use some of "our" extra budget to buy newer hardware, and maybe some borrowing of $$$ will be needed.

elukey mentioned this in Unknown Object (Task). Dec 4 2018, 6:03 PM

Opened a procurement task for 1 cloud db replica in T211135. We are not planning to buy two hosts, based on the following assumptions:

  1. We import data once a month, for 2-3 days.
  2. In the unlikely case of a major hardware breakage, we could temporarily use the other cloud db replica for that month (basically the one that we are using now).

The idea is that we'll use this replica for the following months while we prepare a more stable, long-term solution, hopefully giving back the host that we are about to order so it can be repurposed. If this is not the case, please let us know, and we'll probably buy two hosts rather than one.

The other reason for a single host instead of redundant hosts is this: our only critical use of the box is during the first few days of the month. The box would have to break during those first days to really impact us; if it breaks at any other time and is rebuilt by the time we need it, we're fine. So that further mitigates our risk. Assuming we can query from the other replicas in the worst-case scenario, I think we're fine with just one additional box.

The labsdb hosts (which have all the databases in a single instance) are not backed up currently, so if replication is not important but we want to have fresh data on that host, we need to solve dumping from one of the labsdbs and importing it on the new host.

@Nuria do you think we should close this, as it has been decided we'll go for a host with the same specs and config as the rest of the cloud replicas (T211135)?

+1

Whatever works best for you, I will let @elukey handle hardware ordering going forward.

Marostegui claimed this task.

Closing this, as it has been decided that the hardware will be purchased; this will be followed up at: {T211135}