
CAPEX for ParserCache for Parsoid
Open, MediumPublic

Description

If we expand the ParserCache for Parsoid, this is going to require consideration of cache capacity.

I can see a lot of possibilities, but I'd like to have others weigh in on what we could do here. My first takes are:

  • Add hardware. This seems kind of extreme, especially since we think this transition is going to be temporary.
  • Have lower priority or TTL or something for the non-default parser. I'm not sure how this works or if it would help.
  • Use existing storage. We currently have the default parser cache on memcache, and the RESTBase cache in Cassandra. The "parser cache" discussed in these user stories might just be multiplexed across these different storage layers. I don't have the chops to say whether that's preferred or not.
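To make the second and third options above concrete, here is a minimal sketch of what a multiplexed cache front with a shorter TTL for the non-default parser could look like. Everything here (the ParserCacheMux class, the backends dict, the TTL values) is a hypothetical illustration, not MediaWiki's actual ParserCache API.

```python
import time

# Illustrative TTLs only; real values would be a tuning decision.
DEFAULT_TTL = 30 * 24 * 3600   # e.g. 30 days for the default parser
SECONDARY_TTL = 7 * 24 * 3600  # shorter TTL for the non-default parser

class ParserCacheMux:
    """Hypothetical front that routes entries to per-parser backends
    and gives the non-default parser a shorter lifetime."""

    def __init__(self, backends):
        # backends: dict mapping parser name -> dict-like store
        self.backends = backends
        self.expiry = {}

    def set(self, parser, key, html, now=None):
        now = time.time() if now is None else now
        ttl = DEFAULT_TTL if parser == "default" else SECONDARY_TTL
        self.backends[parser][key] = html
        self.expiry[(parser, key)] = now + ttl

    def get(self, parser, key, now=None):
        now = time.time() if now is None else now
        exp = self.expiry.get((parser, key))
        if exp is None or now > exp:
            return None  # missing or expired
        return self.backends[parser].get(key)
```

The idea is simply that the non-default parser's entries expire sooner, so they compete less for space, whichever storage layer each backend maps to.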

Event Timeline

These are all VERY preliminary numbers, just to give an idea of where we stand.

According to https://grafana.wikimedia.org/d/000000106/parser-cache?orgId=1 we have 3 TiB of disk available on parser cache nodes.

According to https://grafana.wikimedia.org/d/000000418/cassandra?viewPanel=11&orgId=1&var-datasource=codfw%20prometheus%2Fservices&var-cluster=restbase&var-keyspace=enwiki_T_page__summary&var-table=data&var-quantile=99p the current content stored in RESTBase is 34 TB.

  • This includes 3-way replication, so it's actually ~11 TiB of content.
  • This includes ALL content: Parsoid-HTML + Mobile-HTML + Mobile-sections. All the content types are roughly the same size; I will do a more precise calculation soon, but 11/3 ≈ 4 TB of Parsoid-HTML + data-parsoid.

So, by a very rough back-of-the-napkin calculation, we probably need more space for ParserCache if we are to have Parsoid content for all articles in there. I will get back to the ticket with more precise estimates of the actual needs.
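Spelled out, the arithmetic behind this estimate looks as follows (a sketch using only the rough figures from this comment, not measured values):

```python
# Back-of-the-napkin check of the numbers above (all figures are the
# rough estimates from this comment, not measured values).
raw_cassandra_tb = 34        # total content stored in RESTBase, incl. replication
replication_factor = 3       # 3-way replication within the cluster
content_types = 3            # Parsoid-HTML, Mobile-HTML, Mobile-sections

logical_content_tb = raw_cassandra_tb / replication_factor  # ~11 TB
parsoid_share_tb = logical_content_tb / content_types       # ~4 TB

parsercache_disk_tb = 3      # disk available on parser cache nodes

print(round(logical_content_tb, 1))  # 11.3
print(round(parsoid_share_tb, 1))    # 3.8
# ~4 TB of Parsoid HTML + data-parsoid vs ~3 TB of ParserCache disk:
# not enough headroom without more capacity.
```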

Some more RESTBase utilization data is available on T258414

In case we wanted to cannibalise some servers from the restbase cluster as we move their content to parsercache backends, assuming such a thing were feasible on the software side:

  • pc1007,8,9,10 and pc2007,8,9,10 all are PowerEdge R440s with 2 Xeon Gold 6216 (2.60GHz) processors, 264GB RAM, H730P raid controllers, 16 1.6TB SSDs.
  • restbase1028,29,30 are also PowerEdge R440s, with 2 Xeon Silver (2.2GHz) processors, 132GB RAM, no controllers, and 9 1.9TB SSDs.

I didn't check older restbase boxes, as their specs are not likely to be better.

I don't know how costly it would be to upgrade memory and add a controller and more disks to these servers.

Pchelolo added subscribers: Marostegui, jcrespo.

I guess we have to begin here.

TL;DR of the problem: we will not have enough space in the MySQL-backed ParserCache for the transition from the old PHP parser to Parsoid. We would need to roughly double the storage capacity of the cache and very roughly triple its throughput.

We have 3 options:

  1. Buy more hardware for ParserCache and keep using MySQL. The downsides are procurement time and, more importantly, that once the transition to Parsoid is complete in several years we will end up with a drastically over-provisioned cluster.
  2. Try taking some machines out of the RESTBase Cassandra cluster and placing them into the ParserCache cluster. I'm not sure MySQL can run properly on RESTBase machines; this will likely require datacenter operations to repack the servers, decommissioning some of the machines from Cassandra will take time, and in the end we would have a heterogeneous cluster, which is not the easiest or best solution.
  3. Change the ParserCache backend to Cassandra and reuse the existing RESTBase Cassandra cluster. This will require some software development, mainly packaging and testing a PHP Cassandra driver, but it is potentially very attractive: Cassandra is well battle-tested for a similar use case in RESTBase, and it could give us the ability to write into the ParserCache from the secondary DC, which we don't currently need but could certainly find uses for. Also, this entirely avoids buying new hardware.

At this point, we need to collect more information, mainly from DBA, @Marostegui and @jcrespo :

  1. Are we currently having any issues with the ParserCache MySQL cluster? Can it roughly double its storage capacity and roughly triple its request volume if we buy more hardware, or are there potential pitfalls in scaling it?
  2. If I wanted to estimate p95 READ/WRITE latency at the MySQL level, how would I do that? I couldn't find the graph on Grafana.
  3. Is it theoretically possible to run MySQL on repurposed RESTBase machines (see comment above for specs)?
  4. Would it be interesting/beneficial to you in any way if we did the work to switch the ParserCache backend from MySQL to Cassandra for Parsoid-HTML, and thus eventually dropped the MySQL-backed ParserCache?

I have one question before everything else: does the parsercache expansion mean a new "cluster/service" in parallel to the existing parsercache, or would it be more like an expansion of the current service, to increase the number of hits / change the pc policy to store more data?

p95 and other data is not in Prometheus on purpose, because the exporter that has that info could reveal sensitive information about user activity on other MySQL servers, but we have the data privately.

Looking at pc1007, the average latency of writes since the last restart is 969 microseconds (<1 ms) and the p100 (max_latency) is 485.03 ms. For reads, the average latency is 2.32 milliseconds and the p100 (max_latency) is 100 ms. Note writes are faster than reads because, contrary to what Apergos stated before, parsercache runs on very cheap hardware with no SSDs, only rotational disks, so writes happen synchronously in memory but reads are done from cold rotational disks, given it is a second layer of cache (the first one being memcache on top of it). I would worry more about the memcache impact than the parsercache DBs.

I will let @Marostegui answer the questions themselves, but these may be the wrong questions, given that we should perhaps be questioning memcache rather than the second, slow, rotational-disk, MySQL-based layer?

Thank you for the answers!

I have one question before everything else: does the parsercache expansion mean a new "cluster/service" in parallel to the existing parsercache, or would it be more like an expansion of the current service, to increase the number of hits / change the pc policy to store more data?

We are reusing the same software for caching Parsoid output as for caching the default parser output, but the use cases are quite distinct, and we will always know which parser is being cached, so we can do either: split the backends into separate clusters, or expand the current service, depending on what's better/easier.

I will let @Marostegui answer the questions themselves, but these may be the wrong questions, given that we should perhaps be questioning memcache rather than the second, slow, rotational-disk, MySQL-based layer?

Perhaps the questions are wrong indeed, but we are at a rather early data-gathering stage. I thought memcached was owned by another branch of SRE, so I did not want to dump questions about it on you.

I guess we have to begin here.

TL;DR of the problem: we will not have enough space in the MySQL-backed ParserCache for the transition from the old PHP parser to Parsoid. We would need to roughly double the storage capacity of the cache and very roughly triple its throughput.

We have 3 options:

  1. Buy more hardware for ParserCache and keep using MySQL. The downsides are procurement time and, more importantly, that once the transition to Parsoid is complete in several years we will end up with a drastically over-provisioned cluster.

Just for the record: we haven't budgeted this for this FY (I wasn't aware of this project requiring DB hardware from our side).

  2. Try taking some machines out of the RESTBase Cassandra cluster and placing them into the ParserCache cluster. I'm not sure MySQL can run properly on RESTBase machines; this will likely require datacenter operations to repack the servers, decommissioning some of the machines from Cassandra will take time, and in the end we would have a heterogeneous cluster, which is not the easiest or best solution.

Not knowing what the current RESTBase HW is at the moment, this would require, as you mention, work, checking performance and, unfortunately, having databases which don't follow any of our existing HW configurations.
They might even require purchases to adapt them to the needs (i.e. I don't know if they currently have HW RAID controllers; from what Ariel mentions, they don't).
Also, I am not sure how useful it would be to buy new hardware pieces (controllers, more memory, SSD disks...) for hardware that will be decommissioned at some point, rather than buying entirely new hardware for this process.
We, DBAs, avoid doing that.

  3. Change the ParserCache backend to Cassandra and reuse the existing RESTBase Cassandra cluster. This will require some software development, mainly packaging and testing a PHP Cassandra driver, but it is potentially very attractive: Cassandra is well battle-tested for a similar use case in RESTBase, and it could give us the ability to write into the ParserCache from the secondary DC, which we don't currently need but could certainly find uses for. Also, this entirely avoids buying new hardware.

This would be nice. I am not completely convinced parsercache should live in MySQL. There are many things there that we haven't solved, and that are, in fact, hard to solve.
Some are tracked at T133523
One of the things that keeps me up at night is that, currently:

  • we don't have a failover method that we could implement automatically, given that each host has its own data, so if a host crashes, we need to put in another one which is "empty".
  • the amount of writes we get generates an incredible amount of binlogs, and any minor change to the keys can result in disk usage increasing dramatically (we've had issues in the past)

At this point, we need to collect more information, mainly from DBA, @Marostegui and @jcrespo :

  1. Are we currently having any issues with the ParserCache MySQL cluster? Can it roughly double its storage capacity and roughly triple its request volume if we buy more hardware, or are there potential pitfalls in scaling it?

As mentioned above T133523 captures most of the issues we currently have.
From my point of view, it is a system that works nicely, but it is very fragile, and any minor change can have dramatic results in latency or require emergency work, as we've seen in the past.
This is one of the most epic ones I remember: T167784
And it was not the only one of that nature: T206740, and lately we had T247788.

  2. If I wanted to estimate p95 READ/WRITE latency at the MySQL level, how would I do that? I couldn't find the graph on Grafana.

See Jaime's comment above

  3. Is it theoretically possible to run MySQL on repurposed RESTBase machines (see comment above for specs)?

It is possible, but I would highly discourage that. See my previous comments.

  4. Would it be interesting/beneficial to you in any way if we did the work to switch the ParserCache backend from MySQL to Cassandra for Parsoid-HTML, and thus eventually dropped the MySQL-backed ParserCache?

I think that can be a major win for the Data Persistence Team for the reasons I also stated above.

Thank you!

Small addendum: Note that parsercache functionality is memcached + MySQL, not just MySQL. In fact the MySQL part was a later addition for disk persistence/larger dataset.

LSobanski added a subscriber: LSobanski.

Another small correction:

it could bring us capability to write into the ParserCache from the secondary DC, which we don't currently need, but certainly could think of some usages for it

Parsercache MySQLs are already in a write-write configuration between datacenters.

<snip>

Looking at pc1007,...

<snip>

Note writes are faster than reads because contrary to what Apergos stated before, parsercache runs on very cheap hardware, with no SSDs, only rotational disks, so writes happen synchronously on memory but reads are done from cold rotational disks

I'm going by the Dell quotes for the hw, backtracking from the racking task. If those are wrong, can $someone point me to the right ones?

Cassandra is not free of its own issues, and it has a much higher cost per GB than parsercache currently has (I did no research, but I suspect it's on the order of 5x at the very least).

I'd start by evaluating how much higher that cost is, and by understanding what's really the best way to deal with the shortcomings of parsercache.

For instance:

  • cassandra runs on 30 servers with large SSD disks across two datacenters
  • we're currently able to store only about 1/6th of the total raw storage capacity (3x replication in a DC, plus cross-DC replication)
  • if we lose one server (in any datacenter), that can reduce our available storage space by 1/5th (IIRC). See https://phabricator.wikimedia.org/T256863#6319860
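The replication fractions above can be sketched numerically (the figures are this comment's approximations, not measured values):

```python
# Rough sketch of the fractions cited above.
servers = 30
in_dc_replication = 3   # 3x replication inside each datacenter
datacenters = 2         # a full copy kept in each DC

# Every logical byte is stored 3 times in each of 2 DCs = 6 copies,
# so usable capacity is about 1/6th of raw disk.
copies = in_dc_replication * datacenters
usable_fraction = 1 / copies
print(round(usable_fraction, 3))  # 0.167
```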

This is to say that while cassandra itself could be a good replacement backend for parsercache, our current installation with its current configuration probably isn't.

I would suggest we go through the work of thinking "how a parsercache infra would be if we started from the ground up" with both mysql and cassandra, and understand what's best from there.

If that's not possible, I strongly suggest we don't adopt, as a new backend for a cache, a storage system that was designed with extreme fault-tolerance to avoid data losses (although it should be noted that it's mostly storing a cache nowadays).

So, Cassandra in our current configuration doesn't really seem like a sustainable backend for parsercache as-is.

@ArielGlenn the current parsercache hosts run SSDs.

I'm going by the Dell quotes for the hw, backtracking from the racking task. If those are wrong, can $someone point me to the right ones?

I went by cat /sys/block/sda/queue/rotational and memory, maybe I am wrong.

In any case, my point was that they are not DB-grade HW; I think they use RAID 5?
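For reference, the sysfs check quoted here can be wrapped as a small helper (a sketch only; note the flag can be misleading behind a HW RAID controller, which is exactly the ambiguity in this exchange, and the sysfs root is a parameter only so the helper can be exercised against a fake tree):

```python
from pathlib import Path

# The kernel exposes a per-device 'rotational' flag:
# 1 = spinning disk, 0 = SSD/NVMe (as seen by the OS).
def is_rotational(device="sda", sysfs="/sys/block"):
    flag = Path(sysfs, device, "queue", "rotational").read_text().strip()
    return flag == "1"
```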

I'm going by the Dell quotes for the hw, backtracking from the racking task. If those are wrong, can $someone point me to the right ones?

I went by cat /sys/block/sda/queue/rotational and memory, maybe I am wrong.

They are SSDs, please check the quotes: T195876

root@pc1007:~# megacli -pdList -A0 | grep Media
Media Error Count: 0
Media Type: Solid State Device
Media Error Count: 0
Media Type: Solid State Device
Media Error Count: 0
Media Type: Solid State Device
Media Error Count: 0
Media Type: Solid State Device

This work will be handled by Parsoid team

ssastry removed WDoranWMF as the assignee of this task.
ssastry added a subscriber: WDoranWMF.

@WDoranWMF. I am going to reopen this and unassign you, but feel free to change project tags to tracking tags for your purposes.

Independent of which team is primarily responsible for this, both CPT and Parsing teams need to be involved and the discussions here are still relevant.

@ssastry That works for me, I didn't know if there was an existing ticket elsewhere but I was hoping that closing and tagging would at least assure that your team can take it over or retrieve any information from it. I'm going to untag us so that it's clear it's not in front of us right now.

As soon as we are needed just tag Platform Engineering and we'll make sure to be available. Thanks!

Okay, so I read through the ticket and from what I can tell, there are three different pieces to this in this order:

  1. Parser Cache backing "storage" design questions need to be answered before we get to the CAPEX part, which is the simpler piece.
  2. As far as the Parsoid rollout is concerned, we don't plan to roll out Parsoid everywhere this fiscal year. Our short-term goal for the end of this fiscal year (end of June) is to roll out Parsoid for read views for a simple use case (T265943 exists for that). But, given this discussion, maybe the parsing team should pick officewiki / mediawiki in T265943, since using ParserCache with existing hardware for those wikis will not make much of a difference given their size relative to the size of the parser cache.
  3. The actual CAPEX funding requests for the full rollout will be part of the next annual planning cycle, but we should plan for early procurement so that, if all the other work gets done, we can start rolling out Parsoid at the end of the calendar year.

As far as I can tell, this ticket ought to be about #3. #2 is answered. @Pchelolo do you need a separate ticket to work through #1 or do you (=CPT) have your answers based on inputs from the many SRE folks on this ticket?

I believe the key questions to DBA have been answered so I'm removing the team tag. Please add us back if there is anything else we can help with.