
Consider using Cassandra/restbase in place of external store
Open, Medium, Public

Description

This could possibly involve another ExternalStoreMedium subclass.

Cassandra can be used for:
a) More reliability
b) Simpler maintenance
c) Better load spreading
d) Better compression (if the storage keys had page ID hints as prefixes)
e) Avoid lots of awful DIY code
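Point d) depends on which blobs end up next to each other. The rough Python sketch below uses zlib's deflate as a stand-in for gzip. It models rows keyed by a hypothetical (page_id, rev_id) pair and compresses groups of three consecutive rows, which roughly mimics block-level compression in a sorted store. The page contents, key layout, and chunk size are illustrative assumptions, not measurements of ES or Cassandra.

```python
import random
import zlib

WORDS = ["wiki", "page", "history", "revision", "storage", "cluster", "article",
         "template", "edit", "text", "blob", "server", "archive", "content"]

def page_history(page_id: int, n_revs: int) -> list:
    # Hypothetical page history: a page-specific pseudo-random base text,
    # with one short line appended per revision.
    rng = random.Random(page_id)
    text = " ".join(rng.choice(WORDS) for _ in range(400)) + "\n"
    revs = []
    for r in range(n_revs):
        text += f"edit {r} on page {page_id}\n"
        revs.append(text.encode())
    return revs

def chunked_size(blobs: list, chunk: int) -> int:
    # Compress consecutive groups of `chunk` blobs together (a stand-in for
    # block-level compression in a sorted key/value store) and sum the sizes.
    return sum(len(zlib.compress(b"".join(blobs[i:i + chunk])))
               for i in range(0, len(blobs), chunk))

N_PAGES, N_REVS = 3, 6
histories = {p: page_history(p, N_REVS) for p in range(N_PAGES)}

# Rows keyed by (page_id, rev_id); rev_id is a global, time-ordered counter,
# and edits to the three pages interleave in time.
rows = {}
rev_id = 0
for r in range(N_REVS):
    for p in range(N_PAGES):
        rows[(p, rev_id)] = histories[p][r]
        rev_id += 1

by_rev_id = [rows[k] for k in sorted(rows, key=lambda k: k[1])]  # pure time order
by_page_prefix = [rows[k] for k in sorted(rows)]                 # page_id prefix first

print(chunked_size(by_rev_id, N_PAGES), chunked_size(by_page_prefix, N_PAGES))
```

With the page ID as the key prefix, each compressed group holds consecutive revisions of one page, so deflate can encode most of each revision as a back-reference to the previous one. In time order, each group mixes unrelated pages.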

Event Timeline

aaron created this task.May 28 2015, 9:15 PM
aaron raised the priority of this task from to Needs Triage.
aaron updated the task description. (Show Details)
aaron added a project: Availability.
aaron added subscribers: aaron, GWicke.
Restricted Application added a subscriber: Aklapper. May 28 2015, 9:15 PM

@aaron, we have this in our roadmap for the next fiscal year, and have budgeted for additional hardware to support it.

I'm leaning towards using the same title/revision/tid layout as we use for HTML, and translate page id/revision to title/revision as needed. This way we can process latest-revision-for-title and title-revision queries at the latency cost of a single query, without a need to translate to page id first. The disadvantage is an extra sequential request in the by-page-id case where no title is available, but this can be shared with other content types already stored in RESTBase. In the ExternalStore use case the title should normally be available (and can be stored as a key in any case), so either approach should work.

In general, I think we should handle any page/revision related content in RESTBase uniformly. Let's work out the details around linear history (T87393), and then decide which solution presents the best trade-offs overall.
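To make the layout trade-off above concrete, here is a rough in-memory Python model of a table keyed by (title, revision, tid), with a separate page-id-to-title map for the by-page-id case. The class and method names are hypothetical. A real store would do a range scan within a partition rather than loop over a dict. The model only shows how many lookups each query shape needs.

```python
from dataclasses import dataclass, field

@dataclass
class RevisionStore:
    """Hypothetical model of a title/revision/tid keyed table (not RESTBase's schema)."""
    rows: dict = field(default_factory=dict)         # (title, rev, tid) -> content
    page_titles: dict = field(default_factory=dict)  # page_id -> title

    def put(self, title, rev, tid, content, page_id=None):
        self.rows[(title, rev, tid)] = content
        if page_id is not None:
            self.page_titles[page_id] = title

    def latest_for_title(self, title):
        # One lookup: everything lives under the title, pick the max (rev, tid).
        keys = [k for k in self.rows if k[0] == title]
        if not keys:
            return None
        return self.rows[max(keys, key=lambda k: (k[1], k[2]))]

    def get(self, title, rev):
        # Title + revision: also a single lookup; the newest tid wins.
        keys = [k for k in self.rows if k[0] == title and k[1] == rev]
        return self.rows[max(keys, key=lambda k: k[2])] if keys else None

    def get_by_page_id(self, page_id, rev):
        # By page id: one extra sequential lookup (id -> title) before the real one.
        title = self.page_titles.get(page_id)
        return None if title is None else self.get(title, rev)
```

In this model, the title-based queries touch one key range. Only the by-page-id path pays the extra translation step, and that page_titles map is the piece that could be shared with other content types.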

GWicke set Security to None.Aug 12 2015, 11:31 PM
GWicke edited subscribers, added: faidon; removed: Aklapper.

I would request that this ticket be marked invalid. It seems to be searching for a solution to a problem that doesn't exist. Also, not involving Operations or the Database project (the current maintainers of the service you want to re-architect) on such a huge task is very surprising to me.

I would ask you to please mark this ticket as invalid and, if you believe it is necessary, open another one saying: "External Storage has X, Y, Z problems". Then we can discuss how to fix those without limiting ourselves to a particular technology and service: maybe changing small things, maybe hearing about the new hardware to support this service, or maybe bringing it down completely and starting from scratch.

Sadly, two can play this game:

"Consider using MySQL/Galera in place of Restbase"
MySQL/Galera can be used for:
a) More reliability
b) Simpler maintenance
c) Better load spreading
d) Better compression (if the storage keys had page ID hints as prefixes)
e) Avoid lots of awful DIY code

Yes, these claims are as vague as the original ones. That is why I wouldn't do this without fully understanding the current system, contacting the maintainers before taking any decision, involving them, and presenting ourselves as a helping hand, not confrontationally pushing for one particular solution and only that. Even so, of all the services currently served by MySQL, you have chosen the one with the fewest problems and the least maintenance and downtime. If you had told me that you wanted to take over the *link tables, Wikidata usage, or analytics/BI, I would have said "hell, yeah, let's do it".

The last thing we want is to fall into the trap of reimplementing our code from scratch and limiting ourselves to a single technology, when there is so much going on in the key-value world.

aaron added a comment.Aug 16 2015, 8:25 AM

I don't think anything would happen without some RfC, though Gabriel mentioned it being on a roadmap (I assume/hope that's just an evaluation, not a firm "we are doing this" commitment). We *could*, of course, use the current ExternalStoreDB system for some time. Also, Cassandra doesn't have to mean "restbase", though it's something to consider.

As for the brief points mentioned in the summary:
a) We've had data loss (fedora cluster 16, which still causes exceptions to this day for such ES URLs). It's mostly due to all the messy DIY PHP code/config we have. Also, systems that do manual master/slave sharding (via app config) need lots of redundant slaves to stay reliable, since the system cannot automatically rebalance. It is indeed somewhat theoretical in that I can't recall this biting us much in the last few years. Hardware can be thrown at this problem (e.g. a minimum number of slaves per cluster, regardless of CPU/memory usage).
b) We have recompression/migration scripts that are very scary and often require a code audit before running to make sure extensions are compatible (that happened this time around with Flow, afaik). Automatic rebalancing and key names that result in good compression would obviate the need for such scripts.
c) This could be argued both ways in theory. On the one hand, chronological sharding (which is more or less what we have) results in old content on older servers and newer content on newer servers. If traffic goes much more to newer content, then the old content servers are underutilized. If traffic is even with respect to revision time, then this doesn't matter as much. Highly accessed articles have revisions with young ages, though there is a long tail of pages with edits farther in the past (though bots making trivial changes tend to bump the latest revision timestamp; e.g. try "random article"). Obviously, histograms by revision and by latest page revision would be handy in any RfC. The hottest keys (main page templates tend to show up in Kibana mc errors) should be handled by memcached, so those are a non-factor here (this is a point for MariaDB).
d) Our compression works by simple per-blob gzip, which is less efficient than putting related blobs (new versions of a page) adjacent to each other via the key name. We can manually run scripts to concatenate such blobs, zip and PHP-serialize them, and update the references (e.g. rc1/rc2). Of course, we can also just use simple per-blob gzip and throw hardware at the problem, à la "disk is cheap".
e) All of the scripts mentioned above are quite scary and DIY, and only one person really understands them (Tim). The code run when accessing blobs for regular usage is fairly complex, with a lot of legacy handling (we still have cur SQL tables, for example). See https://wikitech.wikimedia.org/wiki/Text_storage_data for all of the different blob formats that our PHP code has to handle. Of course, we could just clean this up with yet another complex migration script, assuming Tim is willing and not too afraid to run it, heh. That would cut down a lot on the tech debt and DIY in the current system.
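Point c) can be illustrated with a toy Python model: 1000 time-ordered revision ids spread over 4 clusters, and a hypothetical read trace skewed toward the newest 100 revisions. Chronological (range) sharding is compared with a simple modulo placement. The modulo placement stands in for the hash/token placement a key/value store would use. The trace and sizes are made up for illustration only; real traffic histograms would be needed for an actual RfC.

```python
from collections import Counter

N_REVS = 1000      # hypothetical revision ids 0..999, in time order
N_CLUSTERS = 4

def chronological_shard(rev_id):
    # Range sharding: each cluster holds one contiguous, time-ordered block of ids.
    return rev_id // (N_REVS // N_CLUSTERS)

def spread_shard(rev_id):
    # Stand-in for hash/token placement: neighbouring ids land on different clusters.
    return rev_id % N_CLUSTERS

def cluster_load(reads, shard_fn):
    # Count how many reads each cluster serves.
    counts = Counter(shard_fn(r) for r in reads)
    return [counts[c] for c in range(N_CLUSTERS)]

# Hypothetical trace: every revision read once, plus 9 extra reads for each
# of the 100 newest revisions (traffic skewed toward new content).
reads = list(range(N_REVS)) + [r for r in range(900, N_REVS) for _ in range(9)]

print(cluster_load(reads, chronological_shard))  # [250, 250, 250, 1150]
print(cluster_load(reads, spread_shard))         # [475, 475, 475, 475]
```

With uniform traffic, both placements give equal load in this model. The skew toward recent revisions is what makes the newest chronological cluster hot while the older ones sit mostly idle.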

As the title states, this is a request to consider Cassandra (directly or via restbase). It may well be that a huge cleanup of the current system is enough to keep it maintainable in the long term. I don't worry about "reimplementing our code", as what ES does is not really that complex in today's world; it's that we do it ourselves and have lots of legacy baggage that makes it complex (e.g. the HistoryBlob classes and scripts). This is also a Cassandra-"shaped" problem, and if such a tool can work (and it seems to work fine for parsoid), why not use it? Is the bar for Cassandra usage higher than for other NoSQL stores? I'm actually reminded of a good article, http://mcfunley.com/choose-boring-technology . I don't think sticking to MariaDB and Cassandra for lots of things instead of other (sometimes less mature) DBs is "limiting" ourselves much, certainly not more than sticking to just MariaDB ;)

I'd like to think this isn't a "game", and certainly not a "restbase is awesome, mariadb sucks" flamewar. Actually, it took me a long time (and much of the old mwcore team) to come around to understanding/accepting restbase in practice (and afaik there are still unresolved security model questions). In general, I'd lean towards being a MariaDB advocate and tend to dislike "mysql sucks" types of threads; it took a fair amount of resolve not to respond sarcastically to http://www.gossamer-threads.com/lists/wiki/wikitech/591972#591972 . It's just that the way we use it for ES (implementing manual sharding when key/value stores already do this, with Cassandra having multi-DC awareness baked in) is questionable to me.

@aaron Question: Are you interested on cleaning up and improving ES or on pushing Cassandra? I need a short answer.

aaron added a comment.Aug 16 2015, 8:48 AM

@aaron Question: Are you interested on cleaning up and improving ES or on pushing Cassandra? I need a short answer.

I'd like for us to *consider* Cassandra. Failing that, then I wouldn't mind working on some cleanup. Doing both doesn't make much sense as a migration script to cassandra would obsolete much of the current code in passing (e.g. all the blob formats). So they are kind of tied together from a planning perspective.

jcrespo removed a subscriber: jcrespo.Aug 16 2015, 8:51 AM
aaron added a comment.Aug 16 2015, 9:31 AM

As for cleanup-only, note that it will not solve:
i) The manual sharding code
ii) The manual tracking, compression, tracking update code, and related extension dependencies

...but we can at least migrate the blobs to a unified set of formats (per-blob gzip and concatenated gzip blobs being the only two types).
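As a rough Python sketch of what "two formats only" could look like, the code below defines a per-blob deflate format and a concatenated format. The concatenated format carries a small offsets header so that a single revision can be sliced out. The "gz:"/"multi:" prefixes, the JSON header, and the function names are all hypothetical. This is not the current HistoryBlob or Text_storage_data layout.

```python
import json
import zlib

# Hypothetical unified formats (illustrative only):
#   b"gz:"    + deflate(text)                        -> one revision per blob
#   b"multi:" + b"<hlen>:" + json offsets + deflate(all texts concatenated)

def pack_single(text: str) -> bytes:
    return b"gz:" + zlib.compress(text.encode("utf-8"))

def pack_multi(texts: list) -> bytes:
    payloads = [t.encode("utf-8") for t in texts]
    offsets, pos = [], 0
    for p in payloads:
        offsets.append([pos, pos + len(p)])  # byte range of each revision
        pos += len(p)
    header = json.dumps(offsets).encode("ascii")
    body = zlib.compress(b"".join(payloads))
    return b"multi:" + str(len(header)).encode("ascii") + b":" + header + body

def unpack(blob: bytes, index: int = 0) -> str:
    kind, _, rest = blob.partition(b":")
    if kind == b"gz":
        return zlib.decompress(rest).decode("utf-8")
    if kind == b"multi":
        hlen, _, rest = rest.partition(b":")
        header, body = rest[:int(hlen)], rest[int(hlen):]
        start, end = json.loads(header)[index]
        return zlib.decompress(body)[start:end].decode("utf-8")
    raise ValueError(f"unknown blob format: {kind!r}")
```

A reader then needs only one dispatch on the prefix. A reference to a revision inside a concatenated blob would carry the blob address plus the index.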

I assume/hope that's just an evaluation, not a firm "we are doing this" commitment

Yes, absolutely. A serious proposal should be backed by hard data and a good amount of operational experience.

Joe added a subscriber: Joe.Aug 17 2015, 8:02 AM

What baffles me here is:

  • We want to replace cheap, proven technology (mysql) with something we're still figuring out how to use and is much more expensive (at least AFAICS)
  • Why on earth would we want to use cassandra for this, given it is a write-optimized distributed key-value store, and we have about 20 inserts/s here? We want something that has a high read throughput instead, like couchbase et al.

In general, replacing a sound technology with 15 years of battle testing for final storage doesn't seem like a smart move to me unless we've reached some kind of scalability bottleneck. Until we do (or foresee that we might), I don't see the point. And even then:

  • I strongly suspect cassandra is not the optimal alternative (did you explore the NoSQL landscape?)
  • I think that if the problem is in the code, we might just rewrite the code and keep using mysql
  • We surely SHOULD NOT use restbase as a medium between mediawiki and cassandra - we could just use one of the sharding solutions for mysql instead at that point. I must also add that this kind of use of a microservice (as an endpoint to access a database, basically) is exactly the kind of antipattern I warned everyone about at the mediawiki developers summit
Restricted Application added a subscriber: Matanya. Aug 17 2015, 8:08 AM
Joe added a comment.Aug 17 2015, 8:08 AM

Added operations as a tag, as not involving ops in such a debate seemed a bit peculiar, tbh.

aaron removed a subscriber: aaron.Aug 17 2015, 2:30 PM
Isarra added a subscriber: Isarra.Aug 17 2015, 6:19 PM
MoritzMuehlenhoff triaged this task as Medium priority.Aug 21 2015, 8:27 AM

I don't really want to revive this ticket but I do want to know if it's seriously on the roadmap or indefinitely deferred/rejected.