
Decide between proxysql and haproxy for labsdbproxy service
Closed, ResolvedPublic

Description

I will add here my impressions of both services, with pros and cons, so we can make an informed decision. In particular, I would like to know the labs group's opinion, as it could add maintenance overhead in some cases:

  • proxysql is a pure mysql proxy, which means it natively understands the protocol, allowing integrated support for things such as lag monitoring, connection failure, master-slave failover, etc. BUT
  • We do not need many of those for labs. We need 2 things: if things completely break, move to another server; and handle maintenance windows in the most graceful way possible
  • I thought proxysql would have a way to customize checks; however, the model is to create external checks that pool or depool servers by changing the configuration (aside from the basic health checks). BUT
  • This is something haproxy can already do, by writing our own http wrapper based on bash scripts that do arbitrary checks
  • proxysql has nice features you cannot find on a regular proxy: it has mysql connection pooling/persistent connections, so it can avoid the overhead of creating new connections. BUT
  • the proxy will live outside of the tools by definition, so there will still be overhead on connecting to the proxy itself, and for labs the connection overhead is not that large, so it is not that interesting a feature
  • proxysql allows global status per user instead of having to aggregate it across servers. For example, we can limit the total number of connections, instead of doing it per server
  • proxysql, at this time, requires authentication to be set up in the proxy itself. According to the documentation, it is planned to have different frontend and backend users, which would be great for maintaining users in a single place, but as of now they have to be duplicated, which doubles the places where things can fail and have to be maintained.
  • proxysql allows query rewriting or rerouting, which is not possible with haproxy. For example, certain users or queries can be sent to a specific set of servers. While in theory we do not need this for now, maybe we could redirect certain database usage or writes to a single master automatically.
  • Chase mentioned the possibility of incompatibilities, as proxysql reimplements the full protocol rather than being an L4 proxy. I believe those should be minimal, but I cannot rule them out 100%.
  • Seeing the latest releases, https://github.com/sysown/proxysql/releases, proxysql still seems to be getting some important bugs fixed. Which is good, because it means it is maintained, but I am not sure how well it would fit with the difficult labs environment. L4 routing, by definition, is easier to handle with simpler pieces of software.
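
To make the haproxy option above more concrete, here is a minimal sketch of the kind of HTTP wrapper that haproxy's `option httpchk` could poll. It is written in Python rather than bash only to keep it self-contained. The maintenance flag path, the backend address and the listening port are hypothetical placeholders, not anything we run today, and the TCP probe is only a stand-in for a real mysql-level check.

```python
import os
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple

Check = Callable[[], bool]


def evaluate(checks: Dict[str, Check]) -> Tuple[int, str]:
    """Run every check and return (HTTP status, body).

    haproxy's HTTP check treats 2xx/3xx as healthy, so any failed
    check (or a check that raises) turns into a 503.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    if failed:
        return 503, "FAIL: " + ", ".join(failed) + "\n"
    return 200, "OK\n"


def not_in_maintenance(flag_path: str = "/etc/labsdb/maintenance") -> bool:
    # Hypothetical flag file: creating it depools the server gracefully.
    return not os.path.exists(flag_path)


def port_open(host: str = "127.0.0.1", port: int = 3306) -> bool:
    # Stand-in for a real mysql check: only verifies the port accepts TCP.
    with socket.create_connection((host, port), timeout=2):
        return True


def make_handler(checks: Dict[str, Check]):
    class Handler(BaseHTTPRequestHandler):
        def _reply(self, include_body: bool) -> None:
            status, body = evaluate(checks)
            payload = body.encode()
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            if include_body:
                self.wfile.write(payload)

        # Answer the request methods a health checker is likely to send.
        def do_GET(self) -> None:
            self._reply(True)

        def do_OPTIONS(self) -> None:
            self._reply(True)

        def do_HEAD(self) -> None:
            self._reply(False)

    return Handler


if __name__ == "__main__":
    checks = {"maintenance": not_in_maintenance, "mysql_port": port_open}
    # Port 9200 is an arbitrary choice for this sketch.
    HTTPServer(("0.0.0.0", 9200), make_handler(checks)).serve_forever()
```

With something like this, a maintenance window would only need the flag file to be created: the check starts returning 503 and haproxy depools the server, which covers both of the things we need for labs.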

More coming up.

Event Timeline

jcrespo created this task.Nov 2 2016, 8:34 PM

I will also add my thoughts about this.

As I expressed during the meeting, if possible I would try to go for proxySQL.
Jaime has done a great job detailing the pros and cons.

The biggest pain point I see is the need to maintain the users on the proxy itself, which would mean that we would struggle to use what we wanted to use from MariaDB 10.1: roles.
On the other hand, I see that as a temporary pain point if they finally go for a different solution (we cannot know whether that will happen in weeks, months or a year). So we'd need to live with the assumption that we will need to maintain users in two places for a long time.

My main interest in going for proxySQL is essentially the fact that it is getting bigger and bigger in the MySQL ecosystem, and we'd be able to watch it grow and use labs as a "canary" to see if it will be fine for production, where it can be a big plus.

For now, proxySQL will be doing pretty much the same thing as we are doing with HAproxy, but with the advantage of having it already set up and installed so we can benefit from future versions and features.

Truth be told, this would mean that we'd need to maintain two different proxies in production, HAproxy and proxySQL, and that is not ideal.
Some of the Ops folks already expressed their concerns about having different proxy solutions for production vs mysql (Joe, for instance).

jcrespo moved this task from Triage to Next on the DBA board.Nov 3 2016, 11:36 AM
jcrespo updated the task description. (Show Details)Nov 7 2016, 4:44 PM

Where I'm at after thinking on it for a bit:

  • We have the ecosystem to effectively stand up HAProxy here (monitoring, operating it, etc)
  • We don't have a solid use case that requires proxysql at the moment, but we are looking forward to them
  • It will be difficult if issues that affect proxysql surface in labsdb usage and require large time or skill investments for debugging
  • We have to implement a two layer account management system with proxysql and we have not solidified a single layer account management system (do we use roles, do we continue to manage via scripts on labstore, do we move to striker, how do users engage the system, how can users reset passwords, how are passwords exposed to users, etc).

I'm leaning towards HAProxy and then we can use that to cut over to the cluster paradigm and shift the workload in Tools. We can also stand up proxysql alongside it, test out our use cases, and migrate a curated list of users to it when we are ready. I don't feel like I'm voting against proxysql, but I do feel like I'm voting against it as being required in parallel with this other work considering our resources. It's largely a DBA domain question and I respect what you guys think will work out best, but the more I reflect the more it feels aggressive to try to bundle all of this into one wave of change.

bd808 added a subscriber: bd808.Nov 11 2016, 9:53 PM

Having to keep two authentication systems in sync seems like something that will break and consume a lot of operational debugging time. This may seem like a dumb thing to whine about, but we have ~1500 registered Tool Labs users alone that already cause issues syncing between LDAP and MySQL/MariaDB.
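
As a rough illustration of what keeping two account stores in sync would involve, here is a minimal sketch of a drift check. It assumes both sides can be dumped as a mapping of user name to password hash; the function and field names are hypothetical and do not correspond to any existing script.

```python
from typing import Dict, NamedTuple, Set


class Drift(NamedTuple):
    missing_on_proxy: Set[str]    # defined on the backend only
    missing_on_backend: Set[str]  # defined on the proxy only
    password_mismatch: Set[str]   # defined on both, with different hashes


def find_drift(proxy_users: Dict[str, str], backend_users: Dict[str, str]) -> Drift:
    """Compare two {user: password_hash} mappings and report differences."""
    proxy_names = set(proxy_users)
    backend_names = set(backend_users)
    both = proxy_names & backend_names
    return Drift(
        missing_on_proxy=backend_names - proxy_names,
        missing_on_backend=proxy_names - backend_names,
        password_mismatch={u for u in both if proxy_users[u] != backend_users[u]},
    )


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the report.
    proxy = {"u1001": "hashA", "u1002": "hashB"}
    backend = {"u1001": "hashA", "u1002": "hashX", "u1003": "hashC"}
    print(find_drift(proxy, backend))
```

Every non-empty set in that report is a user who cannot connect, or who connects with the wrong credentials, until somebody reconciles the two sides.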

If this is a proof of concept for a serious desire to move from HAProxy to proxysql in production, then it may be worth the pain. If, however, it's yet another thing that will be different between Labs and production for the foreseeable future, that also seems like a big red flag.

Chase mentioned the possibility of incompatibilities, as proxysql reimplements the full protocol rather than being an L4 proxy. I believe those should be minimal, but I cannot rule them out 100%.

I can pretty much guarantee that if there is any weird, old MySQL syntax or behavior edge case that is not supported, our Tool Labs users will stumble over it.

Note @Marostegui commented in favor of proxysql before I edited the task based on my observations of recent bugs; I would like to gather a second opinion from him before making a final decision (which will not be final at all).

As we discussed on hangouts a week ago, and after seeing the bug list you posted, I am fine with not going with it to production. It might not be mature enough for what we are looking for.

But as I expressed, I would still like to deploy it somewhere in our infra (as we discussed, maybe one of the misc shards?) so we can keep track of its development and not get too far behind it, as it still looks very promising for the near future.

jcrespo closed this task as Resolved.Nov 14 2016, 6:20 PM
jcrespo claimed this task.

But as I expressed, I would like to still deploy it somewhere in our infra (as we discussed, maybe one of the misc shards?)

Oh, I cannot agree more.

Let's start with HAProxy for labs, maybe rethink it in the future; on a separate ticket, evaluate it for misc.