Investigate solutions for MySQL connection pooling
Closed, Resolved (Public)

Description

After T171071, it was clear that connection pooling was needed for cross-dc queries. This task covers the research done to set up a solution that enables such queries, assuming the traffic is not high.

Event Timeline

So this is the plan: use proxysql to perform connection pooling, only for cross-dc writes (reads should always be local).
All cross-dc queries must be encrypted; however, proxysql does not currently have a stable version that supports TLS 1.2. The final encryption method will be application-layer (proxysql), but until that is stable, we will use plain-text connections and tunnel them through a yet-to-be-decided technology.
Regarding architecture, we will set up a proxysql on each master, on a separate port, so that the local master is used for local writes and the proxysql is used for remote writes. While this is a SPOF, masters were already a SPOF, so it will not be an extra moving part. Because the traffic between masters is supposed to be low, non-critical for basic functionality, and hopefully degrades gracefully (e.g. failing, and sending the next request to the primary datacenter), I believe this architecture will be enough until a more serious approach to HA/failover/load balancing is set up, and until proxysql gets proper TLS support.
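
The per-master layout described above can be sketched roughly as follows. This is only an illustration, not the deployed configuration: the host name, the `Endpoint` type and the function name are hypothetical, and both port numbers are assumptions (3306 is the MySQL default and 6033 is proxysql's default client-facing port; the task does not say which ports will actually be used).

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    host: str
    port: int


# Assumed ports, not taken from the task.
LOCAL_MYSQL_PORT = 3306  # mysqld on the master itself (local writes)
PROXY_PORT = 6033        # proxysql on the same host, forwarding to the remote master


def endpoints_on_master(master_host: str) -> dict[str, Endpoint]:
    """Return the two write endpoints exposed by a single master host."""
    return {
        "local_write": Endpoint(master_host, LOCAL_MYSQL_PORT),
        "remote_write": Endpoint(master_host, PROXY_PORT),
    }


# Hypothetical host name, for illustration only.
eps = endpoints_on_master("db-master.example")
```

Both endpoints share the same host, which is why the proxy adds no new point of failure beyond the master itself.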

Work has already been done on puppetizing proxysql and creating a package for easy install, plus some small-scale production testing. The next step is to test on a real master in a controlled environment (mwdebug2*), so that we can validate on a small set of hosts but under the same conditions as a full deployment.

The main blocker right now is to decide on a tunneling technology, as most seem to have issues.

Is this project part of any upcoming quarterly goals for people to work on?

No, this is not a goal at the moment, but it is ongoing work: there was recently a new 2-beta release, and I am testing whether it removes the need for tunneling altogether, which would be a better option.

We have scheduled the purchase of new proxy hardware next quarter, and installing it there would be better than on the masters.

So this is still moving, though not as fast as we would like because of the backups and dc switchover goals.

You can help on your side (mediawiki) in parallel by preparing a way (configuration, code) to connect to the remote master when needed [a proxy will show up as a plain-text local mysql database connection].

jcrespo moved this task from Triage to In progress on the DBA board.

Could you elaborate on how this would work a bit more?

I will install a proxy on each master, pointing to the master in the other datacenter. For now you can either assume the proxy always lives on the master, or give it its own configuration line as a host and port combination. The port, obviously, will be different from the one the local mysql is listening on. Maybe at some point in the future both local and remote masters will be reachable from more than one proxy server for redundancy, but given that this is not true at the moment for local masters, it is not a priority right now. Whatever logic is used to choose between connecting to a local master or a remote one is up to you. I would try to avoid, however, having to change configuration in the event of a primary master switch. The details of how you want the configuration to work are flexible: do we want to set that up on etcd so we do not have to deploy every time a master is switched? Or something else?
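
As a rough illustration of the client-side choice discussed here, the sketch below picks a write endpoint from dynamic configuration. Everything in it is an assumption made for the example: the key names, host names, port numbers and the `pick_write_endpoint` function are hypothetical, and the plain dict merely stands in for whatever store (etcd or something else) ends up holding the values.

```python
# Assumed ports, not taken from the task.
MYSQL_PORT = 3306  # local mysqld on the master
PROXY_PORT = 6033  # proxy on the same master, pointing at the other datacenter's master


def pick_write_endpoint(app_dc: str, config: dict) -> tuple[str, int]:
    """Choose where a writer running in app_dc should connect.

    In the primary datacenter, write to the local master directly.
    Elsewhere, use the proxy on the local master, which forwards to
    the master in the other datacenter.
    """
    local_master = config["masters"][app_dc]
    if app_dc == config["primary_dc"]:
        return (local_master, MYSQL_PORT)
    return (local_master, PROXY_PORT)


# Stand-in for values read from a dynamic store; all names are hypothetical.
config = {
    "primary_dc": "eqiad",
    "masters": {
        "eqiad": "db-eqiad-master.example",
        "codfw": "db-codfw-master.example",
    },
}

before_switch = pick_write_endpoint("codfw", config)

# A primary switch only changes one key; no code deployment is involved.
config["primary_dc"] = "codfw"
after_switch = pick_write_endpoint("codfw", config)
```

Keeping the primary datacenter in a single dynamic key is one way to avoid redeploying on every switch; whether that matches how the real configuration should work is still open.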

Also, in terms of prioritization, I asked whether I should put this on top of other things, and the answer was no, because we are not the blocker; sessions and other things are.

We are in discussion to see when #DBAs can move this forward (pending testing of the proxysql2 package).

jijiki triaged this task as Medium priority. Dec 3 2018, 1:00 PM

@jcrespo Why would we need to deploy MediaWiki in order to repoint when the master is switched? Wouldn't the proxy be responsible for that?

I think Jaime was talking about the current situation, where whenever we flip over a master we have to change db-eqiad.php or db-codfw.php to modify the primary master IP.

tstarling claimed this task.
tstarling subscribed.

I talked with jcrespo, marostegui, cdanis, Joe and Krinkle about this on IRC, and the consensus was that we don't have a deployable tunneling or connection pooling solution right now, and that ~200ms of latency is tolerable for this use case, since the success of T92357 means that the connection rate will be low, less than 10/s. The rate of connections with user-visible latency may be on the order of 1/s.
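
A back-of-the-envelope check of those numbers, as a sketch only: it assumes the ~200ms applies to each cross-dc connection and uses the stated upper bound of 10 connections per second. The variable names are made up for the example. By Little's law, the average number of connections being set up at any moment is the arrival rate multiplied by the time each one takes.

```python
# Figures quoted in the comment above, taken at their upper bounds.
connection_rate_per_s = 10   # "less than 10/s"
added_latency_s = 0.2        # "~200ms", assumed here to apply per connection

# Little's law: average number in progress = arrival rate * time in system.
avg_connections_in_setup = connection_rate_per_s * added_latency_s

# Total extra waiting accumulated across all requests, per second of wall-clock time.
extra_wait_per_second_s = connection_rate_per_s * added_latency_s
```

Under these assumptions only about two connections are being established at any given moment, which is consistent with the conclusion that pooling can wait.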