
Experiment with and design options for multi-instance or multi-section wikireplicas frontend architecture
Open, High, Public

Description

In the interest of trying to keep or make the wikireplicas a performant and sustainable service, new backend architectures are needed. This task is to get to work designing and working on frontend and WMCS-supported scripting architectures to allow more backend flexibility with minimal loss of ease-of-use for end users of cloud services.

Since this task encompasses general access and orchestration, special access needs, such as PAWS and Quarry, will need their own tasks, and there is probably room for a lot of other subtasks as well.

Related Objects

Event Timeline

Bstorm triaged this task as High priority. Aug 13 2020, 7:55 PM
Bstorm created this task.
Bstorm moved this task from Backlog to Wiki replicas on the Data-Services board.
Bstorm moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.

I started investigating ways of routing in haproxy based on hostname. Since most of the really juicy ACL matches in haproxy are built around HTTP features, and MySQL connections are in TCP mode, it may not be possible. To do host-based routing in TCP mode, you need to match actual portions of the payload. That doesn't seem possible for all communications, though I may dig a little deeper into the auth negotiation to see if there is some way.

If we are trying to do name-based routing of databases, it seems likely that it would require multiple IPs assigned to the haproxy frontends in order to correctly route requests without requiring clients to map ports.
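As a rough illustration of the multiple-IP idea, a pair of haproxy `listen` blocks could each bind a distinct service IP on the standard MySQL port and forward to the right backend instance. This is only a sketch; the IPs, server names, and backend ports here are made up for illustration:

```
# Hypothetical sketch: one listen block per service IP, all TCP mode.
# IPs, hostnames, and ports below are illustrative, not the real ones.
listen s1-web
    bind 10.64.37.30:3306
    mode tcp
    balance roundrobin
    option mysql-check user haproxy
    server clouddb-a 10.64.0.11:3311 check

listen s1-analytics
    bind 10.64.37.31:3306
    mode tcp
    balance roundrobin
    option mysql-check user haproxy
    server clouddb-b 10.64.0.12:3311 check
```

Clients would keep connecting to port 3306 on whatever hostname they already use; only the DNS records would need to point each sN.{web,analytics} name at its own frontend IP.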

bd808 added a subscriber: bd808. Aug 13 2020, 8:30 PM

If we are trying to do name-based routing of databases, it seems likely that it would require multiple IPs assigned to the haproxy frontends in order to correctly route requests without requiring clients to map ports.

This seems possible to me. Today there would be 16 service names to route (8 slices x 2 access patterns) if done this way. We already actually create distinct DNS records for s[1-8].{analytics,web}.db.svc.eqiad.wmflabs. Today all the "analytics" records point to one load balancer IP and the "web" records point to another. These records are managed using the wmcs-wikireplica-dns script. These are all internal IPs too (10.x.x.x), so there shouldn't be any major concerns about IP exhaustion.

$ host s1.analytics.db.svc.eqiad.wmflabs
s1.analytics.db.svc.eqiad.wmflabs has address 10.64.37.27
$ host 10.64.37.27
27.37.64.10.in-addr.arpa domain name pointer dbproxy1018.eqiad.wmnet.

$ host s1.web.db.svc.eqiad.wmflabs
s1.web.db.svc.eqiad.wmflabs has address 10.64.37.28
$ host 10.64.37.28
28.37.64.10.in-addr.arpa domain name pointer dbproxy1019.eqiad.wmnet.
bd808 added a comment. Aug 13 2020, 8:37 PM

I started investigating ways of routing in haproxy based on hostname. Since most of the really juicy ACL matches in haproxy are built around HTTP features, and MySQL connections are in TCP mode, it may not be possible. To do host-based routing in TCP mode, you need to match actual portions of the payload. That doesn't seem possible for all communications, though I may dig a little deeper into the auth negotiation to see if there is some way.

It seems like we would need a real layer 7 proxy for the MySQL protocol to do "name based vhosts". In theory ProxySQL could do that (https://proxysql.com/documentation/how-to-setup-proxysql-sharding/), but I'm not sure if anyone would be excited about introducing ProxySQL for this feature. I have never actually used it myself, but I keep ending up looking at docs for it because of the mysql-proxy questions we have for PAWS.
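For reference, ProxySQL's schema-based routing is configured through its SQL-like admin interface: rows in `mysql_query_rules` can match on the session's default schema and send traffic to a hostgroup. The hostgroup numbers, hostnames, and schema names below are assumptions for illustration only:

```sql
-- Hypothetical ProxySQL admin-interface sketch: route by default schema.
-- Hostgroups 10 and 20 are assumed to map to s1 and s2 backends.
INSERT INTO mysql_servers (hostgroup_id, hostname, port)
VALUES (10, 'clouddb1001.example', 3311),
       (20, 'clouddb1002.example', 3312);

INSERT INTO mysql_query_rules (rule_id, active, schemaname, destination_hostgroup, apply)
VALUES (1, 1, 'enwiki_p', 10, 1),
       (2, 1, 'wikidatawiki_p', 20, 1);

LOAD MYSQL SERVERS TO RUNTIME;
LOAD MYSQL QUERY RULES TO RUNTIME;
```

That would be a true layer 7 answer to the "name based vhosts" problem, at the cost of running and maintaining another stateful proxy tier.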

Bstorm added a comment. Edited Aug 13 2020, 9:12 PM

It seems like we would need a real layer 7 proxy for the MySQL protocol to do "name based vhosts". In theory ProxySQL could do that (https://proxysql.com/documentation/how-to-setup-proxysql-sharding/), but I'm not sure if anyone would be excited about introducing ProxySQL for this feature. I have never actually used it myself, but I keep ending up looking at docs for it because of the mysql-proxy questions we have for PAWS.

Yeah, it's possible to act on some pretty funky bits of matching in a TCP payload, but that seems somewhat daft with regard to mariadb. ProxySQL might be just the thing for PAWS, on the other hand, to handle both this complication and T253134: Find an alternative solution for the mysql-proxy in PAWS. Even so, that'd be a different scale and design than this kind of thing, which also makes me favor the IP-based routing notion above using listen blocks.

bd808 added a comment. Aug 13 2020, 9:25 PM

This seems possible to me. Today there would be 16 service names to route (8 slices x 2 access patterns) if done this way. We already actually create distinct DNS records for s[1-8].{analytics,web}.db.svc.eqiad.wmflabs. Today all the "analytics" records point to one load balancer IP and the "web" records point to another. These records are managed using the wmcs-wikireplica-dns script. These are all internal IPs too (10.x.x.x), so there shouldn't be any major concerns about IP exhaustion.

I just realized this might also give us a simple way to run both systems in parallel for a testing period. We could publish DNS records as s[1-8].{analytics,web}.db.svc.eqiad1.wikimedia.cloud pointing to the 16 new service ips as soon as the backing cluster existed. The current eqiad.wmflabs records could then stay pointed at the legacy service ips and continue to route to the legacy cluster. Then on the date of the hard switch to the new cluster we could update the eqiad.wmflabs DNS records to use the new service ips as well.

On that cutoff date, I think we should also consider retiring the legacy *.labsdb DNS records that we have been keeping around since the current (soon to be legacy) wiki replicas cluster was deployed.

I just realized this might also give us a simple way to run both systems in parallel for a testing period.

Oooh! I like that. That might be a cool idea. Adding @Marostegui for comment on that kind of thing as well.

Yeah, we should have both systems running in parallel for some time.
My idea was to have that and slowly turn off the old ones. As there are many pieces here, I am sure we'll keep finding stuff only once we've shut down the old system, so it would be good to have it handy in case we need to start it up again until we are fully confident that the new one is working fine.

Regarding the proxying, it might indeed require quite a lot of work. I think we can also experiment with just round-robin DNS first (although of course that doesn't give us persistent connections); it can be a good indication of whether the new and old systems are fully transparent, as in: the user doesn't notice which set of hosts they are connecting to. Once that happens, that is a good indication that the old system can start to be powered off.

Change 621067 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: create cumin aliases for wikireplica servers

https://gerrit.wikimedia.org/r/621067

Change 621088 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/cookbooks@master] wikireplicas: add wikireplica cookbook to add a wiki

https://gerrit.wikimedia.org/r/621088

Change 621067 merged by Bstorm:
[operations/puppet@production] wikireplicas: create cumin aliases for wikireplica servers

https://gerrit.wikimedia.org/r/621067

Change 621343 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] cumin: for new wmcs. prefix for cookbooks, grant access to wmcs-admins

https://gerrit.wikimedia.org/r/621343

Change 621088 merged by jenkins-bot:
[operations/cookbooks@master] wikireplicas: add wikireplica cookbook to add a wiki

https://gerrit.wikimedia.org/r/621088

Change 621574 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/cookbooks@master] wikireplicas: fix typo in the dns script for wikireplicas

https://gerrit.wikimedia.org/r/621574

Change 621574 merged by jenkins-bot:
[operations/cookbooks@master] wikireplicas: fix typo in the dns script for wikireplicas

https://gerrit.wikimedia.org/r/621574

Bstorm added a comment. Sep 2 2020, 5:43 PM

I spent a bit of time testing out the idea of switching PAWS to using ProxySQL instead of mysql-proxy. I can see big value in putting ProxySQL in front of the wikireplicas overall, but it honestly doesn't do the exact things we need for PAWS except query routing. I'm going to need to hack on mysql-proxy, I think, to add the appropriate routing.

Bstorm added a comment. Sep 2 2020, 7:31 PM

Ok, this doesn't look that hard except that it may require maintaining a git checkout of the mediawiki config that is up to date on the mysql-proxy pod. The main thing that needs to change is that the database needs to be matched to the server in DNS, which should be a CNAME for the shard DNS, which I hope will go to an IP that has a listen block that routes to the right hosts and ports on haproxy.

Just have to figure out where to squeeze that in there...

bd808 added a comment. Sep 2 2020, 7:48 PM

Ok, this doesn't look that hard except that it may require maintaining a git checkout of the mediawiki config that is up to date on the mysql-proxy pod.

Depending on what you need from the mw-config clone, you may be able to read files from https://noc.wikimedia.org/ instead. The wmcs-wikireplica-dns script loads dblists from there instead of a local clone of the config. One advantage of this is that noc is updated by scap, so there should never be configuration drift where the upstream HEAD has changed but the local clone has not pulled yet. (Ok, "never" is absolutist. How about "extremely rare" instead?) My versions tool uses this technique as well. You may of course need to introduce some caching of the config which would bring drift potential back, but at least you would not need to manage a git clone with a sidecar container.
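A minimal sketch of that noc-based approach, assuming the dblists are published under a URL layout like `https://noc.wikimedia.org/conf/dblists/<section>.dblist` (an assumption based on how wmcs-wikireplica-dns is described above; verify the exact path before relying on it):

```python
"""Read a section's wiki list from noc.wikimedia.org instead of a local
mediawiki-config clone. URL layout is an assumption, not a confirmed API."""
from urllib.request import urlopen


def parse_dblist(text):
    """Return wiki db names from dblist text, skipping comments and blanks."""
    wikis = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            wikis.append(line)
    return wikis


def fetch_dblist(section, base="https://noc.wikimedia.org/conf/dblists"):
    """Fetch and parse the dblist for a section name such as 's1'."""
    with urlopen("{}/{}.dblist".format(base, section)) as resp:
        return parse_dblist(resp.read().decode("utf-8"))
```

Any caching added on top of this would reintroduce some drift potential, as noted above, but the fetch itself stays trivially simple compared to managing a clone.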

The main thing that needs to change is that the database needs to be matched to the server in DNS, which should be a CNAME for the shard DNS, which I hope will go to an IP that has a listen block that routes to the right hosts and ports on haproxy.

Just have to figure out where to squeeze that in there...

I think @elukey and his team did something based on DNS for the dbstore1XXX hosts, which are running multi-instance as well. I am not fully sure of the details on how that was finally implemented, but maybe it can give you some ideas

bd808 added a comment. Sep 3 2020, 7:33 PM

The main thing that needs to change is that the database needs to be matched to the server in DNS, which should be a CNAME for the shard DNS, which I hope will go to an IP that has a listen block that routes to the right hosts and ports on haproxy.

Just have to figure out where to squeeze that in there...

I think @elukey and his team did something based on DNS for the dbstore1XXX hosts, which are running multi-instance as well. I am not fully sure of the details on how that was finally implemented, but maybe it can give you some ideas

The system used in the Analytics cluster is documented at https://wikitech.wikimedia.org/wiki/Analytics/Systems/MariaDB#Database_setup. It works pretty well in my experience but it would be a huge burden for Toolforge users to have to add port number lookups to all code connecting to the Wiki Replicas. Even if we make nice helper functions available in multiple programming languages I would expect a very large number of tools to break because they are not actively maintained.

The service name ideas above are an attempt to find a setup that moves the port mapping burden onto us in the infrastructure layer rather than pushing that work down to all of the users.
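To make the burden concrete: under an Analytics-style multi-instance layout, every client would need a helper along these lines just to find the right port. The `3310 + N` convention here is an assumption drawn from the wikitech page linked above, not something every deployment guarantees:

```python
"""Sketch of the per-client port-mapping burden under multi-instance
MariaDB. Assumes the Analytics-style convention of section sN on port
3310 + N (e.g. s1 -> 3311), which should be verified per deployment."""


def section_port(section):
    """Return the MariaDB port for a section name like 's1'..'s8'."""
    if not (section.startswith("s") and section[1:].isdigit()):
        raise ValueError("unexpected section name: %r" % section)
    return 3310 + int(section[1:])
```

Pushing this mapping into DNS and the load balancer instead means unmaintained tools that hardcode port 3306 keep working untouched.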

Bstorm added a comment. Sep 9 2020, 8:10 PM

So looking into Quarry a bit, I found that the setup is substantially the same as it is for PAWS. Connections to the replicas are simply done with a variable:

REPLICA_HOST: 'enwiki.analytics.db.svc.eqiad.wmflabs'

The nice thing here is that there's no deprecated proxy that needs some odd Lua inserted in the client connection for ALL clients. It's just Python. That said, the UI assumes you are already connected and then behaves sort of like a mysql shell (only not).

It'll need a "database to connect to" UI element that keys the app to connect to the right replica instance. The beautiful thing here is that we may not need to worry about nonsense like I did here https://phabricator.wikimedia.org/rPAWS26e5ac6bfd1df925314822cfdf0ce9362449321c because we can just prepend the database name to the URL instead.
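The "prepend the database name" idea amounts to deriving the replica host from whatever wiki the user picks. A small sketch, using the same domain suffix as Quarry's current `REPLICA_HOST` setting (the `_p` suffix handling is an assumption about how user-facing schema names map to service names):

```python
"""Derive a wiki-replica service hostname from a wiki database name.
Assumes names like 'enwiki' or 'enwiki_p' map to '<wiki>.<access>.db.svc.
eqiad.wmflabs', matching Quarry's current REPLICA_HOST pattern."""


def replica_host(database, access="analytics"):
    """Build the replica service hostname for a given wiki database."""
    wiki = database[:-2] if database.endswith("_p") else database
    return "{}.{}.db.svc.eqiad.wmflabs".format(wiki, access)
```

With per-wiki CNAMEs in place, the app then connects to that host on the standard port and the routing layer does the rest.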

Bstorm added a comment. Sep 9 2020, 8:20 PM

I can confirm that the sql command in the misctools package should work if we are proxying based on section names.

Change 627379 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: Proposal for a proxy setup on multi-instance replicas

https://gerrit.wikimedia.org/r/627379

Change 621343 merged by Jbond:
[operations/puppet@production] cumin: for new wmcs. prefix for cookbooks, grant access to wmcs-admins

https://gerrit.wikimedia.org/r/621343

Now it actually works and might even be safe enough to open a PR with, since the DNS aliases already exist.
https://github.com/toolforge/paws/commit/b3a348a337af0f507bbcb3f87882671faf3c75d4

Doing the PAWS work on this ticket feels like it is burying things a bit. I'm going to open a separate ticket for the notes on my Quarry work.