App servers <=> mariadb SSL/TLS for cross-datacenter writes
Closed, Resolved, Public

Description

Unless we can somehow entirely avoid writes on GET/HEAD (which might be hard for CentralAuth and a few other things), these will still happen on occasion, mostly as post-send writes in DeferredUpdates.

Such updates should use encryption, instead of just sending passwords and data over the wire in plain text.

Scope: All app servers, including job runners and maintenance servers.

Plan:

  • Use the ultimate primary DB server as the top entry in $wgLBFactoryConf['sectionLoads'] on the secondary DC. Similarly for writable clusters in externalLoads.
  • Add those servers to $wgLBFactoryConf['hostsByName']
  • Add the DBO_SSL flag to those servers, probably using $wgLBFactoryConf['templateOverridesByServer']
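
The three steps above might look roughly like the following settings fragment for the secondary DC. This is only a sketch: the section name `s3`, the cluster name `cluster1`, every host name, and the 192.0.2.x addresses are placeholders, and it assumes the fragment runs inside a MediaWiki settings file where the `DBO_*` constants are defined. It also assumes that a per-server override replaces the template's `flags` value, and that the override applies to external clusters as well; both should be checked against the LBFactoryMulti version in use. Note that DBO_SSL is deprecated later in this task, so the flag is shown only because the plan names it.

```php
<?php
// Sketch of a settings fragment for the secondary DC (LBFactoryMulti).
// Hypothetical values: section 's3', cluster 'cluster1', all host names,
// and the 192.0.2.x addresses (a documentation-only range).
$primaryDcMaster = 'db-primary-dc-master';
$primaryDcEsMaster = 'es-primary-dc-master';

// 1. The first entry of each load array is the master, so the primary DC's
//    master goes on top. A load of 0 keeps ordinary reads away from it.
$wgLBFactoryConf['sectionLoads']['s3'] = [
	$primaryDcMaster => 0,
	'db-local-replica-a' => 100,
	'db-local-replica-b' => 100,
];
$wgLBFactoryConf['externalLoads']['cluster1'] = [
	$primaryDcEsMaster => 0,
	'es-local-replica-a' => 100,
];

// 2. Map the newly listed hosts to addresses.
$wgLBFactoryConf['hostsByName'][$primaryDcMaster] = '192.0.2.10';
$wgLBFactoryConf['hostsByName'][$primaryDcEsMaster] = '192.0.2.11';

// 3. Request TLS only for those hosts. Assumption: the override replaces the
//    template's 'flags' value, so DBO_DEFAULT is repeated alongside DBO_SSL.
foreach ( [ $primaryDcMaster, $primaryDcEsMaster ] as $host ) {
	$wgLBFactoryConf['templateOverridesByServer'][$host] = [
		'flags' => DBO_DEFAULT | DBO_SSL,
	];
}
```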

Event Timeline

I've put T111654 as a blocker, but in reality cross-datacenter writes using SSL should already be possible in all cases, as I can already guarantee that *all masters are using TLS 1.2*. Only some slaves within the datacenter still use plain text, and they only need to be rebooted.

We need to coordinate the certificate handling; the current system may not be the most adequate method.

I've been playing around with some open source SQL proxies lately. These support persistent connections. I wonder how much of the overhead of MySQL for a remote server is establishing the connection and TLS negotiation, and how much is real extra latency (e.g., cross-datacenter replication seems to not be that great, but it is a single connection with low bandwidth requirements).

A test could be done to evaluate the overhead in both cases to help plan the architecture.

Re: the certificate handling that @jcrespo mentioned, see also T150822: Internal PKI for secure communication - Barcelona Ops offsite 2016, for the related discussion we had there.

TLS is deployed on all core MySQLs (s*, x2, es*, pc* shards), although for obvious reasons it is not enforced. I think this was the largest blocker for this task, and it is no more.

Reading things like https://www.percona.com/blog/2013/10/10/mysql-ssl-performance-overhead/, I think this may only make sense for DB_MASTER (i.e. server index 0) connections.

I suppose I can do some testing in script/eval.php to gauge serial connection and query rate differences.

Roll-out should probably be something like:
a) DB_MASTER connections for testwiki/mediawikiwiki (group 0)
b) DB_MASTER connections for S3 (25%, 50%, then 100% of connections)
c) DB_MASTER connections for external stores (25%, 50%, then 100% of connections)
d) All DB_MASTER connections for the remaining shards (25%, 50%, then 100% of connections)

That would take care of any cross-DC write scenario (without having to check which DC is active) and fulfill this task.
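
One way to apply the 25%, 50%, 100% stages is a random roll made whenever the configuration is evaluated. The helper below is a minimal sketch, not existing MediaWiki code: `shouldUseTls` and its parameters are hypothetical names, and it assumes a per-request decision is an acceptable stand-in for "percent of connections".

```php
<?php
/**
 * Decide whether this request's master connections should ask for TLS.
 * Hypothetical helper for illustration; not part of MediaWiki.
 *
 * @param int $percent Share of requests that should use TLS (0-100)
 * @param int|null $roll Value in 0..99; drawn at random when omitted
 * @return bool
 */
function shouldUseTls( int $percent, ?int $roll = null ): bool {
	// Draw a roll only when the caller did not supply one (useful for tests).
	$roll = $roll ?? mt_rand( 0, 99 );
	// Rolls 0..($percent - 1) select TLS, so 0 means never and 100 means always.
	return $roll < $percent;
}

// Stage b) at 25%: roughly a quarter of requests would ask for TLS.
$useTls = shouldUseTls( 25 );
echo $useTls ? "TLS\n" : "plain\n";
```

Raising the percentage between stages is then a one-line config change, and setting it to 0 acts as a quick rollback.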

If it's acceptable, this can be expanded to all connections (e.g. DB_REPLICA), preferably if proxying is set up, IMO. That would be a separate task though.

Krinkle renamed this task from Apache <=> mariadb SSL/TLS for cross-datacenter writes to App servers <=> mariadb SSL/TLS for cross-datacenter writes. May 25 2022, 12:16 AM
Krinkle updated the task description.

TLS connections from MediaWiki app servers to MariaDB appear to work just fine. You just pass flags=DBO_SSL and it connects with TLS 1.3; no certificate path configuration is needed. I timed it at 180-200 ms cross-DC.

Task description edit: added plan for direct TLS, no connection pooling or tunnel.

tstarling changed the task status from Stalled to Open. May 26 2022, 4:32 AM

Change 799436 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Deprecate DBO_SSL

https://gerrit.wikimedia.org/r/799436

Change 799437 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[operations/mediawiki-config@master] Enable SSL for master DB connections in the secondary datacenter

https://gerrit.wikimedia.org/r/799437

Change 799685 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[operations/mediawiki-config@master] Add the master from the primary DC to the secondary DC load arrays

https://gerrit.wikimedia.org/r/799685

Change 799436 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Deprecate DBO_SSL

https://gerrit.wikimedia.org/r/799436

Change 799437 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable SSL for master DB connections in the secondary datacenter

https://gerrit.wikimedia.org/r/799437

Change 799685 merged by jenkins-bot:

[operations/mediawiki-config@master] Add the master from the primary DC to the secondary DC load arrays

https://gerrit.wikimedia.org/r/799685

Mentioned in SAL (#wikimedia-operations) [2022-06-13T23:30:54Z] <tstarling@deploy1002> Synchronized wmf-config/CommonSettings.php: T134809 g 799685 codfw master DBs (duration: 03m 30s)

Mentioned in SAL (#wikimedia-operations) [2022-06-13T23:35:11Z] <tstarling@deploy1002> Synchronized wmf-config/etcd.php: T134809 g 799685 codfw master DBs (duration: 03m 36s)

Mentioned in SAL (#wikimedia-operations) [2022-06-13T23:45:26Z] <tstarling@deploy1002> Synchronized wmf-config/CommonSettings.php: T134809 g 801836 remove variable wmgDbconfigFromEtcd (duration: 03m 26s)