
Avoid x2-mainstash replica connections (ChronologyProtector)
Closed, ResolvedPublic

Description

Following T212129 and especially T113916: Switch ResourceLoader file dependency tracking to MultiDC-friendly backend, I expected a notable read and write increase on x2.

This indeed happened as expected. However, what I did not expect was to see anything other than internal system connections and reads on the standby x2 replicas.

Instead, there are as many connections to the x2 replicas as to the primary.

x2 primary: 200 open conns
Screenshot 2022-07-11 at 13.51.57.png (1×2 px, 326 KB)
x2 replica: 200 open conns
Screenshot 2022-07-11 at 13.52.05.png (1×2 px, 338 KB)

Event Timeline

These could indeed come from MW, since dbctl does provide replica information to MW as per https://noc.wikimedia.org/dbconfig/eqiad.json. Now, MW is not supposed to use this information, but to simplify dbctl it should be fine in principle to have the data passed in.

Running through SqlBagOStuff one more time, I've confirmed that, as written, it does not connect to DB_REPLICA. Both its reads and writes use only the single primary DB. This is by design, as there is no concept of a "lagged read" for the DC-local read-write use case. The standbys are purely for failover.
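
For reference, a minimal sketch of that primary-only pattern (this is not SqlBagOStuff's actual code; the table and field names here are illustrative):

use Wikimedia\Rdbms\ILoadBalancer;

// Sketch only: both reads and writes ask the load balancer for DB_PRIMARY.
// DB_REPLICA is never requested, so the x2 standbys should never be touched.
function getStashValue( ILoadBalancer $lb, string $key ) {
	$dbw = $lb->getConnection( DB_PRIMARY );
	return $dbw->selectField(
		'objectstash', // illustrative table name
		'value',
		[ 'keyname' => $key ],
		__METHOD__
	);
}

function setStashValue( ILoadBalancer $lb, string $key, string $value ): void {
	$dbw = $lb->getConnection( DB_PRIMARY );
	$dbw->upsert(
		'objectstash',
		[ 'keyname' => $key, 'value' => $value ],
		[ [ 'keyname' ] ],
		[ 'value' => $value ],
		__METHOD__
	);
}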

I used mwmaint1002 and ran sql --wiki testwiki --cluster extension2 without --write to get connected to a replica, and then ran SHOW PROCESSLIST a few times to see what pops up. The first couple of tries, all I got was a bunch of sleeping connections.

wikiuser202206 | 10.64.48.223:35434 | mainstash | Sleep | 0 | NULL | 0.000
wikiuser202206 | 10.64.48.221:51658 | mainstash | Sleep | 0 | NULL | 0.000
wikiuser202206 | 10.64.16.41:40100  | mainstash | Sleep | 0 | NULL | 0.000
wikiuser202206 | 10.64.32.60:50048  | mainstash | Sleep | 0 | NULL | 0.000

The source IPs correspond to MW app servers. I was starting to think these are just accidental connections with no queries on them, but eventually I did catch some queries in flight:

wikiuser202206 | 10.64.48.202:55952 | mainstash | Query | 0 | Writing to net       | SELECT /* Wikimedia\Rdbms\DatabaseMysqlBase::primaryPosWait */ MASTER_GTID_WAIT(... | 0.000
wikiuser202206 | 10.64.48.215:50528 | mainstash | Query | 0 | Filling schema table | SHOW /* Wikimedia\Rdbms\DatabaseMysqlBase::getReplicaPos */ GLOBAL VARIABLES LIKE 'gtid_%' | 0.000

This indicates ChronologyProtector is at play here. My unfounded assumption is that ChronologyProtector only saves positions for DBs that we wrote to (i.e. it doesn't try to memorise and later wait for positions that we found on replicas), and it seems at least in theory possible that we also only ask DBs to wait for positions that relate to their own cluster, i.e. we should not be sending every DB we connect to all the positions from unrelated clusters, most of which would be a no-op (if that is even valid to attempt). If both of those assumptions hold and both optimisations are implemented, then the 200 concurrent connections would represent only subsequent web requests from clients for which we previously performed a mainstash write during a Varnish miss earlier in that same browsing session.
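
To make that concrete, a rough sketch of the pattern I mean (hypothetical helper code, not the actual ChronologyProtector implementation; the key names, cluster key, and GTID value are made up):

use Wikimedia\Rdbms\IDatabase;

// After a request that wrote to a cluster, remember the primary's replication
// position, keyed on the client. This is the "only save positions for DBs we
// wrote to" part of the assumption above.
function rememberWritePosition( BagOStuff $store, string $clientId, string $gtidPos ): void {
	$store->set( "cp:$clientId", [ 'extension2' => $gtidPos ], 10 );
}

// On the client's next request, before reading from a replica of that same
// cluster, wait until the replica has reached the stored position. This is
// the MASTER_GTID_WAIT query seen in the process list above.
function waitForWritePosition( BagOStuff $store, string $clientId, IDatabase $replica ): void {
	$positions = $store->get( "cp:$clientId" ) ?: [];
	if ( isset( $positions['extension2'] ) ) {
		$replica->query(
			'SELECT MASTER_GTID_WAIT(' .
				$replica->addQuotes( $positions['extension2'] ) . ', 10)',
			__METHOD__
		);
	}
}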

Krinkle triaged this task as Low priority.

@Krinkle @tstarling can we raise the priority of this, so that we can have x2 behave as it is supposed to (that is, with the replicas not being used at all for reads)?

I'd like to try keeping complexity down by not adding a new concept to MW for "a cluster of servers where we pretend only 1 exists and disable the multi-server features".

The overhead of ChronologyProtector, the concept of lagged-replica mode, the periodic broadcast connect to all replicas to check lag and proactively enact read-only mode when lag is too high, etc.; these are all naturally turned off for single-server DB clusters.

I suggest we configure x2 as a single-server cluster, similar to what we do for local development and CI by default ("LBSimple"), and somewhat akin to what we've done with ParserCache for a long time (somewhat different since those do contain multiple servers, but we treat each as its own isolated master-only host).
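
Roughly the config shape I have in mind (an illustrative sketch only; the real values come via dbctl/etcd and the key names may differ in wmf-config):

$wgLBFactoryConf = [
	'class' => 'LBFactoryMulti',
	// ... other settings elided ...
	'externalLoads' => [
		// x2: only the primary is listed. With a single server, rdbms
		// naturally skips lag checks, lagged-replica mode, and
		// ChronologyProtector position waits for this cluster.
		'extension2' => [
			'db1151' => 0,
		],
	],
];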

Krinkle raised the priority of this task from Low to High.Aug 29 2022, 6:16 PM

I agree that it is high priority.

Change 827858 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] rdbms: disable lagged-replica-mode if "max lag" is INF for all replicas

https://gerrit.wikimedia.org/r/827858

Change 828072 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[operations/mediawiki-config@master] Set "max lag" for all x2 servers to INF

https://gerrit.wikimedia.org/r/828072
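
For context, the direction of these two patches would roughly amount to the following per-server config (an illustrative sketch, not the actual change; replica hostnames here are hypothetical):

// Every x2 replica entry gets 'max lag' => INF, so it is never considered
// lagged and the lag-based machinery (lagged-replica mode, read-only
// fallback) never engages for this cluster.
$x2Servers = [
	[ 'host' => 'db1151', 'load' => 0 ],                     // primary
	[ 'host' => 'db1152', 'load' => 100, 'max lag' => INF ], // replica (hypothetical)
	[ 'host' => 'db1153', 'load' => 100, 'max lag' => INF ], // replica (hypothetical)
];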

After T316482, we see that the connection count indeed drops as expected on the x2 replicas.

Grafana dashboard: MySQL Aggregate

Screenshot 2022-09-02 at 14.51.32.png (722×2 px, 93 KB)

Screenshot 2022-09-02 at 14.51.39.png (550×2 px, 67 KB)

Screenshot 2022-09-02 at 14.51.36.png (1×2 px, 238 KB)

However, the number of open connections is not zero. To confirm there aren't any MW-related edge cases we forgot about, I also sampled the actual list of open connections a few times.

I note however that the "usual" way of connecting to a replica is, of course, now no longer possible.

$ sql --wiki testwiki --cluster extension2 --wikidb mainstash

This now connects to the master instead of a replica, given that there is no replica-only candidate known to MW.

And the --host option doesn't help either, as it has a built-in check to validate that the host is among the listed hosts for that cluster, so any other host is considered invalid.

krinkle@mwmaint1002$ sql --wiki testwiki --cluster extension2 --wikidb mainstash --list-hosts
db1151

As a workaround, I invoked mysql directly with a recently-created tmp file, based on the x2 replica hostname listed in orchestrator.wikimedia.org and the IP listed at https://noc.wikimedia.org/dbconfig/eqiad.json.

$ mysql --defaults-extra-file=/tmp/mw-mysql83608b13a69f.ini --user=wikiadmin --database=mainstash --host=10.64.48.48

wikiadmin@10.64.48.48(mainstash)> SELECT @@hostname;
+------------+
| @@hostname |
+------------+
| db1153     |
+------------+

wikiadmin@10.64.48.48(mainstash)> SHOW PROCESSLIST;
+------------+-----------------+----------------------+-----------+-----------+-------+------------------+
| Id         | User            | Host                 | db        | Command   | State | Info             |
+------------+-----------------+----------------------+-----------+-----------+-------+------------------+
|          1 | system user     |                      | NULL      | Daemon    |       | NULL             |
|          2 | system user     |                      | NULL      | Daemon    |       | NULL             |
|  175322402 | event_scheduler | localhost            | NULL      | Daemon    |       | NULL             |
|  361549194 | system user     |                      | NULL      | Slave_IO  |       | NULL             |
|  361549195 | system user     |                      | NULL      | Slave_SQL | [..]  | NULL             |
| 1388567878 | orchestrator    | 208.80.155.103:53492 | NULL      | Sleep     |       | NULL             |
| 1388567890 | orchestrator    | 208.80.155.103:53540 | NULL      | Sleep     |       | NULL             |
| 1388569286 | orchestrator    | 208.80.155.103:53912 | NULL      | Sleep     |       | NULL             |
| 1521766009 | wikiadmin       | 10.64.16.77:41168    | mainstash | Query     |       | SHOW PROCESSLIST |
+------------+-----------------+----------------------+-----------+-----------+-------+------------------+

I sampled this a few more times and saw no MW-related connections show up, other than the wikiadmin one from myself.

Small followup: SHOW PROCESSLIST is not a reliable way to make sure a user is no longer connecting, mostly because a very fast connection, like most of those from web requests, could take very little time. The sys schema is usually more reliable:

root@db1153.eqiad.wmnet[sys]> SELECT user, statements, total_connections FROM sys.x$user_summary WHERE user like 'wiki%';     
+----------------+---------------+-------------------+
| user           | statements    | total_connections |
+----------------+---------------+-------------------+
| wikiadmin      |       3327036 |            451678 |
| wikiuser2022   |   62857270140 |         171924309 |
| wikiuser202206 | 1401945273212 |        4655199110 |
+----------------+---------------+-------------------+
3 rows in set (0.029 sec)

root@db1153.eqiad.wmnet[sys]> SELECT sleep(100);
+------------+
| sleep(100) |
+------------+
|          0 |
+------------+
1 row in set (100.001 sec)

root@db1153.eqiad.wmnet[sys]> SELECT user, statements, total_connections FROM sys.x$user_summary WHERE user like 'wiki%';
+----------------+---------------+-------------------+
| user           | statements    | total_connections |
+----------------+---------------+-------------------+
| wikiadmin      |       3327036 |            451678 |
| wikiuser2022   |   62857270140 |         171924309 |
| wikiuser202206 | 1401945273212 |        4655199110 |
+----------------+---------------+-------------------+
3 rows in set (0.028 sec)

Compare that to:

root@db1153.eqiad.wmnet[sys]> SELECT user, statements, total_connections FROM sys.x$user_summary WHERE user like 'root%';
+------+------------+-------------------+
| user | statements | total_connections |
+------+------------+-------------------+
| root |  284154964 |           1130340 |
+------+------------+-------------------+
1 row in set (0.028 sec)

root@db1153.eqiad.wmnet[sys]> SELECT user, statements, total_connections FROM sys.x$user_summary WHERE user like 'root%';
+------+------------+-------------------+
| user | statements | total_connections |
+------+------------+-------------------+
| root |  284156242 |           1130341 |
+------+------------+-------------------+
1 row in set (0.028 sec)

Change 827858 abandoned by Aaron Schulz:

[mediawiki/core@master] rdbms: disable lagged-replica-mode if "max lag" is INF for all replicas

Reason:

wmf-config changed instead to not have replicas

https://gerrit.wikimedia.org/r/827858

Change 828072 abandoned by Aaron Schulz:

[operations/mediawiki-config@master] Set "max lag" for all x2 servers to INF

Reason:

https://gerrit.wikimedia.org/r/828072