I've been unable to connect to the database replicas, specifically on server 10.64.37.27; connections fail with "Access denied" errors. Per discussion with @bd808 on IRC, this may be because the newly pooled server is missing auth records, which would need rebuilding.
Description
Details
Subject | Repo | Branch | Lines +/- |
---|---|---|---|
wmcs: Add db1141.eqiad.wmnet to maintain-dbusers | operations/puppet | production | +4 -0 |
Related Objects
- Mentioned Here
- T249188: Reimage labsdb1011 to Buster and MariaDB 10.4
Event Timeline
Seems likely related to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4 and the pooling of db1141.
[08:19] <marostegui> arturo: I have pooled a new host on the wikireplicas services, specifically on the analytics role, if you notice people complaining or something, please ping me
[08:29] <arturo> ack marostegui
Connection attempts fail for @Naypta's user on Toolforge (u24474 in the dbs), and only against the analytics cluster.
$ sql enwiki
ERROR 1045 (28000): Access denied for user 'u24474'@'10.64.37.27' (using password: YES)
$ sql --cluster web enwiki
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 65686419
Server version: 10.1.43-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [enwiki_p]>
Change 599466 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] wmcs: Add db1141.eqiad.wmnet to maintain-dbusers
@Bstorm was able to verify the difference in number of provisioned accounts on db1141.eqiad.wmnet vs labsdb1009 and labsdb1011 (which also show a small delta).
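The comparison described above can be illustrated with a minimal sketch. It assumes the account names from each host have already been collected into sets (for example from a query against mysql.user); the function name `account_deltas` and all sample data are hypothetical, not taken from the actual hosts.

```python
def account_deltas(accounts_by_host):
    """Map each host to the sorted accounts it lacks relative to the union of all hosts."""
    all_accounts = set().union(*accounts_by_host.values())
    return {
        host: sorted(all_accounts - set(accounts))
        for host, accounts in accounts_by_host.items()
    }


# Hypothetical sample data; real account lists would come from each replica.
accounts_by_host = {
    "labsdb1009": {"u24474", "s50001", "u1001"},
    "labsdb1011": {"u24474", "s50001"},
    "db1141": {"s50001"},
}

for host, missing in account_deltas(accounts_by_host).items():
    print(f"{host}: {len(missing)} missing {missing}")
```

Comparing against the union rather than against a single reference host also surfaces small deltas on the older hosts, not only the gap on the newly pooled one.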
Change 599466 merged by Bstorm:
[operations/puppet@production] wmcs: Add db1141.eqiad.wmnet to maintain-dbusers
Mentioned in SAL (#wikimedia-cloud) [2020-05-28T23:02:01Z] <bd808> /usr/local/sbin/maintain-dbusers --debug harvest-replicas (T253930)
So the fix was:
- add the new host into the config for maintain-dbusers
- restart the maintain-dbusers maintain systemd unit on the NFS primary master
- run /usr/local/sbin/maintain-dbusers harvest-replicas to update the state tracking database
- let maintain-dbusers maintain notice the absent records and do its magic to fill them in
mysql:root@localhost [(none)]> select count(*) from mysql.user;
+----------+
| count(*) |
+----------+
|     4072 |
+----------+
1 row in set (0.001 sec)
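The harvest-then-fill flow in the fix steps above might be pictured roughly as follows. This is an in-memory sketch only, not the actual maintain-dbusers code: the function names, the "present"/"absent" labels, and the sample accounts are assumptions made for illustration.

```python
def harvest_replicas(accounts, grants_by_host):
    """Record, for every (account, host) pair, whether the account exists on that host."""
    status = {}
    for host, granted in grants_by_host.items():
        for account in accounts:
            status[(account, host)] = "present" if account in granted else "absent"
    return status


def maintain(status, create_account):
    """Create every account marked absent and flip its state to present."""
    created = []
    for (account, host), state in sorted(status.items()):
        if state == "absent":
            create_account(account, host)  # hypothetical callback that provisions the account
            status[(account, host)] = "present"
            created.append((account, host))
    return created


# Hypothetical data: db1141 was just added to the config and has no accounts yet.
accounts = ["u24474", "s50001"]
grants_by_host = {"labsdb1011": {"u24474", "s50001"}, "db1141": set()}

status = harvest_replicas(accounts, grants_by_host)
created = maintain(status, lambda account, host: grants_by_host[host].add(account))
print(created)
```

In this sketch a host that is missing from `grants_by_host` is never examined, which mirrors why the new host first had to be added to the configuration before the harvest step could record its absent accounts.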