Rebuild auth records for db1141.eqiad.wmnet
Closed, Resolved (Public)

Description

I've been unable to connect to the db replicas, specifically on server 10.64.37.27; connections fail with Access denied errors. Per discussion with @bd808 on IRC, this may be because the newly pooled server is missing auth records, which need rebuilding.

Event Timeline

Seems likely related to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4 and the pooling of db1141.

[08:19]  <marostegui> arturo: I have pooled a new host on the wikireplicas services, specifically on the analytics role, if you notice people complaining or something, please ping me
[08:29]  <   arturo> ack marostegui

Connections are failing for @Naypta's user on Toolforge (u24474 in the dbs), and only against the analytics cluster.

$ sql enwiki
ERROR 1045 (28000): Access denied for user 'u24474'@'10.64.37.27' (using password: YES)
$ sql --cluster web enwiki
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 65686419
Server version: 10.1.43-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [enwiki_p]>

Change 599466 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] wmcs: Add db1141.eqiad.wmnet to maintain-dbusers

https://gerrit.wikimedia.org/r/599466

@Bstorm was able to verify the difference in number of provisioned accounts on db1141.eqiad.wmnet vs labsdb1009 and labsdb1011 (which also show a small delta).
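
A count delta between hosts only says that something is missing, not which accounts. As a minimal sketch, assuming the account names from each host's mysql.user table have already been exported to plain lists, a set difference names the absent accounts. The function name and the sample data are hypothetical and are not taken from this task.

```python
def missing_accounts(reference_users, candidate_users):
    """Return accounts present on the reference host but absent on the candidate."""
    return sorted(set(reference_users) - set(candidate_users))


# Hypothetical sample data: account names as they might be exported from
# mysql.user on an existing replica (reference) and on the newly pooled host.
reference = ["u24474", "u1001", "s5002"]
candidate = ["u1001", "s5002"]

print(missing_accounts(reference, candidate))
```

An empty result would suggest that the new host has caught up with the reference host for the accounts compared.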

bd808 renamed this task from Rebuild auth records for new db replica servers to Rebuild auth records for db1141.eqiad.wmnet. May 28 2020, 10:41 PM
bd808 added a project: Data-Services.
bd808 moved this task from Backlog to Wiki replicas on the Data-Services board.
bd808 moved this task from Inbox to Clinic Duty on the cloud-services-team (Kanban) board.

Change 599466 merged by Bstorm:
[operations/puppet@production] wmcs: Add db1141.eqiad.wmnet to maintain-dbusers

https://gerrit.wikimedia.org/r/599466

Mentioned in SAL (#wikimedia-cloud) [2020-05-28T23:02:01Z] <bd808> /usr/local/sbin/maintain-dbusers --debug harvest-replicas (T253930)

bd808 claimed this task.

So the fix was:

  • add the new host into the config for maintain-dbusers
  • restart the maintain-dbusers maintain systemd unit on the NFS primary master
  • run /usr/local/sbin/maintain-dbusers harvest-replicas to update the state tracking database
  • let maintain-dbusers maintain notice the absent records and do its magic to fill them in

mysql:root@localhost [(none)]> select count(*) from mysql.user;
+----------+
| count(*) |
+----------+
|     4072 |
+----------+
1 row in set (0.001 sec)
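
The two manual steps of the fix can be written down as a dry-run plan for the next time a replica is pooled. This is only a sketch: the harvest-replicas command is taken from the task, while the systemd unit name `maintain-dbusers` is an assumption (the task only refers to "the maintain-dbusers maintain systemd unit"). Nothing is executed; the commands are just rendered for review.

```python
import shlex

# Assumption: the systemd unit is named "maintain-dbusers"; the task does not
# state the exact unit name.
SERVICE_UNIT = "maintain-dbusers"

# Taken from the task: refreshes the state tracking database.
HARVEST_CMD = ["/usr/local/sbin/maintain-dbusers", "harvest-replicas"]


def fix_plan(unit=SERVICE_UNIT):
    """Return the manual steps as argv lists, in the order they were run."""
    return [
        ["systemctl", "restart", unit],
        list(HARVEST_CMD),
    ]


def render(plan):
    """Render each argv list as a shell-quoted command line."""
    return [shlex.join(argv) for argv in plan]


# Dry run: print the commands instead of executing them.
for line in render(fix_plan()):
    print(line)
```

Adding the host to the maintain-dbusers config (the Gerrit change above) comes before these commands, and the final step needs no command because the running service fills in the missing records on its own.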