Rebuild auth records for db1141.eqiad.wmnet
Closed, Resolved | Public

Description

I've been unable to connect to the db replicas, specifically on server 10.64.37.27; connections fail with Access denied errors. Per discussion with @bd808 on IRC, the newly pooled server may be missing auth records, which would need rebuilding.

Event Timeline

bd808 triaged this task as High priority. (May 28 2020, 10:17 PM)

Seems likely related to T249188 (Reimage labsdb1011 to Buster and MariaDB 10.4) and to the pooling of db1141.

[08:19]  <marostegui> arturo: I have pooled a new host on the wikireplicas services, specifically on the analytics role, if you notice people complaining or something, please ping me
[08:29]  <   arturo> ack marostegui

Connection attempts fail for @Naypta's user on Toolforge (u24474 in the dbs), and only against the analytics cluster.

$ sql enwiki
ERROR 1045 (28000): Access denied for user 'u24474'@'10.64.37.27' (using password: YES)
$ sql --cluster web enwiki
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 65686419
Server version: 10.1.43-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [enwiki_p]>

Change 599466 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] wmcs: Add db1141.eqiad.wmnet to maintain-dbusers

https://gerrit.wikimedia.org/r/599466

@Bstorm was able to verify the difference in number of provisioned accounts on db1141.eqiad.wmnet vs labsdb1009 and labsdb1011 (which also show a small delta).

bd808 renamed this task from "Rebuild auth records for new db replica servers" to "Rebuild auth records for db1141.eqiad.wmnet". (May 28 2020, 10:41 PM)
bd808 added a project: Data-Services.
bd808 moved this task from Backlog to Wiki replicas on the Data-Services board.
bd808 moved this task from Inbox to Clinic Duty on the cloud-services-team (Kanban) board.

Change 599466 merged by Bstorm:
[operations/puppet@production] wmcs: Add db1141.eqiad.wmnet to maintain-dbusers

https://gerrit.wikimedia.org/r/599466

Mentioned in SAL (#wikimedia-cloud) [2020-05-28T23:02:01Z] <bd808> /usr/local/sbin/maintain-dbusers --debug harvest-replicas (T253930)

bd808 claimed this task.

So the fix was:

  • add the new host into the config for maintain-dbusers
  • restart the maintain-dbusers maintain systemd unit on the NFS primary master
  • run /usr/local/sbin/maintain-dbusers harvest-replicas to update the state tracking database
  • let maintain-dbusers maintain notice the absent records and do its magic to fill them in

mysql:root@localhost [(none)]> select count(*) from mysql.user;
+----------+
| count(*) |
+----------+
|     4072 |
+----------+
1 row in set (0.001 sec)
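
For anyone repeating this on another newly pooled host, the steps above can be sketched roughly as the script below. It is only a dry-run outline, not the exact commands that were typed: the systemd unit name maintain-dbusers is an assumption (the task only refers to "the maintain-dbusers maintain systemd unit"), and the run and rebuild_auth_records helpers are hypothetical names made up for illustration. The harvest-replicas invocation and the count query are taken from this task. By default the script only prints what it would do; set RUN=1 to actually execute.

```shell
#!/bin/sh
# Dry-run sketch of the recovery sequence from this task.
# Commands are printed, not executed, unless RUN=1 is set.
# ASSUMPTION: the systemd unit is named "maintain-dbusers".

# Hypothetical helper: execute the command, or just print it in dry-run mode.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf 'would run: %s\n' "$*"
    fi
}

# Hypothetical wrapper for the steps listed above.
rebuild_auth_records() {
    # Step 1 (adding the host to the maintain-dbusers config) is a puppet
    # change (Gerrit change 599466) and is not scripted here.

    # Step 2: restart the service so it picks up the new host list.
    run systemctl restart maintain-dbusers

    # Step 3: update the state tracking database from the replicas.
    run /usr/local/sbin/maintain-dbusers harvest-replicas

    # Step 4: nothing to run; the maintain loop fills in missing accounts.
    # Afterwards, check the account count on the replica host.
    run mysql -e 'select count(*) from mysql.user'
}

rebuild_auth_records
```

Steps 2 and 3 belong on the NFS primary where maintain-dbusers runs, while the final count query is meant for the replica host itself, so in practice these would be run in separate sessions rather than as one script.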