Rebuild auth records for db1141.eqiad.wmnet
Closed, Resolved, Public

Authored By

	Naypta
	May 28 2020, 10:07 PM

Description

I've been unable to connect to the db replicas, specifically the server at 10.64.37.27, which returns Access denied errors. Per discussion with @bd808 on IRC, the newly pooled server may be missing auth records, which would need rebuilding.

Details

	Subject	Repo	Branch	Lines +/-
	wmcs: Add db1141.eqiad.wmnet to maintain-dbusers	operations/puppet	production	+4 -0

Related Objects

Mentioned Here: T249188: Reimage labsdb1011 to Buster and MariaDB 10.4

Event Timeline

Naypta created this task. May 28 2020, 10:07 PM

Restricted Application added a subscriber: Aklapper. May 28 2020, 10:07 PM

Krenair subscribed. May 28 2020, 10:09 PM

Seems likely related to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4 and the pooling of db1141.

[08:19]  <marostegui> arturo: I have pooled a new host on the wikireplicas services, specifically on the analytics role, if you notice people complaining or something, please ping me
[08:29]  <   arturo> ack marostegui

Connection attempts fail for @Naypta's Toolforge user (u24474 in the dbs), and only against the analytics cluster; the web cluster accepts the same credentials.

$ sql enwiki
ERROR 1045 (28000): Access denied for user 'u24474'@'10.64.37.27' (using password: YES)
$ sql --cluster web enwiki
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 65686419
Server version: 10.1.43-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [enwiki_p]>
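
The two manual attempts above can be repeated in one pass. This is only a rough sketch: the check_clusters helper is made up for illustration, and it assumes that sql accepts --cluster analytics in the same way it accepts --cluster web, and that it passes a query piped on stdin through to the MariaDB client.

```shell
# Hypothetical helper: try each wiki replica cluster and report which ones
# accept the login. Assumes `sql --cluster analytics` is valid and that the
# wrapper forwards stdin to the MariaDB client (neither is confirmed here).
check_clusters() {
    db="$1"
    for cluster in analytics web; do
        if echo 'SELECT 1;' | sql --cluster "$cluster" "$db" >/dev/null 2>&1; then
            echo "$cluster: ok"
        else
            echo "$cluster: FAILED"
        fi
    done
}
```

Running check_clusters enwiki from a Toolforge account would then print one line per cluster, which makes it easy to see whether a failure is limited to a single cluster.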

Change 599466 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] wmcs: Add db1141.eqiad.wmnet to maintain-dbusers

https://gerrit.wikimedia.org/r/599466

gerritbot added a project: Patch-For-Review. May 28 2020, 10:38 PM

@Bstorm was able to verify the difference in number of provisioned accounts on db1141.eqiad.wmnet vs labsdb1009 and labsdb1011 (which also show a small delta).
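
A comparison like this could be sketched roughly as follows. It is not necessarily how the check was actually done: the count_accounts helper is hypothetical, and it assumes the mysql client can reach each host with -h using credentials that are allowed to read mysql.user.

```shell
# Hypothetical helper: print the number of rows in mysql.user for each host,
# so that deltas between replicas stand out.
# Assumes the client credentials may read mysql.user on every host listed.
count_accounts() {
    for host in "$@"; do
        # -N drops the column header, -B gives plain tab-separated output.
        n=$(mysql -h "$host" -N -B -e 'SELECT COUNT(*) FROM mysql.user;') || n="error"
        printf '%s\t%s\n' "$host" "$n"
    done
}
```

For example, count_accounts db1141.eqiad.wmnet labsdb1009 labsdb1011 would print one host and count per line.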

bd808 renamed this task from "Rebuild auth records for new db replica servers" to "Rebuild auth records for db1141.eqiad.wmnet". May 28 2020, 10:41 PM

bd808 added a project: Data-Services.

bd808 moved this task from Backlog to Wiki replicas on the Data-Services board.

bd808 moved this task from Inbox to Clinic Duty on the cloud-services-team (Kanban) board.

Change 599466 merged by Bstorm:
[operations/puppet@production] wmcs: Add db1141.eqiad.wmnet to maintain-dbusers

https://gerrit.wikimedia.org/r/599466

Mentioned in SAL (#wikimedia-cloud) [2020-05-28T23:02:01Z] <bd808> /usr/local/sbin/maintain-dbusers --debug harvest-replicas (T253930)

So the fix was:

add the new host into the config for maintain-dbusers
restart the maintain-dbusers maintain systemd unit on the NFS primary master
run /usr/local/sbin/maintain-dbusers harvest-replicas to update the state tracking database
let maintain-dbusers maintain notice the absent records and do its magic to fill them in
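
For future reference, the command-line part of the steps above could be sketched like this. The systemd unit name maintain-dbusers.service is an assumption (override it with UNIT if it differs), and the run and rebuild_replica_accounts helpers are made up for illustration; only the harvest-replicas command comes from this task. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the steps listed above.
rebuild_replica_accounts() {
    # Step 1 (adding the host to the maintain-dbusers config) was done through
    # the operations/puppet change and is not scripted here.
    # Step 2: restart the service. The unit name is an assumption.
    run systemctl restart "${UNIT:-maintain-dbusers.service}"
    # Step 3: refresh the state tracking database.
    run /usr/local/sbin/maintain-dbusers harvest-replicas
    # Step 4: nothing to run; the restarted service fills in missing accounts.
}
```

A dry run (DRY_RUN=1 followed by rebuild_replica_accounts) is a cheap way to confirm the unit name and paths before touching the real host.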

mysql:root@localhost [(none)]> select count(*) from mysql.user;
+----------+
| count(*) |
+----------+
|     4072 |
+----------+
1 row in set (0.001 sec)

Maintenance_bot removed a project: Patch-For-Review. May 28 2020, 11:10 PM
