
MariaDB User s54171, access denied on replicas.
Closed, Resolved (Public)

Description

On the Toolforge project "rotbot" I can no longer connect to the replicated databases:

tools.rotbot@tools-sgebastion-07:~$ sql commonswiki_p
ERROR 1045 (28000): Access denied for user 's54171'@'10.64.37.14' (using password: YES)

The tool worked for a while and then stopped, so I looked into the log and noticed the error above. I haven't changed the .cnf file.

According to the logs, the problem has been occurring since approx. 16:00, 10 October 2019 (UTC).

Event Timeline

Reedy renamed this task from "MariaDB User s54171, assess denided on replicas." to "MariaDB User s54171, access denied on replicas." (Oct 13 2019, 2:57 PM)
Reedy edited projects, added Data-Services; removed Cloud-Services.

It probably needs someone with wmcs-admin/ops production rights.

bd808 claimed this task.
bd808 edited projects, added cloud-services-team (Kanban); removed cloud-services-team.
bd808 subscribed.

I poked around using the instructions at https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin#Debugging_bad_mysql_credentials and found that the tool's credentials were only created on labsdb1011.eqiad.wmnet:

$ CHECK_UID=s54171
$ mysql -h m5-master.eqiad.wmnet -u labsdbaccounts -p -e "USE labsdbaccounts; SELECT * FROM account WHERE mysql_username='${CHECK_UID}'\G"
Enter password:
*************************** 1. row ***************************
            id: 19358
mysql_username: s54171
          type: tool
      username: tools.rotbot
 password_hash: <<redacted>>
$ ACCT_ID=19358
$ mysql -h m5-master.eqiad.wmnet -u labsdbaccounts -p -e "USE labsdbaccounts; SELECT * FROM labsdbaccounts.account_host WHERE account_id=${ACCT_ID}\G"
Enter password:
*************************** 1. row ***************************
        id: 134942
account_id: 19358
  hostname: labsdb1011.eqiad.wmnet
    status: present
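
The check above amounts to comparing the hosts recorded in account_host with the hosts that should carry the account. A minimal sketch of that comparison, assuming the expected set is the four hosts that appear in the maintain-dbusers journal output in this task; EXPECTED_HOSTS and missing_hosts are hypothetical names and not part of maintain-dbusers:

```python
# Hypothetical helper; the host list is an assumption taken from the
# maintain-dbusers journal output quoted in this task.
EXPECTED_HOSTS = {
    "labsdb1009.eqiad.wmnet",
    "labsdb1010.eqiad.wmnet",
    "labsdb1011.eqiad.wmnet",
    "172.16.7.153",
}


def missing_hosts(account_host_rows, expected=EXPECTED_HOSTS):
    """Return the expected hosts that have no 'present' row, sorted."""
    present = {
        row["hostname"]
        for row in account_host_rows
        if row["status"] == "present"
    }
    return sorted(expected - present)


# The single row returned by the account_host query above.
rows = [
    {
        "id": 134942,
        "account_id": 19358,
        "hostname": "labsdb1011.eqiad.wmnet",
        "status": "present",
    }
]
print(missing_hosts(rows))
```

For the row shown above this lists every expected host except labsdb1011.eqiad.wmnet.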

I tried to clean up the partially created account using sudo /usr/local/sbin/maintain-dbusers delete tools.rotbot but that failed:

$ sudo /usr/local/sbin/maintain-dbusers delete tools.rotbot --debug
Traceback (most recent call last):
  File "/usr/local/sbin/maintain-dbusers", line 624, in <module>
    main()
  File "/usr/local/sbin/maintain-dbusers", line 620, in main
    delete_account(config, args.extra_args, args.account_type)
  File "/usr/local/sbin/maintain-dbusers", line 485, in delete_account
    labsdb_cur.execute("DROP USER %s" % row['mysql_username'])
  File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 167, in execute
    result = self._query(query)
  File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 323, in _query
    conn.query(q)
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 836, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1020, in _read_query_result
    result.read()
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1303, in read
    first_packet = self.connection._read_packet()
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 982, in _read_packet
    packet.check_error()
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 394, in check_error
    err.raise_mysql_exception(self._data)
  File "/usr/lib/python3/dist-packages/pymysql/err.py", line 120, in raise_mysql_exception
    _check_mysql_exception(errinfo)
  File "/usr/lib/python3/dist-packages/pymysql/err.py", line 115, in _check_mysql_exception
    raise InternalError(errno, errorvalue)
pymysql.err.InternalError: (1396, "Operation DROP USER failed for 's54171'@'%'")
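
Error 1396 on DROP USER usually means the server could not find the named account. A minimal sketch of how a delete routine could treat that code as "already gone" instead of aborting. The names drop_user_tolerant and FakeCursor are hypothetical and not part of maintain-dbusers, and the broad except is a stand-in for catching pymysql.err.MySQLError so that the example runs without a database:

```python
ER_CANNOT_USER = 1396  # error code seen in the traceback above


def drop_user_tolerant(cursor, mysql_username):
    """Drop the user; return False if the server reports error 1396."""
    try:
        # Same statement shape as the one in the traceback.
        cursor.execute("DROP USER %s" % mysql_username)
    except Exception as exc:  # assumption: real code would catch pymysql.err.MySQLError
        if exc.args and exc.args[0] == ER_CANNOT_USER:
            return False  # account was not there; nothing to drop
        raise  # any other error is still fatal
    return True


class FakeCursor:
    """Hypothetical stand-in for a pymysql cursor, for illustration only."""

    def __init__(self, error=None):
        self.error = error
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)
        if self.error is not None:
            raise self.error


gone = FakeCursor(Exception(1396, "Operation DROP USER failed for 's54171'@'%'"))
print(drop_user_tolerant(gone, "s54171"))
```

Whether skipping the error is safe depends on the script's other bookkeeping, so this is a sketch of the idea and not a proposed patch.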

I then manually checked labsdb1011 for the expected account (which was failing to delete above):

$ mysql -h labsdb1011.eqiad.wmnet -u labsdbadmin -p -e "SELECT User, Password FROM mysql.user WHERE User LIKE '${CHECK_UID}';"
Enter password:

Hmmm... account is not actually there. I think this might be a side effect of T235016: Reclone labsdb1011. Let's just go ahead and clean up the records in the labsdbaccounts tables:

$ mysql -h m5-master.eqiad.wmnet -u labsdbaccounts -p
Enter password:
(labsdbaccounts@m5-master.eqiad.wmnet) [(none)]> use labsdbaccounts;
Database changed
(labsdbaccounts@m5-master.eqiad.wmnet) [labsdbaccounts]> delete from account_host where account_id = 19358;
Query OK, 1 row affected (0.00 sec)

(labsdbaccounts@m5-master.eqiad.wmnet) [labsdbaccounts]> delete from account where id = 19358;
Query OK, 1 row affected (0.00 sec)
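
The two deletes above remove the account_host rows before the account row they refer to. A minimal sketch of the same order, using an in-memory SQLite database as a stand-in for m5-master and only the columns visible in the query output earlier in this task. Wrapping both deletes in one transaction is an assumption of this sketch, not something the session above did, and purge_account is a hypothetical name:

```python
import sqlite3


def purge_account(conn, account_id):
    """Delete host rows first, then the account row, in one transaction."""
    with conn:  # commits on success, rolls back if either delete raises
        hosts = conn.execute(
            "DELETE FROM account_host WHERE account_id = ?", (account_id,)
        ).rowcount
        accounts = conn.execute(
            "DELETE FROM account WHERE id = ?", (account_id,)
        ).rowcount
    return hosts, accounts


# Stand-in schema: column names follow the query output in this task,
# types are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (
        id INTEGER PRIMARY KEY, mysql_username TEXT, type TEXT, username TEXT
    );
    CREATE TABLE account_host (
        id INTEGER PRIMARY KEY, account_id INTEGER, hostname TEXT, status TEXT
    );
    INSERT INTO account VALUES (19358, 's54171', 'tool', 'tools.rotbot');
    INSERT INTO account_host
        VALUES (134942, 19358, 'labsdb1011.eqiad.wmnet', 'present');
    """
)
result = purge_account(conn, 19358)
print(result)  # (1, 1)
```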

And manually clean up the replica.my.cnf file:

$ sudo chattr -i /srv/tools/shared/tools/project/rotbot/replica.my.cnf
$ sudo rm /srv/tools/shared/tools/project/rotbot/replica.my.cnf

Once that was done, maintain-dbusers regenerated everything as expected:

$ sudo journalctl -u maintain-dbusers --no-pager
[...]
Oct 14 02:02:16 labstore1004 /usr/local/sbin/maintain-dbusers[13404]: Wrote replica.my.cnf for tool tools.rotbot
Oct 14 02:03:27 labstore1004 /usr/local/sbin/maintain-dbusers[13404]: Created account in labsdb1009.eqiad.wmnet for tool tools.rotbot
Oct 14 02:03:27 labstore1004 /usr/local/sbin/maintain-dbusers[13404]: Created account in labsdb1010.eqiad.wmnet for tool tools.rotbot
Oct 14 02:03:27 labstore1004 /usr/local/sbin/maintain-dbusers[13404]: Created account in 172.16.7.153 for tool tools.rotbot
Oct 14 02:03:27 labstore1004 /usr/local/sbin/maintain-dbusers[13404]: Created account in labsdb1011.eqiad.wmnet for tool tools.rotbot