Page MenuHomePhabricator

Denied access error when querying wikishared dbs
Closed, ResolvedPublic

Description

When trying to query any of the wikishared dbs on the x1 cluster, the following error message appears:

Access denied for user 'research'@'10.%' to database 'wikishared'

I've confirmed this error message appears when trying to access from the following methods:

  • Jupyter notebook using the wmfdata package
  • Sequel Pro
  • analytics-mysql

I'm not receiving this error when I try to access any of the other MariaDB databases.

Event Timeline

Note: This appears to be impacting all of Product-Analytics at least and is currently blocking the following analyses that require access to the cx_translations and cx_corpora tables: T303812 and T299769.

Please let me know if there is any additional information that would be helpful.

(cc @razzi )

BTullis triaged this task as Unbreak Now! priority.Apr 27 2022, 2:25 PM
BTullis moved this task from Incoming to Ops on the Data-Engineering board.
BTullis added a subscriber: BTullis.

Looking into this for you now with the highest priority.

It looks to me like the databases have gone from this host.

btullis@stat1008:~$ mysql -h x1-analytics-replica.eqiad.wmnet -P 3320
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 1068039
Server version: 10.4.22-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql:research@x1-analytics-replica.eqiad.wmnet [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
+--------------------+
1 row in set (0.008 sec)

mysql:research@x1-analytics-replica.eqiad.wmnet [(none)]>

Perhaps this shard was inadvertently wiped when the host was re-imaged. I'll look into it now.

It looks like this is a missing grant issue. I haven't worked out quite how/why it was deleted, but I'm working on adding it back now.

Marostegui added a subscriber: Marostegui.

Hello, I have triaged this and it is now available. I am unsure what is going on and I will need more time (and probably) downtime to do a proper research. When could I do that? For now the user is working again with a new role (research_role2), but I would like to investigate why the original one isn't working for x1 all of a sudden (it does for s6 and s8)

research@dbstore1005.eqiad.wmnet[(none)]> show databases;
+---------------------------+
| Database                  |
+---------------------------+
| aawiki                    |
| aawikibooks               |
| aawiktionary              |
| abwiki                    |
| abwiktionary              |
| acewiki                   |
<snip>
Marostegui lowered the priority of this task from Unbreak Now! to Medium.Apr 27 2022, 4:06 PM
Marostegui moved this task from Triage to In progress on the DBA board.

Decreasing from UBN to normal as the user works again. I will wait for a date where the reports are done and I can play with x1 without affecting any reporting.

Interestingly, I have been able to get research user back on its original research_role by dropping and recreating that role, so we should be good again. I will wait for confirmation that everything works as expected before going ahead and cleaning up the temporary role

research@dbstore1005.eqiad.wmnet[(none)]> SELECT CURRENT_ROLE;
+---------------+
| CURRENT_ROLE  |
+---------------+
| research_role |
+---------------+
1 row in set (0.001 sec)
research@dbstore1005.eqiad.wmnet[(none)]> show databases;
+---------------------------+
| Database                  |
+---------------------------+
| aawiki                    |
| aawikibooks               |
<snip>
mysql:root@localhost [(none)]> show grants for 'research'@'10.%';
+------------------------------------------------------------------------------------------------------------------------------------------+
| Grants for research@10.%                                                                                                                 |
+------------------------------------------------------------------------------------------------------------------------------------------+
| GRANT `research_role` TO `research`@`10.%`                                                                                               |
| GRANT USAGE ON *.* TO `research`@`10.%` IDENTIFIED BY PASSWORD '*xxx' WITH MAX_USER_CONNECTIONS 200 |
| SET DEFAULT ROLE `research_role` FOR `research`@`10.%`                                                                                   |
+------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.000 sec)

@MNeisler please confirm if you can access everything on x1 again. Thanks

Hello, I have triaged this and it is now available. I am unsure what is going on and I will need more time (and probably) downtime to do a proper research. When could I do that?

I think it's probably best to liaise with Product Analytics directly to coordinate downtime requirements. I'll happily help if you need it.

If everything works fine again there's no need to get downtime, I can drop the temporary role without any.
Let's wait for @MNeisler

EChetty raised the priority of this task from Medium to Unbreak Now!.Apr 27 2022, 4:49 PM

@Marostegui & @BTullis , @Iflorez (and I assume others) are still having access issues. Are there multiple places where we manage access policies that need to be updated?

No, they might need to reset the connection though

I am successfully using the research user as it can be seen here:

research@dbstore1005.eqiad.wmnet[wikishared]> SELECT CURRENT_USER();
+----------------+
| CURRENT_USER() |
+----------------+
| research@10.%  |
+----------------+
1 row in set (0.001 sec)

research@dbstore1005.eqiad.wmnet[wikishared]> SELECT CURRENT_ROLE;
+---------------+
| CURRENT_ROLE  |
+---------------+
| research_role |
+---------------+
1 row in set (0.001 sec)

research@dbstore1005.eqiad.wmnet[wikishared]> show tables;
+------------------------------------+
| Tables_in_wikishared               |
+------------------------------------+
| bounce_records                     |
| cx_corpora                         |
| cx_lists                           |
| cx_notification_log                |
| cx_section_translations            |
| cx_significant_edits               |
| cx_suggestions                     |
| cx_translations                    |
| cx_translators                     |
| echo_push_provider                 |
| echo_push_subscription             |
| echo_push_topic                    |
| echo_unread_wikis                  |
| reading_list                       |
| reading_list_entry                 |
| reading_list_project               |
| urlshortcodes                      |
| wikimedia_editor_tasks_counts      |
| wikimedia_editor_tasks_edit_streak |
| wikimedia_editor_tasks_keys        |
+------------------------------------+
20 rows in set (0.001 sec)

Hello,
I'm unable to access x1, s7, and ToolForge.

The error message I'm getting:

Unable to connect to host x1-analytics-replica.eqiad.wmnet because access was denied.
Double-check your username and password and ensure that access from your current location is permitted.
MySQL said: Access denied for user 'research'@'10.64.21.118' (using password: NO)

My x1/s7 ssh password hasn't changed.
My ToolForge password hasn't changed. I checked it last Friday.

x1 and s7 are not providing error details.
This is the ToolForge error detail:

Used command: /usr/bin/ssh -v -N -S none -o ControlMaster=no -o ExitOnForwardFailure=yes -o ConnectTimeout=10 -o NumberOfPasswordPrompts=3 -o TCPKeepAlive=no -o ServerAliveInterval=60 -o ServerAliveCountMax=1 -p 22 iflorez@login.toolforge.org -L 62517:tools.db.svc.wikimedia.cloud:3306

OpenSSH_8.6p1, LibreSSL 2.8.3
debug1: Reading configuration data /Users/iflorez/.ssh/config
debug1: /Users/iflorez/.ssh/config line 4: Applying options for *
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 21: include /etc/ssh/ssh_config.d/* matched no files
debug1: /etc/ssh/ssh_config line 54: Applying options for *
debug1: Authenticator provider $SSH_SK_PROVIDER did not resolve; disabling
debug1: Control socket " none" does not exist
debug1: Connecting to login.toolforge.org [185.15.56.48] port 22.
debug1: fd 5 clearing O_NONBLOCK
debug1: Connection established.
debug1: identity file /Users/iflorez/.ssh/id_rsa type -1
debug1: identity file /Users/iflorez/.ssh/id_rsa-cert type -1
debug1: identity file /Users/iflorez/.ssh/id_dsa type -1
debug1: identity file /Users/iflorez/.ssh/id_dsa-cert type -1
debug1: identity file /Users/iflorez/.ssh/id_ecdsa type -1
debug1: identity file /Users/iflorez/.ssh/id_ecdsa-cert type -1
debug1: identity file /Users/iflorez/.ssh/id_ecdsa_sk type -1
debug1: identity file /Users/iflorez/.ssh/id_ecdsa_sk-cert type -1
debug1: identity file /Users/iflorez/.ssh/id_ed25519 type 3
debug1: identity file /Users/iflorez/.ssh/id_ed25519-cert type -1
debug1: identity file /Users/iflorez/.ssh/id_ed25519_sk type -1
debug1: identity file /Users/iflorez/.ssh/id_ed25519_sk-cert type -1
debug1: identity file /Users/iflorez/.ssh/id_xmss type -1
debug1: identity file /Users/iflorez/.ssh/id_xmss-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_8.6
debug1: Remote protocol version 2.0, remote software version OpenSSH_7.4p1 Debian-10+deb9u7
debug1: compat_banner: match: OpenSSH_7.4p1 Debian-10+deb9u7 pat OpenSSH_7.0*,OpenSSH_7.1*,OpenSSH_7.2*,OpenSSH_7.3*,OpenSSH_7.4*,OpenSSH_7.5*,OpenSSH_7.6*,OpenSSH_7.7* compat 0x04000002
debug1: Authenticating to login.toolforge.org:22 as 'iflorez'
debug1: load_hostkeys: fopen /Users/iflorez/.ssh/known_hosts2: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: ssh-ed25519
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: SSH2_MSG_KEX_ECDH_REPLY received
debug1: Server host key: ssh-ed25519 SHA256:0SkfUoy4EkOiHFbM8mRuC1n5bVfphuSAjdY5u/OyiVY
debug1: load_hostkeys: fopen /Users/iflorez/.ssh/known_hosts2: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
debug1: Host 'login.toolforge.org' is known and matches the ED25519 host key.
debug1: Found key in /Users/iflorez/.ssh/known_hosts:18
debug1: rekey out after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: rekey in after 134217728 blocks
debug1: Will attempt key: /Users/iflorez/.ssh/id_ed25519 ED25519 SHA256:6riWb+xMurUgcOL5ATedroowxGY3/kT3SWaWvQzd32c agent
debug1: Will attempt key: /Users/iflorez/.ssh/id_rsa
debug1: Will attempt key: /Users/iflorez/.ssh/id_dsa
debug1: Will attempt key: /Users/iflorez/.ssh/id_ecdsa
debug1: Will attempt key: /Users/iflorez/.ssh/id_ecdsa_sk
debug1: Will attempt key: /Users/iflorez/.ssh/id_ed25519_sk
debug1: Will attempt key: /Users/iflorez/.ssh/id_xmss
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521>
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,hostbased
debug1: Next authentication method: publickey
debug1: Offering public key: /Users/iflorez/.ssh/id_ed25519 ED25519 SHA256:6riWb+xMurUgcOL5ATedroowxGY3/kT3SWaWvQzd32c agent
debug1: Authentications that can continue: publickey,hostbased
debug1: Trying private key: /Users/iflorez/.ssh/id_rsa
debug1: Trying private key: /Users/iflorez/.ssh/id_dsa
debug1: Trying private key: /Users/iflorez/.ssh/id_ecdsa
debug1: Trying private key: /Users/iflorez/.ssh/id_ecdsa_sk
debug1: Trying private key: /Users/iflorez/.ssh/id_ed25519_sk
debug1: Trying private key: /Users/iflorez/.ssh/id_xmss
debug1: No more authentication methods to try.
iflorez@login.toolforge.org: Permission denied (publickey,hostbased).

I did a FLUSH PRIVILEGES; right now. If it's still not working, can you give the exact command you run?

I don't think tools.db.svc.wikimedia.cloud is dbstore1005.

@Iflorez that's an SSH error not a MySQL error.

It's clouddb1001.clouddb-services.eqiad1.wikimedia.cloud

@Iflorez that's an SSH error not a MySQL error.

Yeah. I think @Iflorez has two different issues. If the dbstore1005 is fixed, we can close this but the toolforge connection issue remains.

Confirming I can now access dbs on x1 via analytics-mysql and on stat8 Jupyter notebook with the wmfdata package. Thanks all!

@Iflorez I am seeing this in the logs:

Apr 27 17:35:09 tools-sgebastion-07 sshd[23227]: Failed publickey for iflorez from x.x.x.x port 62518 ssh2: ED25519 SHA256:6riWb+xMurUgcOL5ATedroowxGY3/kT3SWaWvQzd32c

Please double check your .ssh/config and that you have configured your toolforge ssh key in https://toolsadmin.wikimedia.org/profile/settings/ssh-keys/.

It's clouddb1001.clouddb-services.eqiad1.wikimedia.cloud

tools.db.svc.wikimedia.cloud is the current canonical ToolsDB service name, please use that instead of names for individual hosts like clouddb1001.

In T306984#7885509, @Majavah wrote:

It's clouddb1001.clouddb-services.eqiad1.wikimedia.cloud

tools.db.svc.wikimedia.cloud is the current canonical ToolsDB service name, please use that instead of names for individual hosts like clouddb1001.

I know, I was stating that this is not dbstore1005 so not the grant issue on dbstore1005 (this ticket originally)

@Iflorez you might want to open a different task for your issue, as the MySQL one seems solved (pending me to clean up the other temp role)

I'm now able to access x1 on sequel pro, the password for sequel pro to access the stat machines had to be reset.

I'm now able to access ToolForge via sequel pro, after swapping out the ssh key that I had noted for ToolForge. Thank you!

EChetty lowered the priority of this task from Unbreak Now! to Medium.Apr 27 2022, 9:21 PM

Thank you all!

Marostegui claimed this task.

I have cleaned up research_role2 and double checked that everything keeps working as expected and I can query wikishared.