Page MenuHomePhabricator

Trove DB full
Closed, ResolvedPublic

Description

oxztsldqokc.svc.trove.eqiad1.wikimedia.cloud (from glamtools tool) seems to be full.

Trying to run the tool that adds data yields error:

ERROR HY000 (1114): The table 'files' is full

Running truncate files yields error:

Got error 168 from storage engine

Please increase the available size of the Trove DB significantly. I moved the tool there from toolsdb because I was asked, and because I was told that there would be enough storage so I won't have to continue creating a million sqlite databases in /data/project . Please help me out here.

Related Objects

StatusSubtypeAssignedTask
Resolved rook
ResolvedAndrew

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
JJMC89 changed the subtype of this task from "Bug Report" to "Task".

If I'm understanding this correctly, you have a trove db in a cloud vps project. I believe you should be able to resize the db in horizon by going to databases > instances > select the dropdown menu on the database > resize volume

Does that work for you?

It does not, unfortunately:

 Error: Unable to resize volume. Details
Quota exceeded for resources: ['volumes']. (HTTP 413)

Sounds good. What's the project name (I don't see a glamtools project)? And what size would you like the db volume? I believe the default is 20 gigabyte

@rook Thanks, the project is called baglama2. At the moment the volume size shows 60GB, which I suggest to increast to 100GB.

The two large tables in the DB are 7.4GB and 16.3GB, respectively. The large table can probably be optimized, but to operate it needs some space. There will be more data coming in every month, so eventually it will need to be resized again, until toolforge offers some auto-scaling storage. This data is really important for GLAM people, so I think we need to keep it around.

+1

Resizing may or may not be graceful but has worked for me recently

C'est fait

root@cloudcontrol1007:~# openstack database quota show baglama2
+-----------+--------+----------+-------+
| Resource  | In Use | Reserved | Limit |
+-----------+--------+----------+-------+
| backups   |      0 |        0 |     2 |
| instances |      1 |        0 |    10 |
| ram       |   7168 |        0 |    -1 |
| volumes   |     60 |        0 |    60 |
+-----------+--------+----------+-------+
root@cloudcontrol1007:~# openstack database quota update baglama2 volumes 100
+---------+-------+
| Field   | Value |
+---------+-------+
| volumes | 100   |
+---------+-------+
root@cloudcontrol1007:~# openstack database quota show baglama2
+-----------+--------+----------+-------+
| Resource  | In Use | Reserved | Limit |
+-----------+--------+----------+-------+
| backups   |      0 |        0 |     2 |
| instances |      1 |        0 |    10 |
| ram       |   7168 |        0 |    -1 |
| volumes   |     60 |        0 |   100 |
+-----------+--------+----------+-------+
rook claimed this task.

Sorry, but I don't think it's done yet.

Horizon still tells me it's 60GB.

ALso, the DB server has stopped responding:

Used command:  /usr/bin/ssh -v -N -S none -o ControlMaster=no -o ExitOnForwardFailure=yes -o ConnectTimeout=10 -o NumberOfPasswordPrompts=3 -o UserKnownHostsFile="/Users/mm6/Library/Containers/com.sequel-ace.sequel-ace/Data/.keys/ssh_known_hosts_strict" -F /Applications/Sequel Ace.app/Contents/Resources/ssh_config -i /Users/mm6/.ssh/id_rsa -o TCPKeepAlive=no -o ServerAliveInterval=60 -o ServerAliveCountMax=1 magnus@tools-login.wmflabs.org -L 59522:oxztsldqokc.svc.trove.eqiad1.wikimedia.cloud:3306

OpenSSH_9.7p1, LibreSSL 3.3.6
debug1: Reading configuration data /Applications/Sequel Ace.app/Contents/Resources/ssh_config
debug1: /Applications/Sequel Ace.app/Contents/Resources/ssh_config line 1: Applying options for *
debug1: Authenticator provider $SSH_SK_PROVIDER did not resolve; disabling
debug1: Connecting to tools-login.wmflabs.org [185.15.56.57] port 22.
debug1: fd 5 clearing O_NONBLOCK
debug1: Connection established.
debug1: identity file /Users/mm6/.ssh/id_rsa type -1
debug1: identity file /Users/mm6/.ssh/id_rsa-cert type -1
debug1: identity file /Users/mm6/.keys/id_rsa type -1
debug1: identity file /Users/mm6/.keys/id_rsa-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_9.7
debug1: Remote protocol version 2.0, remote software version OpenSSH_9.2p1 Debian-2+deb12u3
debug1: compat_banner: match: OpenSSH_9.2p1 Debian-2+deb12u3 pat OpenSSH* compat 0x04000000
debug1: Authenticating to tools-login.wmflabs.org:22 as 'magnus'
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: sntrup761x25519-sha512@openssh.com
debug1: kex: host key algorithm: ssh-ed25519
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: SSH2_MSG_KEX_ECDH_REPLY received
debug1: Server host key: ssh-ed25519 SHA256:ig5pjXbJTaobzoV5gf2ooHQ5zsqGfNyqNhvPoGiuClM
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
debug1: Host 'tools-login.wmflabs.org' is known and matches the ED25519 host key.
debug1: Found key in /Users/mm6/Library/Containers/com.sequel-ace.sequel-ace/Data/.keys/ssh_known_hosts_strict:1
debug1: ssh_packet_send2_wrapped: resetting send seqnr 3
debug1: rekey out after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: ssh_packet_read_poll2: resetting read seqnr 3
debug1: SSH2_MSG_NEWKEYS received
debug1: rekey in after 134217728 blocks
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_ext_info_client_parse: server-sig-algs=<ssh-ed25519,sk-ssh-ed25519@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,sk-ecdsa-sha2-nistp256@openssh.com,webauthn-sk-ecdsa-sha2-nistp256@openssh.com,ssh-dss,ssh-rsa,rsa-sha2-256,rsa-sha2-512>
debug1: kex_ext_info_check_ver: publickey-hostbound@openssh.com=<0>
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,hostbased
debug1: Next authentication method: publickey
debug1: get_agent_identities: ssh_get_authentication_socket: Operation not permitted
debug1: Will attempt key: /Users/mm6/.ssh/id_rsa  explicit
debug1: Will attempt key: /Users/mm6/.keys/id_rsa  explicit
debug1: Trying private key: /Users/mm6/.ssh/id_rsa
debug1: read_passphrase: can't open /dev/tty: Device not configured
Authenticated to tools-login.wmflabs.org ([185.15.56.57]:22) using "publickey".
debug1: Local connections to LOCALHOST:59522 forwarded to remote address oxztsldqokc.svc.trove.eqiad1.wikimedia.cloud:3306
debug1: Local forwarding listening on ::1 port 59522.
debug1: channel 0: new port-listener [port listener] (inactive timeout: 0)
debug1: Local forwarding listening on 127.0.0.1 port 59522.
debug1: channel 1: new port-listener [port listener] (inactive timeout: 0)
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: pledge: network
debug1: pledge: network
debug1: Connection to port 59522 forwarding to oxztsldqokc.svc.trove.eqiad1.wikimedia.cloud port 3306 requested.
debug1: channel 2: new direct-tcpip [direct-tcpip] (inactive timeout: 0)
debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0
debug1: Remote: /usr/sbin/ssh-key-ldap-lookup:2: key options: agent-forwarding port-forwarding pty user-rc x11-forwarding
channel 2: open failed: connect failed: Connection refused
debug1: channel 2: free: direct-tcpip: listening port 59522 for oxztsldqokc.svc.trove.eqiad1.wikimedia.cloud port 3306, connect from 127.0.0.1 port 59526 to 127.0.0.1 port 59522, nchannels 3
debug1: Connection to port 59522 forwarding to oxztsldqokc.svc.trove.eqiad1.wikimedia.cloud port 3306 requested.
debug1: channel 2: new direct-tcpip [direct-tcpip] (inactive timeout: 0)
channel 2: open failed: connect failed: Connection refused
debug1: channel 2: free: direct-tcpip: listening port 59522 for oxztsldqokc.svc.trove.eqiad1.wikimedia.cloud port 3306, connect from 127.0.0.1 port 59527 to 127.0.0.1 port 59522, nchannels 3
Unable to connect to host oxztsldqokc.svc.trove.eqiad1.wikimedia.cloud because the port connection via SSH was refused.

Please ensure that your MySQL host is set up to allow TCP/IP connections (no --skip-networking) and is configured to allow connections from the host you are tunnelling via.

You may also want to check the port is correct and that you have the necessary privileges.

Checking the error detail will show the SSH debug log which may provide more details.

MySQL said: Lost connection to MySQL server at 'reading initial communication packet', system error: 0
``

I see the volume as listed as 60GB in horizon. To verify did you try to resize the db in horizon by going to databases > instances > select the dropdown menu on the database > resize volume?

@Magnus sorry for the hangup, looks like Trove automation isn't great at this yet. I did a bit of hands-on repair and I think you're good to go now (at least, all the status flags say that thins are up and working.) Do things look good from your end?

@rook for your reference, Trove resized the volume but not the filesystem so I had to ssh to the instance, ummount and resize2fs, and then reboot everything. Definitely a trove bug but in the meantime I need to document this someplace :(