"Error: disk I/O error" from sqlite3 access attempt made from tools-sgebastion-10
Closed, Resolved | Public | BUG REPORT

Description

Error behavior as originally reported by @matmarex in the #wikimedia-cloud IRC channel:

$ ssh login.toolforge.org
$ become dtcheck
$ sqlite3 /data/project/dtcheck/public_html/database.sqlite "select * from meta"
Error: disk I/O error

The same query returns the expected result from the dev.toolforge.org bastion, which points to a problem with NFS on login.toolforge.org (tools-sgebastion-10.tools.eqiad1.wikimedia.cloud):

$ ssh dev.toolforge.org
$ become dtcheck
$ sqlite3 /data/project/dtcheck/public_html/database.sqlite "select * from meta"
1|2023-05-11 13:09:11.922282|541.283580531

Event Timeline

[14:47]  < MatmaRex> strace gave me these details: fcntl(3, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=1073741824, l_len=1}) = -1 EIO (Input/output error) but unfortunately this tells me nothing, i searched for various parts of this error but did not learn much

On tools-sgebastion-11.tools.eqiad1.wikimedia.cloud (working):

$ mount | grep /srv/tools/project
tools-nfs.svc.tools.eqiad1.wikimedia.cloud:/srv/tools/project on /mnt/nfs/labstore-secondary-tools-project type nfs4 (rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.16.6.70,local_lock=none,addr=172.16.7.14)

On tools-sgebastion-10.tools.eqiad1.wikimedia.cloud (broken):

$ mount | grep /srv/tools/project
tools-nfs.svc.tools.eqiad1.wikimedia.cloud:/srv/tools/project on /mnt/nfs/labstore-secondary-tools-project type nfs4 (rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.16.6.95,local_lock=none,addr=172.16.7.14)

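No meaningful difference in the NFS mount configuration as shown by that command: the two option lists are identical apart from clientaddr (172.16.6.70 on the working host, 172.16.6.95 on the broken one).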
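
The comparison of the two option lists can also be done mechanically instead of by eye. The following is a minimal sketch, assuming the option strings are copied by hand from the parenthesized part of the `mount` output on each host; `mount_opts_diff` is a hypothetical helper name, not an existing tool, and it assumes no option appears twice within one string.

```shell
#!/bin/sh
# Hypothetical helper: print the options that appear in only one of two
# comma-separated mount option strings (the part `mount` shows in parentheses).
mount_opts_diff() {
    a=$(printf '%s\n' "$1" | tr ',' '\n' | sort)
    b=$(printf '%s\n' "$2" | tr ',' '\n' | sort)
    # Options present on both sides appear twice and are dropped by `uniq -u`.
    printf '%s\n%s\n' "$a" "$b" | sort | uniq -u
}

# Option strings copied from the two mount outputs above.
working='rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.16.6.70,local_lock=none,addr=172.16.7.14'
broken='rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.16.6.95,local_lock=none,addr=172.16.7.14'

mount_opts_diff "$working" "$broken"
```

For these two mounts the only lines printed are the two clientaddr entries, which is expected because each client reports its own address.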

root@tools-sgebastion-10:/data/project/dtcheck/public_html# touch flock-test.txt
root@tools-sgebastion-10:/data/project/dtcheck/public_html# flock flock-test.txt sleep 2
flock: flock-test.txt: Input/output error
root@tools-sgebastion-11:/data/project/dtcheck/public_html# flock flock-test.txt sleep 2
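
The manual flock test above can be wrapped in a small probe so it is easy to repeat on several hosts or directories. This is only a sketch: `probe_lock` is a hypothetical helper name, it assumes `flock` from util-linux and `mktemp` are installed, and it uses a scratch file so that no existing file is touched. As far as I know, the Linux NFS client implements flock with byte-range locks when `local_lock=none` is set, so this should exercise the same locking path as the failing `fcntl(F_SETLK)` call seen in strace, but treat that as an assumption.

```shell
#!/bin/sh
# Hypothetical helper: report whether an advisory lock can be taken on a
# scratch file inside the given directory.
probe_lock() {
    dir=$1
    # Create a scratch file in the directory under test; bail out if we cannot.
    f=$(mktemp "$dir/lock-probe.XXXXXX") || return 2
    # -n: fail immediately instead of waiting if the lock cannot be acquired.
    if flock -n "$f" true 2>/dev/null; then
        echo "lock ok: $dir"
        rc=0
    else
        echo "lock FAILED: $dir"
        rc=1
    fi
    rm -f "$f"
    return $rc
}

# Example (run on the host being checked):
#   probe_lock /data/project/dtcheck/public_html
```

Running the same probe on a local directory such as /tmp on the affected host helps confirm that the failure is specific to the NFS mount.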

I think tools-sgebastion-10 needs a reboot.

[15:26]  <    dcaro> I tried flushing caches and such (sync + echo 3 > /proc/sys/vm/drop_caches), but that did not help xd
[15:26]  <    dcaro> +1 for reboot
[15:26]  <    dcaro> lots of [Thu May 11 14:36:28 2023] NFS: v4 server returned a bad sequence-id error on an unconfirmed sequence 000000004b845782! in dmesg
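
To put a number on "lots of", the kernel log can be filtered for that message. A minimal sketch, assuming the kernel log text is piped in on stdin (for example from `dmesg -T`); `count_seqid_errors` is a hypothetical helper name and the pattern is copied from the message quoted above.

```shell
#!/bin/sh
# Hypothetical helper: count kernel log lines on stdin that report the
# NFSv4 bad sequence-id error.
count_seqid_errors() {
    # grep -c still prints 0 when nothing matches but exits non-zero,
    # so the exit status is deliberately ignored.
    grep -c 'NFS: v4 server returned a bad sequence-id error' || true
}

# Example (run as root on the host being checked):
#   dmesg -T | count_seqid_errors
```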

Mentioned in SAL (#wikimedia-cloud) [2023-05-11T15:48:29Z] <bd808> Rebooted tools-sgebastion-10 for T336510

bd808 claimed this task.

Result from login.toolforge.org after the reboot:

$ ssh login.toolforge.org
$ sudo become dtcheck
$ sqlite3 /data/project/dtcheck/public_html/database.sqlite "select * from meta"
1|2023-05-11 13:09:11.922282|541.283580531

Fixed by reboot.