Patch demo Catalyst backend failed: "failed: to create fsnotify watcher: too many open files"
Closed, Resolved · Public

Description

I just tried to create a demo using the beta Catalyst backend, and I got this:

2025-01-08T22:13:59.068762002Z: Cloning into '/var/www/html/w'...
failed: to create fsnotify watcher: too many open files
Log stream has terminated, but deployment is not complete. Status is: starting

The demo wiki is listed at https://patchdemo.wmcloud.org but is stuck on "Creating…". Creating a wiki with the same config on the legacy backend worked.

Event Timeline

thcipriani claimed this task.
thcipriani added subscribers: EBomani, thcipriani.

@EBomani suspected that this had to do with kernel inotify limits.

The initial limits on the k3s host when I logged in were:

root@k3s:~# sysctl fs.inotify.max_user_watches
fs.inotify.max_user_watches = 124690
root@k3s:~# sysctl fs.inotify.max_user_instances
fs.inotify.max_user_instances = 128
root@k3s:~# sysctl fs.inotify.max_queued_events
fs.inotify.max_queued_events = 16384
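
To see how close a host actually is to max_user_instances, one option is to count the inotify instances currently open. This is a minimal sketch, assuming GNU find and stat on the host: every file descriptor under /proc/<pid>/fd whose link target is anon_inode:inotify is one inotify instance. The proc_root argument is a hypothetical override used only for testing (it defaults to /proc), and grouping by the owner of /proc/<pid> only approximates the kernel's per-user accounting.

```shell
#!/usr/bin/env bash
# Count open inotify instances, grouped by the user owning each process.
# Each fd whose symlink target is "anon_inode:inotify" is one inotify instance,
# which is what fs.inotify.max_user_instances limits per user.
# proc_root is a hypothetical override for testing; it defaults to /proc.
count_inotify_instances() {
  local proc_root="${1:-/proc}"
  # -lname matches the symlink's target text without following it;
  # errors from processes that exit mid-scan are discarded.
  find "$proc_root"/[0-9]*/fd -maxdepth 1 -lname 'anon_inode:inotify' 2>/dev/null |
    while read -r fd; do
      # Strip "/fd/<n>" to get the /proc/<pid> directory, then print its owner.
      stat -c '%U' "${fd%/fd/*}"
    done | sort | uniq -c | sort -rn
}
```

Run it as root so every process's fd directory is readable. Since the limit is per user, containers running as the same UID on one host (without user namespaces) plausibly share a single budget of instances.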

The WatchExec docs note that:

max_user_instances may also be too low on some systems. For example, on servers it is frequently set as low as 127. A good workstation default is 1024.
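
A quick way to check a host against that recommendation is to read the value straight from procfs, which is what `sysctl fs.inotify.max_user_instances` reports. This is a sketch only: the 1024 threshold comes from the quote above, and check_inotify_instances_limit and its optional path argument are hypothetical (the path exists only as a test hook).

```shell
#!/usr/bin/env bash
# Compare fs.inotify.max_user_instances against a minimum (default 1024,
# the workstation default suggested in the WatchExec docs).
# Returns 0 if the limit is at least the minimum, 1 if it is lower,
# 2 if the value could not be read. The path argument is a hypothetical test hook.
check_inotify_instances_limit() {
  local min="${1:-1024}"
  local path="${2:-/proc/sys/fs/inotify/max_user_instances}"
  local current
  current="$(cat "$path")" || return 2
  if [ "$current" -lt "$min" ]; then
    echo "max_user_instances=$current is below $min"
    return 1
  fi
  echo "max_user_instances=$current is OK (>= $min)"
}
```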

root@k3s:~# sysctl -w fs.inotify.max_user_instances=1024
fs.inotify.max_user_instances = 1024
root@k3s:~# sysctl fs.inotify.max_user_instances
fs.inotify.max_user_instances = 1024

I then persisted the change by editing /etc/sysctl.conf directly, since /etc/sysctl.d/README says Puppet will delete unmanaged files in that directory :)
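
Editing /etc/sysctl.conf by hand works; if the same fix has to be repeated on other hosts, it can be made idempotent so re-running it never appends duplicate lines. A minimal sketch, assuming GNU sed and grep: set_sysctl_conf_key is a hypothetical helper, not part of any existing tooling on the host. It rewrites an existing uncommented entry for the key in place, or appends one.

```shell
#!/usr/bin/env bash
# Idempotently set "key = value" in a sysctl config file such as /etc/sysctl.conf.
# Rewrites any uncommented line for the key; otherwise appends a new line.
# Hypothetical helper; requires GNU sed (-i, -E) and grep -E.
set_sysctl_conf_key() {
  local file="$1" key="$2" value="$3"
  local key_re
  # Escape dots so "fs.inotify.max_user_instances" is matched literally.
  key_re="$(printf '%s' "$key" | sed 's/\./\\./g')"
  if grep -Eq "^[[:space:]]*${key_re}[[:space:]]*=" "$file"; then
    sed -i -E "s|^[[:space:]]*${key_re}[[:space:]]*=.*|${key} = ${value}|" "$file"
  else
    # Avoid gluing the new entry onto a final line that lacks a trailing newline.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
      printf '\n' >> "$file"
    fi
    printf '%s = %s\n' "$key" "$value" >> "$file"
  fi
}
```

For example, as root: `set_sysctl_conf_key /etc/sysctl.conf fs.inotify.max_user_instances 1024 && sysctl -p` (with no file argument, `sysctl -p` reloads /etc/sysctl.conf).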

We've tested a few instances since then and it seems like it's working again.

One strange thing: I can't find this error message in any logs. Places I checked: dmesg on the host, journalctl for k3s on the host, the patchdemo apache container while the error was happening, and the catalyst-api container. Dunno; something to dig into further.

Optimistically, resolving this one.