ceph-mon 16.2.15+ds-0+deb12u1 uses all the RAM
Open, Low, Public

Description

I just tried upgrading cloudcephmon2004-dev to Bookworm + ceph 16.2. It's configured so that it should pool with the other mons, which are running 15.x.

It seems happy, but it also consumes all the RAM on the system (64G), and the system locks up a few minutes after boot.

2025-07-02T01:47:57.112+0000 7f1ca049a640  0 starting mon.cloudcephmon2004-dev rank 1 at public addrs [v2:10.192.20.19:3300/0,v1:10.192.20.19:6789/0] at bind addrs [v2:10.192.20.19:3300/0,v1:10.192.20.19:6789/0] mon_data /var/lib/ceph/mon/ceph-cloudcephmon2004-dev fsid 489c4187-17bc-44dc-9aeb-1d044c9bba9e
2025-07-02T01:47:57.116+0000 7f1ca049a640  1 mon.cloudcephmon2004-dev@-1(???) e0 preinit fsid 489c4187-17bc-44dc-9aeb-1d044c9bba9e
2025-07-02T01:47:57.116+0000 7f1ca049a640  1 mon.cloudcephmon2004-dev@-1(???) e0  initial_members cloudcephmon2004-dev,cloudcephmon2005-dev,cloudcephmon2006-dev, filtering seed monmap
2025-07-02T01:47:57.120+0000 7f1ca049a640  0 mon.cloudcephmon2004-dev@-1(probing) e0  my rank is now 0 (was -1)

Event Timeline

Upgrading an existing mon node to 16.2.15-1~bpo11+1 on Bullseye does not produce this problem.

I upgraded everything in place to 16.2.15 on Bullseye, then upgraded one node to Bookworm, and it does not exhibit this issue. So whatever was happening here, we can avoid it by decoupling the ceph version upgrade from the OS rebuild.
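
A minimal sketch of a pre-check for that ordering (not something taken from this task): the idea is to confirm that every mon already reports the target ceph version before a host is rebuilt on the new OS. It assumes the JSON printed by `ceph versions` has a `mon` key mapping version banners to daemon counts; the function names `mon_version_counts` and `safe_to_rebuild` and the sample banners are hypothetical.

```python
import json


def mon_version_counts(versions_json):
    """Map each version number reported by the mons to its daemon count.

    Assumes banners look like 'ceph version 16.2.15 (<hash>) pacific (stable)',
    so the third whitespace-separated token is the version number.
    """
    report = json.loads(versions_json)
    counts = {}
    for banner, daemons in report.get("mon", {}).items():
        parts = banner.split()
        number = parts[2] if len(parts) >= 3 else banner
        counts[number] = counts.get(number, 0) + daemons
    return counts


def safe_to_rebuild(versions_json, target):
    """True only if at least one mon is reported and all mons run `target`."""
    counts = mon_version_counts(versions_json)
    return bool(counts) and set(counts) == {target}


# Illustrative input only: the hashes and the 15.2.17 version are made up
# for the example and are not output from the cluster in this task.
mixed = json.dumps({
    "mon": {
        "ceph version 15.2.17 (hash-a) octopus (stable)": 2,
        "ceph version 16.2.15 (hash-b) pacific (stable)": 1,
    }
})
uniform = json.dumps({
    "mon": {"ceph version 16.2.15 (hash-b) pacific (stable)": 3}
})

print(safe_to_rebuild(mixed, "16.2.15"))
print(safe_to_rebuild(uniform, "16.2.15"))
```

In practice the input would come from running `ceph versions` on a node with admin access. The check deliberately fails on an empty `mon` section, so a malformed or partial report does not read as a green light.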

Andrew triaged this task as Low priority. Jul 2 2025, 8:51 PM