
[ceph] Investigate if there's a way to degrade instead of failing when jumbo frames are being dropped in the network
Open, High, Public

Description

Investigate whether there is a way for Ceph to degrade service (rather than kill other OSDs) if jumbo frames start being dropped somewhere in the network.

Things to verify:

  • Is the Don't Fragment (DF) flag set on heartbeat traffic? Can that be configured?
    • How to verify: run tcpdump on heartbeat traffic and check the flags on the packets
  • What size are the heartbeat packets? Can that be configured? (Yes, it can.)
    • How to verify: run tcpdump on heartbeat traffic and check the maximum packet size
  • Is the Don't Fragment (DF) flag set on regular traffic? Can that be configured?
    • How to verify: run tcpdump on osd<->osd traffic and check the flags on the packets
  • Can regular Ceph traffic adapt to the discovered MTU of the network (as opposed to always using the maximum MTU of the interface)? If so, can that be configured?
    • How to verify: TBD
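
A minimal sketch of the flag and size check described above, assuming the captured packets are IPv4 and that the raw IP header bytes are available (for example, from a pcap file). The function name and the sample header are hypothetical and hand-built for illustration; they are not real captures from the cluster. In tcpdump itself, the capture filter `ip[6] & 0x40 != 0` matches packets with DF set, and `-v` prints the flags and total length for each packet.

```python
import struct
from typing import Tuple


def ipv4_df_and_length(header: bytes) -> Tuple[bool, int]:
    """Return (DF flag set?, IP total length in bytes) for a raw IPv4 header."""
    if len(header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    version = header[0] >> 4
    if version != 4:
        raise ValueError(f"not an IPv4 packet (version={version})")
    # Bytes 2-3: total length of the IP packet (header + payload).
    (total_length,) = struct.unpack("!H", header[2:4])
    # Bytes 6-7: 3 flag bits followed by the 13-bit fragment offset.
    (flags_and_offset,) = struct.unpack("!H", header[6:8])
    dont_fragment = bool(flags_and_offset & 0x4000)  # DF is the second flag bit
    return dont_fragment, total_length


# Hypothetical 20-byte header: total length 0x0834 (2100 bytes), DF set.
sample = bytes.fromhex(
    "45000834"  # version/IHL, DSCP, total length
    "1c464000"  # identification, flags + fragment offset
    "40060000"  # TTL, protocol (TCP), checksum (left as zero here)
    "0a000001"  # source address 10.0.0.1
    "0a000002"  # destination address 10.0.0.2
)

print(ipv4_df_and_length(sample))  # prints (True, 2100)
```

A packet that reports DF set and a total length above the smallest MTU on the path is one that would be dropped rather than fragmented.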

Other stuff

Current OSD heartbeat config options:

root@cloudcephosd1001:~# ceph config show-with-defaults osd.33
...
osd_heartbeat_grace                  20        default
osd_heartbeat_interval               6         default
osd_heartbeat_min_healthy_ratio      0.330000  default
osd_heartbeat_min_peers              10        default
osd_heartbeat_min_size               2000      default   <- this seems the most interesting
osd_heartbeat_stale                  600       default
osd_heartbeat_use_min_delay_socket   false     default
...
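
To compare these values across OSDs, the output above could be parsed with something like the following sketch. It assumes each option line starts with the option name followed by its value, as in the excerpt; the function name and the embedded sample text are hypothetical, and real `ceph config show-with-defaults` output may carry extra columns that this parser simply ignores.

```python
from typing import Dict

# Sample text copied from the excerpt above (abridged); not live cluster output.
SAMPLE_OUTPUT = """\
...
osd_heartbeat_grace      20       default
osd_heartbeat_interval   6        default
osd_heartbeat_min_size   2000     default
osd_heartbeat_stale      600      default
...
"""


def parse_heartbeat_options(output: str) -> Dict[str, str]:
    """Map each osd_heartbeat_* option name to its value (kept as a string)."""
    options = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip elision markers ("...") and any line that is not an option row.
        if len(fields) >= 2 and fields[0].startswith("osd_heartbeat_"):
            options[fields[0]] = fields[1]
    return options


options = parse_heartbeat_options(SAMPLE_OUTPUT)
print(options["osd_heartbeat_min_size"])  # prints 2000
```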

Related code/bugs/docs:

The mon osd reporter subtree level setting is used to group the peers into “subclusters” by their common ancestor type in the CRUSH map. By default, only two reports from different subtrees are required to report another Ceph OSD Daemon down. You can change the number of reporters from unique subtrees and the common ancestor type required to report a Ceph OSD Daemon down to a Ceph Monitor by adding the mon osd min down reporters and mon osd reporter subtree level settings under the [mon] section of your Ceph configuration file, or by setting the values at runtime.
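
The rule quoted above can be illustrated with a simplified model. This is only a sketch of the counting logic as described in that paragraph, not Ceph's monitor implementation; the function name, the OSD ids, and the host names are all hypothetical.

```python
from typing import Dict


def enough_down_reports(reporter_subtrees: Dict[str, str],
                        min_down_reporters: int = 2) -> bool:
    """Decide whether failure reports come from enough distinct subtrees.

    reporter_subtrees maps each reporting OSD to the name of its ancestor at
    the configured subtree level (for example, its host).
    """
    distinct_subtrees = set(reporter_subtrees.values())
    return len(distinct_subtrees) >= min_down_reporters


# Two reporters on the same host count as a single subtree.
same_host = {"osd.1": "host-a", "osd.2": "host-a"}
# Two reporters on different hosts satisfy the default of two.
two_hosts = {"osd.1": "host-a", "osd.3": "host-b"}

print(enough_down_reports(same_host))  # prints False
print(enough_down_reports(two_hosts))  # prints True
```

In this model, raising the reporter count or choosing a wider subtree level means more independent parts of the cluster must agree before an OSD is reported down.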

Event Timeline

dcaro triaged this task as High priority. Feb 15 2023, 5:44 PM
dcaro created this task.
dcaro updated the task description.