There are currently some small balancing issues, and if we don't actively move the PGs around they will soon become a problem. Enabling upmap will help deal with that.
Event Timeline
It seems that we could achieve quite nice balancing. Some preliminary tests (just checking offline how the balancer would act):
dcaro@cloudcephmon1001:~$ sudo ceph osd getmap -o osd_map
got osdmap epoch 366517
dcaro@cloudcephmon1001:~$ sudo osdmaptool osd_map --upmap out.txt --upmap-pool eqiad1-compute --upmap-active | phaste
osdmaptool: osdmap file 'osd_map'
https://phabricator.wikimedia.org/P14642
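To get a rough idea of how many PG movements the offline run proposes, the generated out.txt can be summarized. This is a minimal sketch, assuming the file holds one `ceph osd pg-upmap-items ...` or `ceph osd rm-pg-upmap-items ...` command per line; the function name `summarize_upmap_plan` is hypothetical and not part of any Ceph tooling.

```shell
# Hypothetical helper: count the upmap commands in an osdmaptool --upmap output file.
# Assumes one "ceph osd pg-upmap-items ..." or "ceph osd rm-pg-upmap-items ..."
# command per line; any other lines are ignored.
summarize_upmap_plan() {
    plan_file="$1"
    # grep -c exits non-zero when nothing matches but still prints 0,
    # so "|| true" keeps the function usable under "set -e".
    added=$(grep -c '^ceph osd pg-upmap-items ' "$plan_file" || true)
    removed=$(grep -c '^ceph osd rm-pg-upmap-items ' "$plan_file" || true)
    echo "new upmap entries: $added, removed upmap entries: $removed"
}

# Example usage:
#   summarize_upmap_plan out.txt
```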
Mentioned in SAL (#wikimedia-cloud) [2021-04-13T10:43:59Z] <dcaro> enabled ceph upmap balancer on codfw (T274573)
Mentioned in SAL (#wikimedia-cloud) [2021-04-13T14:43:07Z] <dcaro> enabling ceph upmap pg balancer on eqiad (T274573)
Mentioned in SAL (#wikimedia-cloud) [2021-04-13T14:49:37Z] <dcaro> Running the first_pass balancing plan on ceph eqiad, current eval 0.030622 (T274573)
Mentioned in SAL (#wikimedia-cloud) [2021-04-13T15:02:16Z] <dcaro> First pass finished, improved eval to 0.030075 (T274573)
Mentioned in SAL (#wikimedia-cloud) [2021-04-13T15:03:48Z] <dcaro> Executing a second pass, there are still movements that can improve the eval of 0.030075 (T274573)
Mentioned in SAL (#wikimedia-cloud) [2021-04-13T15:08:43Z] <dcaro> Activating continuous upmap balancer, keeping a close eye (T274573)
Mentioned in SAL (#wikimedia-cloud) [2021-04-13T16:42:54Z] <dcaro> Ceph balancer got the cluster to eval 0.014916, that is 88-77% usage for compute pool, and 28-19% usage for the cinder one \o/ (T274573)
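The manual passes logged above can be sketched with the standard `ceph balancer` module subcommands (`eval`, `optimize`, `execute`). This is an illustration under assumptions, not a record of the exact commands that were run: the function name `run_balancer_pass` and the `CEPH` variable are hypothetical, and the plan name `first_pass` is taken from the log entry.

```shell
# Hypothetical wrapper around one manual balancing pass.
# CEPH is an assumed variable so the command prefix can be overridden (e.g. for a dry run).
CEPH="${CEPH:-sudo ceph}"

run_balancer_pass() {
    plan="$1"
    $CEPH balancer eval               # score of the current distribution (lower is better)
    $CEPH balancer optimize "$plan"   # compute a plan under the given name
    $CEPH balancer eval "$plan"       # score the cluster would have after the plan
    $CEPH balancer execute "$plan"    # apply the plan
}

# Example usage (assumes upmap mode is selected first; "ceph balancer on"
# then hands over to continuous balancing):
#   sudo ceph balancer mode upmap
#   run_balancer_pass first_pass
#   sudo ceph balancer on
```

Setting `CEPH=echo` before calling the function prints the commands instead of running them, which is a cheap way to review a pass before touching the cluster.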