Over the last couple of weeks we've seen some degradation in IO performance on VMs. In particular, IO is now too slow for reliable etcd.
This is the tracking task for investigating the change in performance and optimizing things.
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | taavi | T211393 openstack-browser and horizon: Security group and floating IP quota information being pulled from Nova instead of Neutron for eqiad1-r
Resolved | | Andrew | T211777 Can't get quota information from Neutron API
Resolved | | Andrew | T261137 upgrade cloud-vps openstack to Openstack version 'Victoria'
Resolved | | dcaro | T261136 upgrade cloud-vps openstack to Openstack version 'Ussuri'
Resolved | | Andrew | T261138 Upgrade Horizon to latest OpenStack release
Resolved | | Andrew | T261135 upgrade cloud-vps openstack to Openstack version 'Train'
Resolved | | Andrew | T261134 upgrade cloud-vps openstack to Openstack version 'Stein'
Resolved | | Andrew | T259399 Upgrade cloudvirts to Debian Buster
Resolved | | dcaro | T216195 Move cloudvirt hosts to 10Gb ethernet
Resolved | | Andrew | T194334 [Epic] Modern Cloud VPS storage layer
Resolved | | Andrew | T261132 Move all cloud-vps VMs to Ceph
Resolved | | Andrew | T270305 Ceph performance tuning
Mentioned in SAL (#wikimedia-cloud) [2020-12-17T22:14:45Z] <andrewbogott> setting pg number to 8192 for eqiad1-compute (a 4x increase) and 2048 for eqiad1-glance-images (also a 4x increase) T270305
Mentioned in SAL (#wikimedia-cloud) [2020-12-17T22:16:07Z] <andrewbogott> setting pgp number to 8192 for eqiad1-compute (a 4x increase) and 2048 for eqiad1-glance-images (also a 4x increase) T270305 (same as pg)
Mentioned in SAL (#wikimedia-cloud) [2020-12-18T20:46:34Z] <andrewbogott> setting pg and pgp number to 4096 for eqiad1-compute as joachim thinks 8192 might be too much T270305
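For reference, the pg/pgp changes logged above could be sketched roughly as follows. This is a minimal illustration, not the commands that were actually run: the helper name `pg_resize_commands` is hypothetical, and the starting values (2048 and 512) are only inferred from the "4x increase" notes. The `ceph osd pool set <pool> pg_num|pgp_num <n>` form is the standard Ceph CLI; the script only prints the commands and does not touch a cluster.

```python
from typing import List


def pg_resize_commands(pool: str, current_pg: int, factor: int) -> List[List[str]]:
    """Build the ceph CLI invocations that scale a pool's pg_num and pgp_num.

    Hypothetical helper for illustration; it only builds argument lists.
    """
    target = current_pg * factor
    # Ceph recommends placement group counts that are powers of two.
    if target <= 0 or target & (target - 1):
        raise ValueError(f"{target} is not a power of two")
    return [
        ["ceph", "osd", "pool", "set", pool, "pg_num", str(target)],
        # pgp_num is set to the same value so data is actually rebalanced.
        ["ceph", "osd", "pool", "set", pool, "pgp_num", str(target)],
    ]


if __name__ == "__main__":
    # Starting values are assumptions derived from the logged 4x increases.
    for pool, current in (("eqiad1-compute", 2048), ("eqiad1-glance-images", 512)):
        for cmd in pg_resize_commands(pool, current, 4):
            print(" ".join(cmd))
```

The later drop to 4096 for eqiad1-compute would be the same pair of commands with 4096 as the target.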
Things seem moderately better with 4096 pgs. Latency numbers seemed even better with 8192. After we have weeks of data at 4096, let's switch back to 8192 for another few weeks and get some good data.
The current behavior is pretty good (maybe better than it was with 8192), so we aren't going to mess with success.
A couple ideas that were left over from this, for the next round of improvements: