Migrate etcd ganeti VMs to plain disk template
Closed, Resolved · Public

Assigned To

Authored By

	akosiaris
	May 29 2019, 11:01 AM

Description

etcd is sensitive to I/O latency, and DRBD protocol C (which Ganeti uses to keep disks consistent between the primary and secondary nodes) increases that latency by definition. In addition, a full DRBD resync from primary to secondary is sometimes required, and writes to etcd are throttled while it runs. The Kubernetes cluster could suffer as a result. Since etcd is a replicated datastore with built-in HA, there is no real reason to add the extra layer of complexity and latency that DRBD introduces. Switching all etcd VMs to the plain disk template would solve this issue. The drawback is that Ganeti cluster maintenance operations become somewhat more complicated: operators need to know that the etcd VMs use the plain template and skip them during migrations.
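As a rough illustration of the switch described above, the following Python sketch builds the sequence of `gnt-instance` commands for converting one VM at a time. It is not the procedure that was actually followed for this task. The function name `plan_plain_conversion` and the host names are hypothetical, and the sketch assumes the Ganeti version in use requires an instance to be stopped before `gnt-instance modify -t plain` can change its disk template. It only prints the commands and does not run them.

```python
from typing import List


def plan_plain_conversion(instances: List[str]) -> List[List[str]]:
    """Build the gnt-instance commands to convert each VM to the plain template.

    Instances are handled strictly one after another so that only a single
    etcd member is down at any time. Waiting for the member to rejoin and
    checking etcd cluster health between instances is NOT modeled here.
    """
    plan: List[List[str]] = []
    for name in instances:
        # Assumption: the disk template can only be changed while the VM is stopped.
        plan.append(["gnt-instance", "shutdown", name])
        plan.append(["gnt-instance", "modify", "-t", "plain", name])
        plan.append(["gnt-instance", "startup", name])
    return plan


if __name__ == "__main__":
    # Hypothetical host names, used purely for illustration.
    etcd_vms = ["etcd-a.example.org", "etcd-b.example.org", "etcd-c.example.org"]
    for command in plan_plain_conversion(etcd_vms):
        print(" ".join(command))
```

Converting one member at a time relies on etcd's own replication to keep the cluster available, which is the same property that makes DRBD redundant for these VMs.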

Related Objects

Mentioned In: T287238: ML Serve controller vms show a slowly increasing resource usage leak over time

Event Timeline

akosiaris created this task. May 29 2019, 11:01 AM

Restricted Application added a subscriber: Aklapper. May 29 2019, 11:01 AM

https://wikitech.wikimedia.org/wiki/Ganeti#VMs_without_DRBD_disk_template has been added to document and communicate the drawback.

All etcd VMs have been migrated to the plain disk template. Note that one more VM also uses the plain template: d-i-test, for which reserving the extra disk space that DRBD requires makes no sense.

Resolving this.

elukey mentioned this in T287238: ML Serve controller vms show a slowly increasing resource usage leak over time. Jul 23 2021, 10:34 AM
