ToolsDB: simplify volume chain
Closed, Resolved (Public)

Description

When the Cinder volumes for tools-db-1 and tools-db-2 were created in T329970 and T329521, they were created using Cinder snapshots (which in turn use Ceph RBD snapshots). This means we now have a volume chain of copy-on-write layers that cannot be easily flattened:

tools-db-basevolume1 -> tools-db-basesnapshot1 -> tools-db-basevolume2 -> tools-db-basesnapshot2 -> ( tools-db-1 | tools-db-2 )

The volumes tools-db-1 and tools-db-2 are the ones currently in use, but they are both layers on top of tools-db-basesnapshot2 and its parent layers. None of the parent layers can be deleted, because they are all part of the same copy-on-write volume chain.
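
To make the dependency concrete, here is a minimal Python sketch that models the chain above as a child-to-parent map. The helper names (`ancestors`, `children`, `can_delete`) are made up for illustration and are not part of Cinder or Ceph; the "no children" rule is a simplification of how copy-on-write layers behave.

```python
# Child -> parent links, taken from the chain shown above.
PARENT = {
    "tools-db-basesnapshot1": "tools-db-basevolume1",
    "tools-db-basevolume2": "tools-db-basesnapshot1",
    "tools-db-basesnapshot2": "tools-db-basevolume2",
    "tools-db-1": "tools-db-basesnapshot2",
    "tools-db-2": "tools-db-basesnapshot2",
}


def ancestors(name, parent=PARENT):
    """Return every layer that `name` depends on, nearest parent first."""
    chain = []
    while name in parent:
        name = parent[name]
        chain.append(name)
    return chain


def children(name, parent=PARENT):
    """Return the layers created directly on top of `name`."""
    return sorted(child for child, p in parent.items() if p == name)


def can_delete(name, parent=PARENT):
    """Simplified rule: a layer is deletable only if nothing is built on it."""
    return not children(name, parent)


if __name__ == "__main__":
    print(ancestors("tools-db-1"))
    print(can_delete("tools-db-basevolume1"))
```

In this model only tools-db-1 and tools-db-2 are deletable, which matches the situation described: every parent layer is pinned by the volumes currently in use.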

This should not cause any issues, but it would be cleaner to remove this long layer hierarchy. The easiest way I can think of is as follows:

  1. Create a new snapshot of tools-db-1 and name it tools-db-1-tmpsnapshot
  2. Create a new volume based on that snapshot and name it tools-db-1-fromsnapshot
  3. Create two new empty volumes tools-db-3 and tools-db-4
  4. Rsync all data from tools-db-1-fromsnapshot to both tools-db-3 and tools-db-4
  5. Create two new instances tools-db-3 and tools-db-4 and attach the new volumes tools-db-3 and tools-db-4
  6. Set up tools-db-3 to replicate from tools-db-2, and set up tools-db-4 to replicate from tools-db-3
  7. When they are in sync, promote tools-db-3 to become the new primary
  8. Delete the old instances tools-db-1 and tools-db-2, and their entire volume chains
  9. Delete the snapshot tools-db-1-tmpsnapshot and the volume tools-db-1-fromsnapshot
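
The volume-related steps (1 to 3 and 9) could be sketched as a dry-run plan like the one below. This is only an illustration, not the procedure that was actually run: it builds `openstack` CLI command lines without executing them, the `size_gb` value is a placeholder, and the cleanup order (dependent volume before its snapshot) is an assumption about how the RBD backend handles dependencies. Snapshotting an attached volume may also need `--force`, which is left out here.

```python
import shlex


def plan_commands(source="tools-db-1",
                  new_volumes=("tools-db-3", "tools-db-4"),
                  size_gb=100):
    """Build (but do not run) the CLI commands for steps 1-3 and step 9.

    `size_gb` is a hypothetical placeholder; the real size must match
    the data held by the source volume.
    """
    snapshot = f"{source}-tmpsnapshot"
    clone = f"{source}-fromsnapshot"

    create = [
        # Step 1: snapshot the volume currently in use.
        ["openstack", "volume", "snapshot", "create",
         "--volume", source, snapshot],
        # Step 2: create a volume from that snapshot.
        ["openstack", "volume", "create", "--snapshot", snapshot, clone],
    ]
    # Step 3: new empty volumes with no copy-on-write parents.
    create += [
        ["openstack", "volume", "create", "--size", str(size_gb), name]
        for name in new_volumes
    ]

    # Step 9: remove the temporary objects. Assumption: the volume that
    # depends on the snapshot has to be deleted before the snapshot.
    cleanup = [
        ["openstack", "volume", "delete", clone],
        ["openstack", "volume", "snapshot", "delete", snapshot],
    ]
    return create, cleanup


if __name__ == "__main__":
    create, cleanup = plan_commands()
    for cmd in create + cleanup:
        print(shlex.join(cmd))
```

Steps 4 to 8 (rsync, instance creation, replication setup, promotion) are left out because they depend on details not covered in this task.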

Event Timeline

fnegri updated the task description.

I've added T344717 and T344719 as subtasks. Once those two tasks are completed, the volume chain should be simplified, and we can avoid following the procedure detailed in the description of this task.

fnegri closed this task as Resolved. (Edited, Dec 6 2024, 5:31 PM)
fnegri claimed this task.

After T352206: [toolsdb] Upgrade to MariaDB 10.6, I could delete the following volumes and snapshots, with the exceptions noted:

  • tools-db-basevolume1 (could not delete, see next comment)
  • tools-db-basesnapshot1 (could not delete, see next comment)
  • tools-db-basevolume2
  • tools-db-basesnapshot2

The new upgrade procedure ensures that no temporary volumes are left when the upgrade is completed.

fnegri reopened this task as In Progress. Dec 6 2024, 6:42 PM

Deleting the volumes had some complications, because of T358774: [wmcs-backup] Backup snapshots of deleted volumes are never cleaned up, and also because backy2 was trying to take a backup of one volume while I was trying to delete its snapshots.

I managed to delete tools-db-basevolume2 and tools-db-basesnapshot2, but not tools-db-basevolume1 and its snapshot tools-db-basesnapshot1.

tools-db-basesnapshot1 is currently stuck in Deleting, because backy2 is taking a new backup of the underlying volume tools-db-basevolume1:

3753818 ?        Rl   109:11 backy2 [Backing up (2/2: Data) rbd://eqiad1-cinder/volume-5510b9a4-0498-4bcb-b044-0b0d80e99ada@2024-12-06T16:58:21_cloudbackup2004 Read Queue [=         ] Write Queue [          ] (22.7% 145.2MB/sØ ETA 5h48m) ]
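
To check on a backup like this without reading the process title by eye, a small parser along these lines could help. It is a rough sketch that assumes the title always has the shape shown above; `parse_backy2_title` is a hypothetical helper, not part of backy2.

```python
import re

# Matches the RBD source, e.g. rbd://<pool>/<image>@<snapshot>
RBD_RE = re.compile(r"rbd://(?P<pool>[^/]+)/(?P<image>[^@\s]+)@(?P<snapshot>\S+)")
# Matches the progress group, e.g. "(22.7% ... ETA 5h48m)"
PROGRESS_RE = re.compile(r"\((?P<percent>\d+(?:\.\d+)?)% .*?ETA (?P<eta>[^)]+)\)")


def parse_backy2_title(line):
    """Extract pool, image, snapshot, percent and ETA from a backy2 ps line.

    Returns None if the line does not look like the example above.
    """
    rbd = RBD_RE.search(line)
    progress = PROGRESS_RE.search(line)
    if rbd is None or progress is None:
        return None
    info = rbd.groupdict()
    info["percent"] = float(progress.group("percent"))
    info["eta"] = progress.group("eta").strip()
    return info


if __name__ == "__main__":
    sample = (
        "3753818 ?        Rl   109:11 backy2 [Backing up (2/2: Data) "
        "rbd://eqiad1-cinder/volume-5510b9a4-0498-4bcb-b044-0b0d80e99ada"
        "@2024-12-06T16:58:21_cloudbackup2004 Read Queue [=         ] "
        "Write Queue [          ] (22.7% 145.2MB/s\u00d8 ETA 5h48m) ]"
    )
    print(parse_backy2_title(sample))
```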
fnegri moved this task from In progress to Done on the cloud-services-team (FY2024/2025-Q1-Q2) board.

> tools-db-basesnapshot1 is currently stuck in Deleting

This deletion completed successfully over the weekend, and after that I could also successfully delete tools-db-basevolume1, so this is now Resolved.