
Replace or remove Debian Buster VMs in 'video' cloud-vps project
Closed, Resolved · Public

Description

Buster's LTS support ends on June 30, 2024.

To see a report of existing Buster VMs, visit https://os-deprecation.toolforge.org/

Event Timeline

Restricted Application added a subscriber: Aklapper.
Don-vip changed the task status from Open to In Progress. Jun 30 2024, 8:01 PM
Don-vip claimed this task.

My plan to solve this, with current status:

  1. Done: Suspend gfg instance to check it is not used for anything (instance count: 8/10)
  2. Done: Create a new video-redis-bookworm with Redis 7.0 (version under a free licence) (instance count: 9/10)
  3. Done: Create a new video-dev-bookworm instance + setup video2commons-test frontend to have a development frontend & backend encoding instance with updated Python/ffmpeg libraries that will use the new Redis instance (instance count: 10/10)
  4. Done: Shut down all celery workers, wait for completion, shutdown video2commons frontend on Toolforge, migrate Redis database from video-redis-buster to video-redis-bookworm, suspend video-redis-buster instance
  5. Done: Drop gfg instance if it is confirmed at this stage that it is not needed. Recreate encoding07 instance as encoding01, the first new encoding instance using the new Redis + web proxy v2c1.wmcloud.org (instance count: 9/10)
  6. Done: Update and restart video2commons frontend on Toolforge
  7. Done: Drop encoding06, create encoding02 + web proxy v2c2.wmcloud.org (instance count: 9/10)
  8. Done: Drop encoding05, create encoding03 + web proxy v2c3.wmcloud.org (instance count: 9/10)
  9. Done: Drop encoding04 (instance count: 8/10)
  10. Done: Drop video-dev-buster and video-redis-buster (instance count: 6/10)
  11. Done: Create three new encoding instances, encoding04, encoding05, encoding06 (instance count: 9/10)

I won't touch video-nfs-1. Can someone please confirm that it is managed by the WMCS team?

Mentioned in SAL (#wikimedia-cloud) [2024-07-02T20:18:39Z] <don-vip> Creating video-redis-bookworm instance as per T360711

Hello @Don-vip! First of all: yes, I will maintain the NFS server. Is there anything I can do to help you keep progressing on this project?

Hi Andrew!
Sorry for the delay. Thanks a lot for maintaining the NFS server :)
As for the remaining activities, it's OK; I hope to complete them this week :)
I won't hesitate to ask for help if I face difficulties.

Mentioned in SAL (#wikimedia-cloud) [2024-07-24T20:40:35Z] <don-vip> drop gfg and encoding07 instances (unused) as per T360711 + T365154

Mentioned in SAL (#wikimedia-cloud) [2024-07-24T20:41:24Z] <don-vip> migrated redis database from video-redis-buster to video-redis-bookworm as per T360711 + T365154

@Andrew in fact I need help.
I see the old encoding instances use the role::labs::lvm::srv Puppet role to get more disk space.
I understand this is outdated and that we now use Cinder volumes directly from Horizon. Should I go this way? Does it mean the NFS server will no longer be necessary? I'm not sure I understand exactly how the storage part works.

Update: OK, I now understand that NFS and role::labs::lvm::srv are not related:

encoding04:~$ df -h
Filesystem                                                      Size  Used Avail Use% Mounted on
...
/dev/mapper/vd-second--local--disk                               60G  4.0G   53G   8% /srv
video-nfs.svc.video.eqiad1.wikimedia.cloud:/srv/video/project    32G   17G   14G  55% /mnt/nfs/labstore-secondary-project
video-nfs.svc.video.eqiad1.wikimedia.cloud:/srv/video/home       32G   17G   14G  55% /mnt/nfs/labstore-secondary-home
...

I'll ask for a volume quota increase to get six volumes of 60 GB each, in addition to the current 32 GB NFS volume.
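As a follow-up to the Cinder question above, here is a hedged sketch of attaching a Cinder volume to an instance. The helper script name and the device path are assumptions (Cloud VPS images usually ship a `wmcs-prepare-cinder-volume` helper, but check with `lsblk` first; `/dev/sdb` is only an example and `mkfs` is destructive on the chosen device):

```shell
# After creating a 60G volume in Horizon and attaching it to the instance,
# it appears as a new block device; confirm its name first:
lsblk

# Cloud VPS images ship an interactive helper that formats and mounts the
# volume (script name is an assumption, verify it exists on the image):
sudo wmcs-prepare-cinder-volume

# Or by hand -- DESTRUCTIVE on the chosen device; /dev/sdb is an example:
sudo mkfs.ext4 /dev/sdb
sudo mkdir -p /srv
sudo mount /dev/sdb /srv
echo '/dev/sdb /srv ext4 defaults,nofail 0 0' | sudo tee -a /etc/fstab
```

The `nofail` mount option keeps the instance bootable even if the volume is detached later.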

Mentioned in SAL (#wikimedia-cloud) [2024-07-24T23:39:45Z] <don-vip> updated and restarted video2commons frontend to use the bookworm redis instance. Setup encoding01 as the first new bookworm instance as per T360711 + T365154

@Andrew can I get support from the WMCS team? Something doesn't work and I don't understand why.

I have created three new instances without any problems: encoding01, encoding02 and encoding03. Everything works fine. These instance names did not exist before.

I have deleted instances encoding04 and encoding05 and recreated them with the same name. These ones I cannot ssh to. I cannot ping them from the working instances. Yet everything seems normal from Horizon and Grafana. I'm lost :(

Mentioned in SAL (#wikimedia-cloud) [2024-07-25T19:05:36Z] <don-vip> setup new instances encoding02 and encoding03 as per T360711 + T365154

> I have deleted instances encoding04 and encoding05 and recreated them with the same name. These ones I cannot ssh to.

Recreating with the same name can unfortunately create some DNS confusion:

Jul 26 11:57:33 cloudcontrol1006 wmcs-dnsleaks[2879650]: Found 2 ptr recordsets for the same VM: encoding04.video.eqiad1.wikimedia.cloud. ['12.0.16.172.in-addr.arpa.', '218.2.16.172.in-addr.arpa.']
Jul 26 11:57:33 cloudcontrol1006 wmcs-dnsleaks[2879650]: Found 2 ptr recordsets for the same VM: encoding05.video.eqiad1.wikimedia.cloud. ['69.0.16.172.in-addr.arpa.', '51.4.16.172.in-addr.arpa.']

I'll see if I can fix it manually.

@Don-vip I removed the 2 extra records (the reverse DNS records), and manually updated the A records because they were still pointing to the old IPs. Now you should be able to ssh to both instances.

@fnegri thanks a lot, it works! Is there a procedure I can follow to avoid this problem? I am going to recreate the encoding06 instance.

The best procedure is to not re-use hostnames (typically when there are names like host05 we would just increment and name the next one host06). If you need to re-use names, wait a few minutes after deleting before recreating.
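The waiting step above can be wrapped in a small check: after recreating an instance, poll until its hostname resolves before trying to ssh. A minimal sketch using `getent` (the hostname in the usage comment is illustrative):

```shell
# Poll until a hostname resolves, then print the address it resolves to.
# Usage: wait_for_dns <fqdn> [attempts]   (sleeps 10s between attempts)
wait_for_dns() {
  local host=$1 attempts=${2:-30} addr i
  for i in $(seq 1 "$attempts"); do
    # getent prints nothing when the name does not resolve
    addr=$(getent hosts "$host" | awk '{print $1; exit}')
    if [ -n "$addr" ]; then
      echo "$addr"
      return 0
    fi
    sleep 10
  done
  echo "no DNS record for $host after $attempts attempts" >&2
  return 1
}

# Example (hostname is illustrative):
# wait_for_dns encoding06.video.eqiad1.wikimedia.cloud
```

This only confirms the forward (A) record exists; when reusing a name, it is still worth checking with `dig -x` that the reverse record points back to the same hostname.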

Update completed!