
Replace or remove Debian Buster VMs in 'video' cloud-vps project
Closed, Resolved · Public

Description

Buster's LTS support ends on June 30, 2024.

To see a report of existing Buster VMs, visit https://os-deprecation.toolforge.org/

Event Timeline

Restricted Application added a subscriber: Aklapper.
Don-vip changed the task status from Open to In Progress. Jun 30 2024, 8:01 PM
Don-vip claimed this task.

My plan to solve this, with current status:

  1. Done: Suspend gfg instance to check it is not used for anything (instance count: 8/10)
  2. Done: Create a new video-redis-bookworm with Redis 7.0 (version under a free licence) (instance count: 9/10)
  3. Done: Create a new video-dev-bookworm instance + setup video2commons-test frontend to have a development frontend & backend encoding instance with updated Python/ffmpeg libraries that will use the new Redis instance (instance count: 10/10)
  4. Done: Shut down all celery workers, wait for completion, shutdown video2commons frontend on Toolforge, migrate Redis database from video-redis-buster to video-redis-bookworm, suspend video-redis-buster instance
  5. Done: Drop gfg instance if it is confirmed at this stage that it is not needed. Recreate encoding07 instance as encoding01, the first new encoding instance using the new Redis + web proxy v2c1.wmcloud.org (instance count: 9/10)
  6. Done: Update and restart video2commons frontend on Toolforge
  7. Done: Drop encoding06, create encoding02 + web proxy v2c2.wmcloud.org (instance count: 9/10)
  8. Done: Drop encoding05, create encoding03 + web proxy v2c3.wmcloud.org (instance count: 9/10)
  9. Done: Drop encoding04 (instance count: 8/10)
  10. Done: Drop video-dev-buster and video-redis-buster (instance count: 6/10)
  11. Done: Create three new encoding instances, encoding04, encoding05, encoding06 (instance count: 9/10)

I won't touch video-nfs-1. Can someone please confirm that it is managed by the WMCS team?

Mentioned in SAL (#wikimedia-cloud) [2024-07-02T20:18:39Z] <don-vip> Creating video-redis-bookworm instance as per T360711

Hello @Don-vip! First of all: yes, I will maintain the NFS server. Is there anything I can do to help you keep progressing on this project?

Hi Andrew!
Sorry for the delay. Thanks a lot for maintaining the NFS server :)
As for the remaining activities, it's OK; I hope to complete them this week :)
I won't hesitate to ask for help if I face difficulties.

Mentioned in SAL (#wikimedia-cloud) [2024-07-24T20:40:35Z] <don-vip> drop gfg and encoding07 instances (unused) as per T360711 + T365154

Mentioned in SAL (#wikimedia-cloud) [2024-07-24T20:41:24Z] <don-vip> migrated redis database from video-redis-buster to video-redis-bookworm as per T360711 + T365154

@Andrew in fact I need help.
I see the old encoding instances use the role::labs::lvm::srv Puppet role to get more disk space.
I understand this is outdated and that we now use Cinder volumes directly from Horizon. Should I go this way? Does it mean the NFS server will no longer be necessary? I'm not sure I understand exactly how the storage part works.

Update: OK, I now understand that NFS and role::labs::lvm::srv are not related:

encoding04:~$ df -h
Filesystem                                                      Size  Used Avail Use% Mounted on
...
/dev/mapper/vd-second--local--disk                               60G  4.0G   53G   8% /srv
video-nfs.svc.video.eqiad1.wikimedia.cloud:/srv/video/project    32G   17G   14G  55% /mnt/nfs/labstore-secondary-project
video-nfs.svc.video.eqiad1.wikimedia.cloud:/srv/video/home       32G   17G   14G  55% /mnt/nfs/labstore-secondary-home
...

I'll ask for a volume quota increase to get six volumes of 60 GB each, in addition to the current 32 GB NFS volume.
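As a follow-up to the Cinder question above, here is a hedged sketch of attaching a Cinder volume to an instance. The helper script name and the device path are assumptions (Cloud VPS images usually ship a `wmcs-prepare-cinder-volume` helper, but check with `lsblk` first; `/dev/sdb` is only an example and `mkfs` is destructive on the chosen device):

```shell
# After creating a 60G volume in Horizon and attaching it to the instance,
# it appears as a new block device; confirm its name first:
lsblk

# Cloud VPS images ship an interactive helper that formats and mounts the
# volume (script name is an assumption, verify it exists on the image):
sudo wmcs-prepare-cinder-volume

# Or by hand -- DESTRUCTIVE on the chosen device; /dev/sdb is an example:
sudo mkfs.ext4 /dev/sdb
sudo mkdir -p /srv
sudo mount /dev/sdb /srv
echo '/dev/sdb /srv ext4 defaults,nofail 0 0' | sudo tee -a /etc/fstab
```

The `nofail` mount option keeps the instance bootable even if the volume is detached later.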

Mentioned in SAL (#wikimedia-cloud) [2024-07-24T23:39:45Z] <don-vip> updated and restarted video2commons frontend to use the bookworm redis instance. Setup encoding01 as the first new bookworm instance as per T360711 + T365154

@Andrew can I get support from the WMCS team? Something doesn't work and I don't understand why.

I have created three new instances without any problems: encoding01, encoding02 and encoding03. Everything works fine. These instance names did not exist before.

I have deleted instances encoding04 and encoding05 and recreated them with the same name. These ones I cannot ssh to. I cannot ping them from the working instances. Yet everything seems normal from Horizon and Grafana. I'm lost :(

Mentioned in SAL (#wikimedia-cloud) [2024-07-25T19:05:36Z] <don-vip> setup new instances encoding02 and encoding03 as per T360711 + T365154

> I have deleted instances encoding04 and encoding05 and recreated them with the same name. These ones I cannot ssh to.

Recreating with the same name can unfortunately create some DNS confusion:

Jul 26 11:57:33 cloudcontrol1006 wmcs-dnsleaks[2879650]: Found 2 ptr recordsets for the same VM: encoding04.video.eqiad1.wikimedia.cloud. ['12.0.16.172.in-addr.arpa.', '218.2.16.172.in-addr.arpa.']
Jul 26 11:57:33 cloudcontrol1006 wmcs-dnsleaks[2879650]: Found 2 ptr recordsets for the same VM: encoding05.video.eqiad1.wikimedia.cloud. ['69.0.16.172.in-addr.arpa.', '51.4.16.172.in-addr.arpa.']

I'll see if I can fix it manually.

@Don-vip I removed the 2 extra records (the reverse DNS records), and manually updated the A records because they were still pointing to the old IPs. Now you should be able to ssh to both instances.

@fnegri thanks a lot, it works! Is there a procedure I can follow to avoid this problem? I am going to recreate the encoding06 instance.

The best procedure is to not re-use hostnames (typically when there are names like host05 we would just increment and name the next one host06). If you need to re-use names, wait a few minutes after deleting before recreating.
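The waiting step above can be wrapped in a small check: after recreating an instance, poll until its hostname resolves before trying to ssh. A minimal sketch using `getent` (the hostname in the usage comment is illustrative):

```shell
# Poll until a hostname resolves, then print the address it resolves to.
# Usage: wait_for_dns <fqdn> [attempts]   (sleeps 10s between attempts)
wait_for_dns() {
  local host=$1 attempts=${2:-30} addr i
  for i in $(seq 1 "$attempts"); do
    # getent prints nothing when the name does not resolve
    addr=$(getent hosts "$host" | awk '{print $1; exit}')
    if [ -n "$addr" ]; then
      echo "$addr"
      return 0
    fi
    sleep 10
  done
  echo "no DNS record for $host after $attempts attempts" >&2
  return 1
}

# Example (hostname is illustrative):
# wait_for_dns encoding06.video.eqiad1.wikimedia.cloud
```

This only confirms the forward (A) record exists; when reusing a name, it is still worth checking with `dig -x` that the reverse record points back to the same hostname.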

Update completed!