
Replace or remove Debian Buster VMs in 'codesearch' cloud-vps project
Closed, Resolved · Public

Description

LTS support for Debian Buster ends in a few weeks, and a few weeks after that I'll want to start deleting and removing those VMs.

Please either remove Buster VMs from your project or respond here with a proposed plan and timeline.

Thanks!

Event Timeline

Restricted Application added a subscriber: Aklapper.

I will try to fix it this weekend (also to minimize disruptions).

Created new instance codesearch9 (g4.cores4.ram8.disk20) to replace codesearch8 (g3.cores4.ram8.disk20).

Note that this is also a g3 -> g4 flavor change, and we are attempting to skip bullseye and go straight to bookworm to see how far we can get.

Mentioned in SAL (#wikimedia-cloud) [2024-06-14T16:59:06Z] <mutante> creating instance codesearch9 with bookworm to replace codesearch8 (T367479)

I can help.

Thank you <3 I will pick up from wherever you get to. Thanks!

This is a good opportunity, because it covers a good chunk of T268199 (moving codesearch to prod).

Like with other buster machines we also have the buster/puppet7 conflict here:

Notice: puppet7 is not available on buster.  forcing this is likely going to cause issue.

We are also doing the announced "g3" to "g4" flavor switch.

Things to fix:

E: Package 'docker-ce' has no installation candidate
..
Execution of '/usr/sbin/usermod -G docker codesearch' returned 6: usermod: group 'docker' does not exist

Needs to change to docker.io.

Change #1043901 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] codesearch: add support for docker-ce on bookworm

https://gerrit.wikimedia.org/r/1043901

@Ladsgroup So, docker-ce won't be found on bookworm; there is docker.io instead. That part isn't surprising. But I remembered that we had the same issue with the CI servers when we upgraded them not long ago and ended up using docker-ce from another repo, so I copied the setup we use there, assuming it's also good here. That's just an assumption, though.

See change above.

There may be more in profile::ci::docker worth copying, like auto-pruning old images, config, etc. But as a first step I wanted to focus on just the package install.

Change #1046724 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] codesearch: install docker.io if on bookworm

https://gerrit.wikimedia.org/r/1046724

@Ladsgroup The new change https://gerrit.wikimedia.org/r/c/operations/puppet/+/1046724 would just install docker.io on bookworm and be a no-op on the existing buster machine. If that seems fine, we can continue here.
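The real selection lives in the puppet change; as a rough illustration of the same decision, the sketch below reads the Debian codename from /etc/os-release and maps it to a package name. The mapping table and the helpers `parse_os_release`, `docker_package_for` and `current_codename` are hypothetical; only the two package names come from this task.

```python
from pathlib import Path

# Assumed mapping based on this task: the buster host keeps docker-ce,
# bookworm installs Debian's own docker.io package.
DOCKER_PACKAGE = {
    "buster": "docker-ce",
    "bookworm": "docker.io",
}


def parse_os_release(text: str) -> dict[str, str]:
    """Parse KEY=value lines of an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"')  # values may be quoted
    return fields


def docker_package_for(codename: str) -> str:
    """Return the Docker package for a release, failing loudly on unknown ones."""
    try:
        return DOCKER_PACKAGE[codename]
    except KeyError:
        raise ValueError(f"no Docker package mapped for Debian {codename!r}") from None


def current_codename(path: Path = Path("/etc/os-release")) -> str:
    """Read VERSION_CODENAME (e.g. 'bookworm') from the local os-release file."""
    return parse_os_release(path.read_text())["VERSION_CODENAME"]
```

Failing on an unmapped release (for example bullseye) is a deliberate choice here: it surfaces a missing decision instead of silently picking a package that may have no installation candidate, which is exactly the `docker-ce` error seen above.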

Yeah, I don't think we had any need for a specific Docker engine; whatever works, works. It's not a large-scale or special setup in any way.

Change #1046724 merged by Dzahn:

[operations/puppet@production] codesearch: install docker.io if on bookworm

https://gerrit.wikimedia.org/r/1046724

Change #1046776 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] codesearch: fix dependencies on changing docker package name

https://gerrit.wikimedia.org/r/1046776

Change #1046776 merged by Dzahn:

[operations/puppet@production] codesearch: fix dependencies on changing docker package name

https://gerrit.wikimedia.org/r/1046776
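The earlier `usermod -G docker codesearch` failure is an ordering problem: adding the user to the `docker` group only works after the package that creates that group is installed, so any dependency that names the package presumably has to follow the renamed package. The sketch below models that constraint with Python's `graphlib`; it is an illustration, not Puppet's implementation, and the resource names are hypothetical.

```python
from graphlib import TopologicalSorter


def apply_order(docker_package: str) -> list[str]:
    """Return an order in which the resources can be applied safely."""
    package = f"Package[{docker_package}]"  # creates the 'docker' group
    user = "User[codesearch]"  # runs usermod -G docker codesearch
    service = "Service[hound-search]"  # runs /usr/bin/docker
    # Each key maps to the set of resources that must be applied before it.
    graph = {
        package: set(),
        user: {package},
        service: {package},
    }
    return list(TopologicalSorter(graph).static_order())
```

If the user resource still depended on `Package[docker-ce]` while bookworm installed `docker.io`, nothing would guarantee that the group exists when `usermod` runs.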


After the follow-up change above, puppet now runs on codesearch9 without any errors :)

docker.io is now installed.

Docker containers with hound are running:

dzahn@codesearch9:~$ sudo /usr/bin/docker container ls
CONTAINER ID   IMAGE                                                           COMMAND                  CREATED          STATUS         PORTS                                       NAMES
443f8c80aeaa   docker-registry.wikimedia.org/wikimedia/labs-codesearch:hound   "./houndd -conf /dat…"   11 seconds ago   Up 1 second    0.0.0.0:6096->6080/tcp, :::6096->6080/tcp   hound-shouthow
49c67830aef2   docker-registry.wikimedia.org/wikimedia/labs-codesearch:hound   "./houndd -conf /dat…"   13 seconds ago   Up 2 seconds   0.0.0.0:6090->6080/tcp, :::6090->6080/tcp   hound-deployed
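Each hound container publishes a distinct host port that maps to container port 6080 (6096 and 6090 above). When checking which port a proxy should target, a small parser for the PORTS column can help. The helper name `published_ports` is hypothetical, and it only handles the `ip:host->container/proto` form shown in this output.

```python
import re

# One published port, e.g. "0.0.0.0:6096->6080/tcp" or ":::6096->6080/tcp".
# The greedy host_ip group lets IPv6 addresses keep their colons.
_PORT_RE = re.compile(
    r"^(?P<host_ip>.*):(?P<host_port>\d+)->(?P<container_port>\d+)/(?P<proto>\w+)$"
)


def published_ports(ports_column: str) -> dict[int, int]:
    """Map host port -> container port for one `docker container ls` PORTS cell."""
    mapping = {}
    for entry in ports_column.split(","):
        match = _PORT_RE.match(entry.strip())
        if match is None:
            continue  # exposed-but-unpublished ports look like "6080/tcp"
        # IPv4 and IPv6 bindings of the same port collapse into one key.
        mapping[int(match["host_port"])] = int(match["container_port"])
    return mapping
```

For the hound-shouthow row above, the PORTS cell `0.0.0.0:6096->6080/tcp, :::6096->6080/tcp` yields `{6096: 6080}`.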

@Ladsgroup The next step to debug here is:

https://codesearch-backend.wmcloud.org/search/

vs

"invalid backend" on https://codesearch-new.wmcloud.org/_health/

(I created the codesearch-new proxy pointing to port 3002, like the old -backend URL.)
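To compare the two endpoints side by side, a small probe that reports status code and the start of the body (including for error responses such as a proxy's "invalid backend" page) may be handy. This is a sketch using only the standard library; the function name `probe` and the 200-byte snippet length are arbitrary choices, not part of any existing tooling.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> str:
    """Return a one-line summary of how a URL answers."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read(200).decode("utf-8", "replace")
            return f"{resp.status} {body!r}"
    except urllib.error.HTTPError as err:
        # Non-2xx responses (e.g. a proxy error page) still carry a status and body.
        body = err.read(200).decode("utf-8", "replace")
        return f"{err.code} {body!r}"
    except OSError as err:
        # URLError (DNS failure, connection refused) and timeouts end up here.
        return f"unreachable: {err}"
```

Usage: call `probe("https://codesearch-backend.wmcloud.org/search/")` and `probe("https://codesearch-new.wmcloud.org/_health/")` and compare the two summaries.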

ladsgroup@codesearch9:~$ sudo service hound-search status
● hound-search.service - hound-search
     Loaded: loaded (/lib/systemd/system/hound-search.service; enabled; preset: enabled)
     Active: active (running) since Tue 2024-06-18 06:00:30 UTC; 5h 34min ago
   Main PID: 489792 (docker)
      Tasks: 9 (limit: 9521)
     Memory: 24.9M
        CPU: 4.424s
     CGroup: /system.slice/hound-search.service
             └─489792 /usr/bin/docker run -p 6080:6080 --name hound-search --user=root -v /srv/hound/hound-search:/data -v /etc/hound-gitconfig:/root/.gitconfig docker-registry.wikimedia.org/wikimedia/labs-code>

Jun 18 10:49:06 codesearch9 docker[489792]: Continuing...
Jun 18 10:49:06 codesearch9 docker[489792]: 2024/06/18 10:49:06 Failed to git reset /data/data/vcs-502107fa7ff323dc0b56e08bb7e2a9bc0c45325e, see output below
Jun 18 10:49:06 codesearch9 docker[489792]: fatal: Unable to create '/data/data/vcs-502107fa7ff323dc0b56e08bb7e2a9bc0c45325e/.git/index.lock': No space left on device
Jun 18 10:49:06 codesearch9 docker[489792]: Continuing...
Jun 18 10:49:06 codesearch9 docker[489792]: 2024/06/18 10:49:06 Failed to git fetch /data/data/vcs-1320e010c2fefa8d50ea1e4fc956dbadb70df709, see output below
Jun 18 10:49:06 codesearch9 docker[489792]: fatal: Unable to create '/data/data/vcs-1320e010c2fefa8d50ea1e4fc956dbadb70df709/.git/shallow.lock': No space left on device
Jun 18 10:49:06 codesearch9 docker[489792]: Continuing...
Jun 18 10:49:06 codesearch9 docker[489792]: 2024/06/18 10:49:06 Failed to git reset /data/data/vcs-1320e010c2fefa8d50ea1e4fc956dbadb70df709, see output below
Jun 18 10:49:06 codesearch9 docker[489792]: fatal: Unable to create '/data/data/vcs-1320e010c2fefa8d50ea1e4fc956dbadb70df709/.git/index.lock': No space left on device
Jun 18 10:49:06 codesearch9 docker[489792]: Continuing...
ladsgroup@codesearch9:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            3.9G     0  3.9G   0% /dev
tmpfs           796M  1.6M  795M   1% /run
/dev/sda1        20G   19G  646M  97% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sda15      124M   12M  113M  10% /boot/efi
tmpfs           796M     0  796M   0% /run/user/0
tmpfs           796M     0  796M   0% /run/user/3182
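The root filesystem is at 97%, which explains the "No space left on device" errors from the git operations above. GNU df computes Use% as used / (used + available), rounded up: 19G used with 646M available gives 19456 / (19456 + 646) ≈ 96.8%, shown as 97% (df's displayed sizes are themselves rounded, so this is approximate). A minimal sketch reproducing that figure from Python, for example as a hypothetical pre-flight check before indexing starts; the 90% threshold is an arbitrary assumption.

```python
import shutil


def df_use_percent(used: int, avail: int) -> int:
    """df-style Use%: used / (used + available) as a percentage, rounded up."""
    total = used + avail
    if total == 0:
        return 0
    return (used * 100 + total - 1) // total  # integer ceiling division


def use_percent(path: str = "/") -> int:
    # On Unix, shutil.disk_usage() reports `free` as the space available to
    # unprivileged users, which matches df's Avail column.
    usage = shutil.disk_usage(path)
    return df_use_percent(usage.used, usage.free)


def nearly_full(path: str = "/", threshold: int = 90) -> bool:
    """Hypothetical pre-flight check; the threshold is an assumption."""
    return use_percent(path) >= threshold
```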

I'd assume this data needs to live on a mounted volume from the storage service OpenStack provides. I forgot its name.

Yup:

ladsgroup@codesearch8:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            3.9G     0  3.9G   0% /dev
tmpfs           799M   82M  717M  11% /run
/dev/sda1        20G   10G  8.9G  54% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sdb         75G   49G   23G  68% /srv
tmpfs           799M     0  799M   0% /run/user/0
tmpfs           799M     0  799M   0% /run/user/3182
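On codesearch8, /srv sits on its own 75G volume (/dev/sdb), while on codesearch9 it was still part of the 20G root disk. A check that /srv is on a separate mount would flag that before the indexer fills the root filesystem. The helpers below are hypothetical; they only detect mount boundaries via `os.path.ismount` and do not check which device backs the path.

```python
import os


def mount_point_of(path: str) -> str:
    """Walk up from `path` to the nearest ancestor that is a mount point."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached "/", which is always a mount point
            break
        path = parent
    return path


def on_dedicated_volume(path: str = "/srv") -> bool:
    """True if `path` lives on a filesystem other than the root one."""
    return mount_point_of(path) != "/"
```

On codesearch8 `on_dedicated_volume("/srv")` would be True; on codesearch9 before the volume was attached it would have been False.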

Attached the new volume; it's being rebuilt now.

I switched the proxy to point to codesearch9 now and shut off codesearch8. I will delete it tomorrow.

FWIW, I didn't create a systemd service for the frontend (I think the old one didn't have it either). I just ran this instead:

ladsgroup@codesearch9:/srv/codesearch/frontend$ docker run -it -d --restart=unless-stopped -p 3003:80 codesearch-frontend
ad5331c28ef5043802b5b9188198fb07328bd7858a3ffc725f3c8bee309f169c

Maybe that should be a service. Future-me problem though.
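One way to turn the manual `docker run` into a service is a unit shaped like hound-search.service, which keeps `docker run` in the foreground so systemd supervises it. The sketch below renders such a unit; the unit name, `Restart=always` and the `ExecStartPre` cleanup are assumptions, while the image name and port 3003:80 come from the command above. A puppetized version would template the same text.

```python
UNIT_TEMPLATE = """\
[Unit]
Description={name}
After=docker.service
Requires=docker.service

[Service]
# Remove a stale container with the same name; "-" ignores failure if none exists.
ExecStartPre=-/usr/bin/docker rm -f {name}
# No -d/-it: systemd supervises the foreground `docker run` process.
ExecStart=/usr/bin/docker run --rm --name {name} -p {host_port}:{container_port} {image}
ExecStop=/usr/bin/docker stop {name}
Restart=always

[Install]
WantedBy=multi-user.target
"""


def render_unit(name: str, image: str, host_port: int, container_port: int) -> str:
    """Fill in the hypothetical unit template for one container."""
    return UNIT_TEMPLATE.format(
        name=name, image=image, host_port=host_port, container_port=container_port
    )
```

Before switching, the manually started container (which uses Docker's own `--restart=unless-stopped` policy) would need to be stopped and removed, so Docker and systemd don't both try to restart the frontend.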

Change #1043901 abandoned by Dzahn:

[operations/puppet@production] codesearch: add support for docker-ce on bookworm

Reason:

in favor of https://gerrit.wikimedia.org/r/c/operations/puppet/+/1046724

https://gerrit.wikimedia.org/r/1043901

I deleted the web proxy codesearch-old.wmcloud.org. I think we can also delete codesearch-beta.wmcloud.org?

Everything points to the same place now, and I found it confusing that there were so many proxies.


Thanks for documenting that. Let's add a systemd service for the frontend. That's exactly the kind of thing I had in mind as needed on the path towards T268199.

Since the buster machine is already shut down, this is resolved.

Thanks Ladsgroup!

Removed the old VM and the corresponding volumes.

Ladsgroup claimed this task.