
Cloud VPS "cvn" project Stretch deprecation
Closed, Resolved (Public)

Description

The end of life of Debian Stretch is approaching in 2022 and we need to move to Debian Bullseye (or Buster) before that date.

All instances in the cvn project need to upgrade as soon as possible. Instances not upgraded by 2022-05-01 may be subject to deletion unless prior arrangements for an extended deadline have been approved by the Cloud VPS administration team.

Remaining Debian Stretch instances (live report):

Listed administrators are:

See also:

More info on current project instances is available via openstack browser.

Event Timeline

StrikerBot triaged this task as Medium priority. Apr 13 2022, 4:59 PM
StrikerBot created this task.
Zabe changed the edit policy from "Custom Policy" to "All Users". Apr 13 2022, 5:35 PM

We'll need a bit more time I think. I've just finished migrating various Toolforge tools. I'm hoping for this one to actually collaborate so as to document and teach others how these servers are set up and wired together.

The plan for CVN is to use this opportunity to retry using Toolforge rather than managing our own VMs. The least-overhead and least-deprecated way to do this (from our perspective) would be to run the bots as continuous jobs on the Kubernetes cluster. For this we would need a Docker image that provides Mono.

As luck would have it, Taavi just created these for a few other projects: T311466: Create a kubernetes container with mono and dotnet

Our infra is described at:

Can I get a progress update on this? I'm hoping to delete some more Stretch VMs tomorrow.

@Andrew We haven't yet begun. I'm looking into T311466 now. The first thing I'm running into is how to build and configure the bot prior to running it. For running it, we can use a toolforge-k8s job that uses the new mono68 image and runs the appropriate command. But during testing and building we'd need a shell pod where we can run some bash commands interactively, such that mono/msbuild are available from there as well.

I see that the shell shortcut command is tied to webservice, but what is the recommendation for getting a shell for non-web containers?

https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes#Get_a_shell_inside_a_running_Pod

This second page mentions a way to get a shell, but it requires a pod already existing. I guess one way to get this to work is to first configure a toolforge job with the right mono68 docker image and memory allocation but run some kind of dummy command that will run for a reasonably long time, and then somehow get its pod ID and use that to get a shell on there?
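The workaround guessed at above could be sketched roughly as follows. This is an untested illustration in Python that only assembles the command lines and does not run them. The job name `cvn-shell`, the image name `tf-mono68`, the memory value, and the pod name are all placeholders (the real image name would come from `toolforge-jobs images`). The `toolforge-jobs run` flags are assumed from the Jobs framework help page and should be checked against `toolforge-jobs run --help`.

```python
import shlex


def dummy_job_command(job_name, image, mem="1Gi", sleep_seconds=3600):
    """Build a toolforge-jobs invocation whose pod stays alive by sleeping.

    All argument values are placeholders; the flags are assumptions to verify.
    """
    return [
        "toolforge-jobs", "run", job_name,
        "--command", f"sleep {sleep_seconds}",
        "--image", image,
        "--mem", mem,
    ]


def shell_command(pod_name):
    """Build the kubectl invocation that opens a shell in an existing pod."""
    return ["kubectl", "exec", "-it", pod_name, "--", "/bin/bash"]


if __name__ == "__main__":
    # Step 1: start a long-running dummy job on the desired image.
    print(shlex.join(dummy_job_command("cvn-shell", "tf-mono68")))
    # Step 2: look up the name of the pod that the job created.
    print("kubectl get pods")
    # Step 3: open an interactive shell inside that pod.
    print(shlex.join(shell_command("POD_NAME_FROM_STEP_2")))
```

The dummy job only exists to give `kubectl exec` a pod to attach to, so it would be deleted again once the build is done.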

I see that the shell shortcut command is tied to webservice, but what is the recommendation for getting a shell for non-web containers?

We need to work on this, which probably means adding something to the toolforge-jobs command, but https://wikitech.wikimedia.org/wiki/User:BryanDavis/Kubernetes#Launch_an_interactive_shell_in_the_cluster might be helpful to you in the short term. You can use toolforge-jobs images to find the full name of each image.

I'm so far avoiding getting deep into the details of this task, but -- this project is one of very few projects remaining with Stretch VMs and needs to be taken care of soon.

Summarising what @AntiCompositeNumber found when trying to run one of the ~30 CVNBot instances in the "cvn" Toolforge account:

  • msbuild is missing from the Docker mono image. And it is seemingly not packaged by Debian. Maybe this could be installed from the mono-project.com repo instead?
  • <AntiComposite> they don't package targeting bullseye, but the buster package appears to work

As a workaround, we built it locally as a proof of concept and then scp'ed the exe to Toolforge. Running it for about two hours already led to 5-10 minutes of lag (we usually lag a few seconds at most) and to dropped/disconnected events from the irc.wikimedia.org input, presumably due to an inability to ping-pong in due time. It runs on a separate thread and works fine in Cloud VPS, but something is causing it to slow down by a lot.
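For readers unfamiliar with the ping-pong mentioned above: an IRC server periodically sends `PING <token>` and drops the connection if the client does not answer `PONG <token>` in time. The following is a minimal Python sketch of that reply logic. It is not CVNBot's code (which is C#), and the function name `pong_reply` is made up for illustration.

```python
def pong_reply(line):
    """Return the PONG line answering an IRC PING, or None for any other line."""
    line = line.rstrip("\r\n")
    if line == "PING" or line.startswith("PING "):
        # Echo back whatever followed PING, as the protocol expects.
        return "PONG" + line[4:] + "\r\n"
    return None
```

If the thread that calls something like this is starved, for example because it is blocked behind slow disk I/O, the reply arrives late and the server disconnects the client. That would match the symptoms described, though it remains a guess.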

I haven't had time to investigate, but my instinct is that the slowdown is due to I/O from it trying to open/close the 30MB SQLite file on every RC event, and in Toolforge that file is stored on NFS. We opted out of NFS in WMCS years ago (instead making hourly backups to NFS to avoid losing the database). I don't know if Toolforge has ways to mount data without NFS in a way that, e.g., doesn't need to be synced in real time with anything; it just needs to be persisted between restarts.
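The setup described above (database on local disk, periodic copy to NFS) can be illustrated with a small Python sketch. It is not how CVNBot is implemented. The `events` table, the file names, and the helper names are hypothetical, and temporary files stand in for the local disk and the NFS mount. It keeps one connection open across events instead of reopening the file each time, and uses SQLite's online backup API to take the copy.

```python
import os
import sqlite3
import tempfile


def record_event(conn, title):
    """Store one event using the already-open connection (no reopen per event)."""
    conn.execute("INSERT INTO events (title) VALUES (?)", (title,))
    conn.commit()


def backup_database(conn, backup_path):
    """Copy the live database to backup_path via SQLite's online backup API."""
    dest = sqlite3.connect(backup_path)
    try:
        conn.backup(dest)
    finally:
        dest.close()


def demo():
    workdir = tempfile.mkdtemp()
    live_path = os.path.join(workdir, "live.db")      # stands in for local disk
    backup_path = os.path.join(workdir, "backup.db")  # stands in for the NFS copy
    conn = sqlite3.connect(live_path)
    conn.execute("CREATE TABLE events (title TEXT)")
    for title in ("Page A", "Page B", "Page C"):
        record_event(conn, title)
    backup_database(conn, backup_path)  # in practice this would run hourly
    conn.close()
    # Read the backup back to confirm that all rows were copied.
    check = sqlite3.connect(backup_path)
    count = check.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    check.close()
    return count
```

With this layout the hot path never touches NFS; only the periodic backup does, at the cost of losing up to one backup interval of data if the local disk is lost.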

There's also a 2014 ticket for adding MySQL support to CVNBot (https://github.com/countervandalism/CVNBot/issues/17), which with the right C#/.NET expertise could be realized, as that would presumably perform much better.

@Andrew We've decided to upgrade in place for now and revisit Toolforge after that.

Checking the quotas, it looks like we're at exactly half of most resources which is presumably left over from the last Debian upgrade (4/8 instances, 11/22 CPU, 22/44 GB RAM). The one resource we're lacking for a smooth transition is floating IPs.

Could we temporarily have 4 instead of 2 floating IPs?

Mentioned in SAL (#countervandalism) [2023-01-17T00:15:22Z] <Krinkle> Suspend cvn-apache9, replaced by cvn-apache10, ref T306066

Thanks for handling cvn-apache!

There are two more Stretch hosts left in this project: cvn-app8.cvn.eqiad1.wikimedia.cloud and cvn-app9.cvn.eqiad1.wikimedia.cloud. Is there any hope of getting those replaced soon? Can I do anything to support the effort?

Change 882224 had a related patch set uploaded (by AntiCompositeNumber; author: AntiCompositeNumber):

[labs/countervandalism/stillalive@master] Move CVNBots 1, 2, 3, 4, 5, 11, 12 from app8 to app10

https://gerrit.wikimedia.org/r/882224

Change 882225 had a related patch set uploaded (by AntiCompositeNumber; author: AntiCompositeNumber):

[labs/countervandalism/stillalive@master] Move CVNBots 13, 14, 15, 18, 20, 21 from app8 to app10

https://gerrit.wikimedia.org/r/882225

Change 882226 had a related patch set uploaded (by AntiCompositeNumber; author: AntiCompositeNumber):

[labs/countervandalism/stillalive@master] Move CVNBots 6, 7, 8, 9, 10 from app9 to app12

https://gerrit.wikimedia.org/r/882226

Change 882227 had a related patch set uploaded (by AntiCompositeNumber; author: AntiCompositeNumber):

[labs/countervandalism/stillalive@master] Move CVNBots 16, 17, 19, 22, 23, 24 from app9 to app12

https://gerrit.wikimedia.org/r/882227

Change 882228 had a related patch set uploaded (by AntiCompositeNumber; author: AntiCompositeNumber):

[labs/countervandalism/stillalive@master] Move CVNBots 25, 26, 27, 28, 29 from app9 to app12

https://gerrit.wikimedia.org/r/882228

Change 882224 merged by jenkins-bot:

[labs/countervandalism/stillalive@master] Move CVNBots 1, 2, 3, 4, 5, 11, 12 from app8 to app10

https://gerrit.wikimedia.org/r/882224

Mentioned in SAL (#countervandalism) [2023-01-22T19:53:32Z] <AntiComposite> Deploy 80ea1f5 to cvn-app10 (T306066)

Change 882225 merged by jenkins-bot:

[labs/countervandalism/stillalive@master] Move CVNBots 13, 14, 15, 18, 20, 21 from app8 to app10

https://gerrit.wikimedia.org/r/882225

Mentioned in SAL (#countervandalism) [2023-01-22T20:51:04Z] <AntiComposite> Deploy 1acdb8e to cvn-app8, stopping bots (T306066)

Mentioned in SAL (#countervandalism) [2023-01-22T21:07:21Z] <AntiComposite> Deploy 1acdb8e to cvn-app10, starting bots (T306066)

Change 882226 merged by jenkins-bot:

[labs/countervandalism/stillalive@master] Move CVNBots 6, 7, 8, 9, 10 from app9 to app12

https://gerrit.wikimedia.org/r/882226

Mentioned in SAL (#countervandalism) [2023-01-23T15:59:13Z] <AntiComposite> Deploy 9024b8f to app9 (T306066)

Mentioned in SAL (#countervandalism) [2023-01-23T16:00:59Z] <AntiComposite> Deploy 9024b8f to app12 (T306066)

Change 882227 merged by jenkins-bot:

[labs/countervandalism/stillalive@master] Move CVNBots 16, 17, 19, 22, 23, 24 from app9 to app12

https://gerrit.wikimedia.org/r/882227

Mentioned in SAL (#countervandalism) [2023-01-23T16:25:51Z] <AntiComposite> Deploy 442f324 to app9 (T306066)

Mentioned in SAL (#countervandalism) [2023-01-23T16:29:14Z] <AntiComposite> Deploy 442f324 to app12 (T306066)

Change 882228 merged by jenkins-bot:

[labs/countervandalism/stillalive@master] Move CVNBots 25, 26, 27, 28, 29 from app9 to app12

https://gerrit.wikimedia.org/r/882228

Mentioned in SAL (#countervandalism) [2023-01-23T16:50:30Z] <AntiComposite> Deploy 716e140 to app9 (T306066)

Mentioned in SAL (#countervandalism) [2023-01-23T16:53:08Z] <AntiComposite> Deploy 716e140 to app12 (T306066)

Mentioned in SAL (#countervandalism) [2023-01-24T08:54:29Z] <Krinkle> Suspend cvn-app8 and cvn-app9 (pgrep -af cvn is empty on both), T306066

Mentioned in SAL (#countervandalism) [2023-01-30T22:50:12Z] <Krinkle> Delete cvn-app8 and cvn-app9 instances, ref T306066

Krinkle claimed this task.