
Rebuild Wikispore Vagrant boxes on Bullseye or Bookworm
Closed, Declined, Public

Description

Debian Buster (10) is soon EOL so Wikispore servers will need to be replaced with ones based on Bullseye (11) or (preferably) Bookworm (12). See: https://wikitech.wikimedia.org/wiki/News/Buster_deprecation

Requires T319167: [EPIC] Upgrade MediaWiki-Vagrant to Debian Bullseye or T365935: [EPIC] Upgrade MediaWiki-Vagrant to Debian Bookworm.

Previous migration: T321968: Rebuild Wikispore Vagrant box on Buster

Event Timeline

Tgr renamed this task from "Update Wikispore to use Bookworm" to "Update Wikispore to use Bullseye or Bookworm". May 26 2024, 9:42 AM
Tgr updated the task description.
Tgr renamed this task from "Update Wikispore to use Bullseye or Bookworm" to "Rebuild Wikispore Vagrant box on Bullseye or Bookworm". May 26 2024, 9:57 AM
Tgr claimed this task.
Tgr triaged this task as High priority.
Tgr renamed this task from "Rebuild Wikispore Vagrant box on Bullseye or Bookworm" to "Rebuild Wikispore Vagrant boxes on Bullseye or Bookworm". May 26 2024, 9:59 AM
Tgr updated the task description.
Tgr moved this task from Backlog to Next-up on the Wikispore board.

This will be harder than I thought as there is no Vagrant base box for Bullseye + amd64 + LXC. We'll either have to build our own per https://github.com/fgrehm/vagrant-lxc/blob/master/BOXES.md (T322450: Build MediaWiki-Vagrant LXC Buster base image, except for Bullseye - there are some extremely old tutorials here and here) or finally migrate off Vagrant (T322991: Consider another orchestration system for Wikispore).

A possible approach to meet the Jul 15 deadline is to build new Bullseye Cloud VPS boxes and copy the existing Vagrant boxes over as they are. (Vagrant doesn't really care about the host OS; the problem is just that Buster-based Vagrant boxes cannot successfully provision anymore, so they are basically frozen in their current state.) I have never tried to port Vagrant boxes between Linux machines, but Vagrant has export/import functionality for that, so in theory it should be straightforward.

I'll try this over the weekend.

Thank you for paying attention to this, @Tgr. Do you still hope to work on this transfer?

I would still like to do this if possible but had a series of distractions. Sorry for the delay.

It's a new week and a new nag! @Tgr if you would like to prevent these VMs from being shut down please upgrade them soon. Thanks!

I'll try to get it done during the hackathon.

Packaging the vagrant box (via vagrant package) for export fails with

tgr@wikispore-test:/srv/mediawiki-vagrant$ vagrant package
==> default: Compressing container's rootfs...
Traceback (most recent call last):
...
/srv/vagrant-data/gems/2.5.5/gems/vagrant-lxc-1.4.3/lib/vagrant-lxc/driver.rb:53:in `rootfs_path': undefined method `[]' for nil:NilClass (NoMethodError)

This is apparently a known bug: https://github.com/fgrehm/vagrant-lxc/issues/475. As suggested there, replacing

config_entry = config_string.match(/^lxc\.rootfs\s+=\s+(.+)$/)[1]

with

config_entry = config_string.match(/^lxc\.rootfs\.path\s+=\s+(.+)$/)[1]

in line 53 of vagrant-lxc's lib/vagrant-lxc/driver.rb helps.
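The underlying cause appears to be an LXC config key rename. LXC 2.1 introduced lxc.rootfs.path as the new name for lxc.rootfs, and LXC 3.0 dropped the old name, so the original regex finds no match on newer hosts and [1] is called on nil. Below is a minimal Ruby sketch of a lookup that accepts either key. extract_rootfs is a hypothetical helper, not vagrant-lxc's actual code. Stripping a "dir:" backing-store prefix is an assumption about how newer configs may write the path.

```ruby
# Hypothetical helper, not vagrant-lxc's actual implementation.
# Matches the legacy key "lxc.rootfs" (LXC < 2.1) and its replacement
# "lxc.rootfs.path" (LXC 2.1+; legacy keys were removed in LXC 3.0).
# Dots are escaped so "." only matches a literal dot, and [ \t]* keeps
# the match on a single line.
ROOTFS_KEY = /^lxc\.rootfs(?:\.path)?[ \t]*=[ \t]*(.+)$/

def extract_rootfs(config_string)
  match = config_string.match(ROOTFS_KEY)
  # Fail with a clear message instead of calling [] on nil (the NoMethodError above).
  raise ArgumentError, "no lxc.rootfs or lxc.rootfs.path entry found" if match.nil?

  # Assumption: newer configs may prefix the path with a backing-store type such as "dir:".
  match[1].strip.sub(/\Adir:/, "")
end

legacy = "lxc.arch = amd64\nlxc.rootfs = /var/lib/lxc/box/rootfs\n"
modern = "lxc.rootfs.path = dir:/var/lib/lxc/box/rootfs\nlxc.uts.name = box\n"
puts extract_rootfs(legacy) # => /var/lib/lxc/box/rootfs
puts extract_rootfs(modern) # => /var/lib/lxc/box/rootfs
```

Raising ArgumentError when neither key is present turns the opaque "undefined method `[]' for nil" into an error that names the missing config key.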


Next, vagrant package fails with a permission error. As a temporary workaround, adding

mwvagrant ALL=(root) NOPASSWD: /usr/bin/env tar --numeric-owner -cvzf /*/rootfs.tar.gz -C /var/lib/lxc/* ./rootfs
mwvagrant ALL=(root) NOPASSWD: /usr/bin/env chown * /tmp/*/rootfs.tar.gz

to the sudoers file helps.
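The Ruby sketch below is a rough approximation of what these two rules permit. It makes two assumptions:

* sudoers wildcards in command arguments behave like fnmatch(3) without FNM_PATHNAME, so * can also match / and spaces, as the sudoers manual describes for arguments.
* The whole command line is matched as one string, whereas sudo matches the command and its arguments separately.

allowed? and the example temp directory name are hypothetical.

```ruby
# Rough model of the two sudoers rules above (an approximation, not sudo itself).
# Assumption: "*" in arguments may match any characters, including "/" and
# spaces, which is what File.fnmatch does without File::FNM_PATHNAME.
SUDOERS_RULES = [
  "/usr/bin/env tar --numeric-owner -cvzf /*/rootfs.tar.gz -C /var/lib/lxc/* ./rootfs",
  "/usr/bin/env chown * /tmp/*/rootfs.tar.gz"
].freeze

# Hypothetical helper: true if any rule's pattern matches the full command line.
def allowed?(command_line)
  SUDOERS_RULES.any? do |rule|
    # FNM_DOTMATCH lets "*" match names starting with "." (a Ruby fnmatch default).
    File.fnmatch(rule, command_line, File::FNM_DOTMATCH)
  end
end

# Hypothetical example commands; the temp directory name is made up.
tar_cmd = "/usr/bin/env tar --numeric-owner -cvzf /tmp/d20240503/rootfs.tar.gz " \
          "-C /var/lib/lxc/mediawiki-vagrant_default_1667108589669_4706 ./rootfs"
chown_cmd = "/usr/bin/env chown mwvagrant /tmp/d20240503/rootfs.tar.gz"

puts allowed?(tar_cmd)                                    # => true
puts allowed?(chown_cmd)                                  # => true
puts allowed?("/usr/bin/env chown mwvagrant /etc/shadow") # => false
```

Under this approximation the chown rule also accepts extra arguments, because * can swallow spaces. For example, "/usr/bin/env chown mwvagrant /etc/shadow /tmp/x/rootfs.tar.gz" is allowed. That breadth is worth keeping in mind for a temporary workaround like this one.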


Next, tar fails with "/var/lib/lxc/mediawiki-vagrant_default_1667108589669_4706: Cannot open". Apparently that directory is only readable by root; running chmod -R a+rX on it helps.
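For reference, a+rX does two things:

* It adds read permission for everyone.
* It adds execute/search permission, but only on directories and on files that already have an execute bit set for someone (that is what the capital X means).

Below is a minimal Ruby sketch of the same recursive change. add_read_x is a hypothetical name. The sketch skips symlinks rather than replicating every detail of GNU chmod, and like the original command it would need to run as root for anything under /var/lib/lxc.

```ruby
require "find"

# Hypothetical helper approximating `chmod -R a+rX root`:
# - "a+r": add read permission for user, group and others on every entry;
# - "a+X": add execute/search permission for everyone, but only on directories
#   and on files that already have at least one execute bit set.
# Symlinks are skipped (Linux cannot change a symlink's own mode), and
# Find.find does not descend into symlinked directories.
def add_read_x(root)
  Find.find(root) do |path|
    next if File.symlink?(path)

    mode = File.stat(path).mode & 0o7777
    new_mode = mode | 0o444
    new_mode |= 0o111 if File.directory?(path) || (mode & 0o111) != 0
    File.chmod(new_mode, path) unless new_mode == mode
  end
end
```

Something like add_read_x("/var/lib/lxc/mediawiki-vagrant_default_1667108589669_4706") would mirror the command above. Note that either version makes the entire container tree, including the guest's own system files, world-readable.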

After that, I have a packaged box. Vagrant doesn't have an unpackage command, but I think it will work if I install the box with vagrant box add, manually copy over all the files, and update the Vagrantfile to use the packaged box as its base box. I didn't have time to try that yet, though.
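Before the vagrant box add step, it may be worth sanity-checking the packaged box. Below is a minimal Ruby sketch. It assumes the output of vagrant package is a gzip-compressed tar archive with a metadata.json at its root, which is the general Vagrant box format. inspect_box is a hypothetical helper. The expectation that an LXC box reports "provider": "lxc" is an assumption based on vagrant-lxc's box documentation.

```ruby
require "json"
require "zlib"
require "rubygems/package"

# Hypothetical helper: list the entries of a gzip-compressed Vagrant .box
# archive and read the "provider" field from its metadata.json.
# Returns [entry_names, provider]; provider is nil if metadata.json is missing.
def inspect_box(path)
  names = []
  provider = nil
  Zlib::GzipReader.open(path) do |gz|
    Gem::Package::TarReader.new(gz) do |tar|
      tar.each do |entry|
        # Entries may be stored as "./name"; normalise that prefix away.
        name = entry.full_name.sub(%r{\A\./}, "")
        names << name
        # Only metadata.json is read; other entries (e.g. rootfs.tar.gz) are skipped.
        provider = JSON.parse(entry.read)["provider"] if name == "metadata.json"
      end
    end
  end
  [names, provider]
end
```

For example, names, provider = inspect_box("package.box") inspects vagrant package's default output file. If the listing looks right, the box could then be registered with something like vagrant box add --name <box-name> package.box, where the name is up to you.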

(In hindsight maybe it would have been cleaner to export the LXC container than to export the vagrant box? I'm not familiar with LXC though, and also not sure how to go from "vagrant checkout with no VM" to "vagrant checkout using the imported VM" - might require messing with Vagrant's own state management.)

Hello @Tgr, were you able to make any progress with this? I'm going on sabbatical soon and (for arcane backend reasons, T364457) need to clean up the host these are on before then.

@Tgr please work on this in the next few days. I would not like to leave this mess behind for my colleagues and we are multiple months past the deadline.

I have shut down those two VMs due to lack of response.

A deadline would have been nice :( As I said in an email, I can make a DB snapshot and shut down the VMs whenever they become a problem, but I'd have to know when that is.

A deadline would have been nice

I don't know what this means -- there's plenty of history in this ticket about the deadline (July 15) and me trying to reach you for updates over the last several weeks. The initial announcement about the July deadline was sent in March.

The VMs still exist, they're just turned off. You can turn them back on if you need to copy down data from them. They remain in the way of cloudvirt upgrades, as they have been since July.

It seems like the lack of Vagrant skills in the Wikispore technical community was part of the issue here.

I'm going to boot up wikispore-prod again now to get the data off, then shut it off again (but not delete it yet). I'll delete wikispore-test to free up enough quota to start a new instance, and get things set up there. As discussed on Telegram, it'll be without Vagrant this time, and using a Toolforge database.

I started a new repo for the new server's config: https://gitlab.wikimedia.org/toolforge-repos/wikispore-config

Still a work in progress. I've shut off the old server again.

The site is back online, but in read only mode and is still missing a few extensions etc.

(that would be T322991)

Ah, thanks! I think this task should be declined then (as we're not going to continue with Vagrant), and I'll carry on with the other one.

Thanks @Samwilson!

Since we wanted to migrate away from Vagrant eventually anyway, there is no point in trying to get it working on Bullseye then.

The initial announcement about the July deadline was sent in March.

Yeah, but that wasn't really a deadline. I think the preferable approach would have been to say something like "Buster VMs will be shut down on Sept 1", so that people can figure out what actions to take if they cannot upgrade by then.
Anyway, not a problem as long as the data is still recoverable, it just wasn't clear to me that it was.