Upgrade eventlogging VM to bullseye (or bookworm)
Closed, Resolved · Public

Assigned To

Authored By

	BTullis
	Oct 19 2023, 11:35 AM

Description

Reference ticket for the buster upgrade: T278137: Migrate eventlog1002 to buster

We currently run our legacy eventlogging on a single VM:

eventlog1003.eqiad.wmnet

It runs the following eventlogging-processor services:

btullis@eventlog1003:~$ pstree -aT eventlogging
python3 /srv/deployment/eventlogging/analytics/bin/eventlogging-processor @/etc/eventlogging.d/processors/client-side-07
python3 /srv/deployment/eventlogging/analytics/bin/eventlogging-processor @/etc/eventlogging.d/processors/client-side-01
python3 /srv/deployment/eventlogging/analytics/bin/eventlogging-processor @/etc/eventlogging.d/processors/client-side-09
python3 /srv/deployment/eventlogging/analytics/bin/eventlogging-processor @/etc/eventlogging.d/processors/client-side-05
python3 /srv/deployment/eventlogging/analytics/bin/eventlogging-processor @/etc/eventlogging.d/processors/client-side-04
python3 /srv/deployment/eventlogging/analytics/bin/eventlogging-processor @/etc/eventlogging.d/processors/client-side-11
python3 /srv/deployment/eventlogging/analytics/bin/eventlogging-processor @/etc/eventlogging.d/processors/client-side-10
python3 /srv/deployment/eventlogging/analytics/bin/eventlogging-processor @/etc/eventlogging.d/processors/client-side-08
python3 /srv/deployment/eventlogging/analytics/bin/eventlogging-processor @/etc/eventlogging.d/processors/client-side-02
python3 /srv/deployment/eventlogging/analytics/bin/eventlogging-processor @/etc/eventlogging.d/processors/client-side-03
python3 /srv/deployment/eventlogging/analytics/bin/eventlogging-processor @/etc/eventlogging.d/processors/client-side-00
python3 /srv/deployment/eventlogging/analytics/bin/eventlogging-processor @/etc/eventlogging.d/processors/client-side-06

However, the virtual machine is otherwise stateless.
All state is now stored in Kafka.

As per T278137, the recommended approach the last time we needed to upgrade was to create a parallel VM running the next OS.
We then ran the two systems in parallel until we were confident enough that we could turn off the older version.
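One way to build that confidence during a parallel run is to compare the set of processors on each host. The following is a minimal sketch, assuming the `pstree -aT eventlogging` output from each host has been saved as text; the helper names `processor_names` and `missing_processors` are hypothetical and not part of eventlogging.

```python
import re

# Matches the "@/etc/eventlogging.d/processors/<name>" argument shown in the
# process listing earlier in this description.
PROCESSOR_ARG = re.compile(r"@/etc/eventlogging\.d/processors/(\S+)")


def processor_names(listing):
    """Return the sorted, de-duplicated processor config names in a listing."""
    return sorted(set(PROCESSOR_ARG.findall(listing)))


def missing_processors(old_listing, new_listing):
    """Return processors seen on the old host but not on the new one."""
    old = set(processor_names(old_listing))
    new = set(processor_names(new_listing))
    return sorted(old - new)
```

An empty result from `missing_processors` only shows that the same processors were started; it says nothing about whether they are consuming and producing events correctly.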

We may have to do some work on the eventlogging code to make sure that it works in the system python.
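As a rough first check for this, the sketch below asks the interpreter it runs under which modules it can locate, without importing them. The names in the example list are assumptions: `eventlogging` is taken to be the package name, and `_mysql` comes from the title of the Puppet patch listed under Details. Adjust both to whatever the code really imports.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that the current interpreter cannot locate."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for some dotted names or half-initialised modules;
            # treat these the same as "not found".
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical module list; see the assumptions stated above.
    print(missing_modules(["eventlogging", "_mysql"]))
```

Running this with the system python3 on the new OS would list the modules that need packaging or path changes before the processors can start.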

There is perhaps an argument here for skipping bullseye and moving straight to bookworm.

Tagging Event-Platform and Data-Engineering for visibility, and in case they are needed to help update the code, but I believe that Data-Platform-SRE will provision the new VM and migrate the service once it has been tested.

Details

	Subject	Repo	Branch	Lines +/-
	eventlogging: tweak PYTHONPATH to allow eventlogging to import _mysql.so	operations/puppet	production	+6 -6


Related Objects

Status	Assigned	Task
Open	None	T291916 Tracking task for Bullseye migrations in production
Open	None	T288804 Upgrade the Data Engineering infrastructure to Debian Bullseye
Resolved	brouberol	T349289 Upgrade eventlogging VM to bullseye (or bookworm)

Event Timeline

BTullis created this task. Oct 19 2023, 11:35 AM

Restricted Application added a subscriber: Aklapper. Oct 19 2023, 11:35 AM

Pretty sure old eventlogging is python 2

edit: Oh! nevermind! I guess not, I see python3 in your ps output now

BTullis added a parent task: T288804: Upgrade the Data Engineering infrastructure to Debian Bullseye. Oct 23 2023, 8:40 AM

Gehel moved this task from Incoming to Ready for Work on the Data-Platform-SRE board. Nov 3 2023, 10:34 AM

Gehel triaged this task as High priority. Nov 15 2023, 9:44 AM

Gehel moved this task from Ready for Work to OS Upgrade on the Data-Platform-SRE board. Dec 6 2023, 1:11 PM

I believe that we're on the verge of finishing the migration of all legacy eventlogging components.
See T259163: Migrate legacy metawiki schemas to Event Platform and T238230: Decommission EventLogging backend components by migrating to MEP for further details on that effort.

Therefore, I think it likely that we will be able to decommission the eventlog1003 VM instead of upgrading it to bullseye/bookworm.

Decommissioning probably won't get done until after I'm back from leave in late April. Can we wait that long?

In T349289#9441870, @Ottomata wrote:

Decommissioning probably won't get done until after I'm back from leave in late April. Can we wait that long?

OK, that's quite a long time then. Maybe we will upgrade it in the meantime.

lbowmaker moved this task from Incoming (new tickets) to Radar (External Teams) on the Data-Engineering board. Feb 8 2024, 7:06 PM

brouberol claimed this task. Feb 14 2024, 1:22 PM

brouberol edited projects, added Data-Platform-SRE (2024.02.12 - 2024.03.03); removed Data-Platform-SRE.

Cookbook cookbooks.sre.hosts.reimage was started by brouberol@cumin1002 for host eventlog1003.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by brouberol@cumin1002 for host eventlog1003.eqiad.wmnet with OS bullseye completed:

eventlog1003 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Set boot media to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402141339_brouberol_4116050_eventlog1003.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB

Change 1003438 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] eventlogging: tweak PYTHONPATH to allow eventlogging to import _mysql.so

https://gerrit.wikimedia.org/r/1003438

gerritbot added a project: Patch-For-Review. Feb 14 2024, 2:29 PM

Change 1003438 merged by Brouberol:

[operations/puppet@production] eventlogging: tweak PYTHONPATH to allow eventlogging to import _mysql.so

https://gerrit.wikimedia.org/r/1003438
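The patch itself is not reproduced in this task, so the following is only an illustration of the general mechanism its title refers to: whether Python can find a module such as `_mysql.so` depends on which directories are on its search path, which `PYTHONPATH` extends. The helper name `locate` is hypothetical, and the sketch uses the standard-library path finder with an explicit directory list rather than the real eventlogging environment.

```python
import importlib.machinery


def locate(module_name, search_dirs):
    """Return the file Python would load for module_name, or None.

    Only the directories in search_dirs are consulted, which mimics the
    effect of adding or removing entries from PYTHONPATH.
    """
    spec = importlib.machinery.PathFinder.find_spec(
        module_name, path=list(search_dirs)
    )
    return spec.origin if spec is not None else None
```

Calling `locate` with and without a candidate directory shows whether adding that directory to `PYTHONPATH` would make the module importable.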

brouberol closed this task as Resolved. Feb 14 2024, 2:42 PM

Maintenance_bot removed a project: Patch-For-Review. Feb 14 2024, 3:32 PM

Gehel moved this task from Backlog to Done on the Data-Platform-SRE (2024.02.12 - 2024.03.03) board. Feb 15 2024, 2:05 PM
