T238230 (Decommission EventLogging backend components by migrating to MEP) will not happen by the end of Q4, so we should upgrade the server running the backend eventlogging-processor (currently eventlog1002) to Buster. Many streams have already been migrated to EventGate, so we should be able to spin up a new Buster Ganeti VM, run eventlogging-processor there, and then decom eventlog1002.
Just to clarify - should eventlog1002 be upgraded to Buster and then decommissioned as part of this task, or is the decommissioning work part of another task? Should the new eventlog VM (eventlog1003 I guess :)) be kept in place for the foreseeable future, until we decide to decommission all eventlogging components?
@hnowlan in theory this could be the perfect scenario:
- We create eventlog1003 on Ganeti (sizing the VM appropriately) using Buster and Python 3.7 (shipped with it), and we run it in parallel with eventlog1002.
- After a little time running both, when we are confident that no corner cases need to be fixed with Python 3.7, we stop eventlogging on 1002.
- We decom eventlog1002 and return it to DCops (without any upgrade).
On paper, EventLogging is a stateless app now, since it works only on Kafka topics, so in theory we could even consider Kubernetes. In practice it is surely quicker to spin up a VM and reuse the current Puppet machinery, to avoid investing too much time in something that we hope to deprecate asap in favor of eventgate-analytics.
Lemme know your thoughts :)
Sounds good to me! I don't think there's much point in exploring Kubernetes as opposed to using a VM if our medium-term plan is to get rid of the system altogether.
Based on the graphs it looks like we might be okay with a VM with 4GB of memory, maybe 4 vCPUs to start with? We can bump the vCPU count if it looks overloaded. Doesn't seem like it'll need a lot of disk.
What is the current status of eventlog1003?
It's reported by a cumin check that ensures that all hosts matching the alias A:all are part of one of the datacenters, and eventlog1003 is not part of the alias for A:eqiad.
AFAICT it's in PuppetDB but not assigned to any role in site.pp. See also https://puppetboard.wikimedia.org/node/eventlog1003.eqiad.wmnet
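For anyone unfamiliar with what that check does, here is a rough sketch of the logic in plain Python. It does not use cumin itself; the function name, the alias dictionary and the host lists are all hypothetical and only illustrate the set comparison (every host in the "all" set should appear in at least one datacenter set).

```python
def hosts_missing_from_datacenters(all_hosts, datacenter_aliases):
    """Return the hosts in all_hosts that belong to no datacenter alias."""
    covered = set()
    for hosts in datacenter_aliases.values():
        covered |= set(hosts)
    return sorted(set(all_hosts) - covered)


# Hypothetical data: eventlog1003 is known overall (e.g. via PuppetDB)
# but is not matched by any datacenter alias yet.
all_hosts = {
    "eventlog1002.eqiad.wmnet",
    "eventlog1003.eqiad.wmnet",
    "example2001.codfw.wmnet",
}
datacenter_aliases = {
    "eqiad": {"eventlog1002.eqiad.wmnet"},
    "codfw": {"example2001.codfw.wmnet"},
}

missing = hosts_missing_from_datacenters(all_hosts, datacenter_aliases)
print(missing)  # ['eventlog1003.eqiad.wmnet']
```

If the datacenter aliases are built from role assignments, a host with no role in site.pp would show up in a list like this until a role is assigned; that is an assumption about how the aliases are defined, not something confirmed here.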