
Create and deploy a re-reimplementation of irc.wikimedia.org in Python 3 without external service deps
Closed, ResolvedPublic

Description

irc.wikimedia.org is a very old IRC service for broadcasting recent changes events from Wikimedia wikis. Bots connect to IRC channels specific to a wiki (e.g. #en.wikipedia for English Wikipedia), and every edit posts a notification event there, to which bots react. The edit events are sent by MediaWiki via UDP messages, as configured via the $wgRCFeeds configuration directive. The IRC service itself is implemented via a patched version of ircd-ratbox and a supplemental service (ircecho) written in Python using python-irc.
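As a rough illustration of the data path described above, the sketch below receives UDP datagrams and splits each one into a channel and a message. It assumes that each datagram carries the channel name, a tab, and then the message text; that wire format is an assumption made for this example and is not stated in this task. The port number and the function names are likewise made up.

```python
import socket


def parse_datagram(data: bytes) -> tuple[str, str]:
    """Split one datagram into (channel, message).

    Assumption: the payload looks like b"#en.wikipedia\t<text>\n".
    """
    text = data.decode("utf-8", errors="replace").rstrip("\r\n")
    channel, _, message = text.partition("\t")
    return channel, message


def listen(host: str = "127.0.0.1", port: int = 9390) -> None:
    """Print every change notification received (hypothetical port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, _addr = sock.recvfrom(65535)
            channel, message = parse_datagram(data)
            print(f"{channel}: {message}")
```

A real service would relay each message to the IRC clients joined to that channel instead of printing it.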

Most bots these days use the vastly superior EventStreams service, but there are still about two dozen bots, often with unknown owners, that use irc.wikimedia.org.

irc.wikimedia.org has no owner within Wikimedia. SRE Infrastructure Foundations has kept it alive over the years to keep up with OS updates, but this has become increasingly difficult. The ircecho service is still written in Python 2, and moving it to Python 3 involves a long tail of changes, since the Python 3 version of python-irc made extensive changes to cope with the string/bytes split.

We intend to replace the combination of ratbox and ircecho with code previously written by Faidon, which implements irc.wikimedia.org as a standalone service.

This new code can run on parallel VMs, and eventually we can fail over the irc.wikimedia.org CNAME from the old VMs to the service provided by the new implementation.

When the new infrastructure is in place, a possible next enhancement is to move away from the UDP broadcast events and also update the code to read the event notifications from Kafka.

Details

Related Changes in Gerrit:
Repo                           Branch       Lines +/-
operations/deployment-charts   master       +0 -8
operations/mediawiki-config    master       +0 -2
operations/puppet              production   +1 -1 K
operations/alerts              master       +27 -27
operations/puppet              production   +1 -1
operations/puppet              production   +3 -5
operations/puppet              production   +11 -54
operations/dns                 master       +0 -1
operations/dns                 master       +0 -1
operations/dns                 master       +1 -1
operations/dns                 master       +1 -1
operations/puppet              production   +7 -0
operations/puppet              production   +1 -1
operations/puppet              production   +2 -1
operations/puppet              production   +12 -0
operations/dns                 master       +1 -0
operations/puppet              production   +8 -1
operations/puppet              production   +62 -38
operations/puppet              production   +4 -0
operations/puppet              production   +22 -9
operations/puppet              production   +11 -1
operations/puppet              production   +1 -1
operations/puppet              production   +1 -0
operations/puppet              production   +22 -1
operations/mediawiki-config    master       +1 -0
operations/deployment-charts   master       +4 -0
operations/puppet              production   +18 -1
operations/puppet              production   +48 -1
operations/mediawiki-config    master       +1 -2
operations/dns                 master       +1 -0
operations/deployment-charts   master       +4 -0
operations/puppet              production   +1 -0
operations/puppet              production   +14 -0
operations/puppet              production   +22 -1

Event Timeline

There are a very large number of changes, so older changes are hidden.

Mentioned in SAL (#wikimedia-operations) [2024-10-02T13:16:10Z] <moritzm> upload ircstream 0.13.0~dev+wmf1 to apt.wikimedia.org bookworm/ircstream-sse component (separate build using the experimental eventstream feature branch of ircstream) T376014

Change #1077367 merged by Muehlenhoff:

[operations/puppet@production] When enabling eventstreams install ircstream from component

https://gerrit.wikimedia.org/r/1077367

Change #1077385 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add a separate role for sse-enabled ircstream and a Hiera option

https://gerrit.wikimedia.org/r/1077385

Change #1077386 had a related patch set uploaded (by Slyngshede; author: Slyngshede):

[operations/puppet@production] P:ircstream allow config to switch between UDP and SSE.

https://gerrit.wikimedia.org/r/1077386

Change #1077385 merged by Muehlenhoff:

[operations/puppet@production] Add a separate role for sse-enabled ircstream and a Hiera option

https://gerrit.wikimedia.org/r/1077385

Change #1077395 had a related patch set uploaded (by Slyngshede; author: Slyngshede):

[operations/puppet@production] P:ircstream Allow enabling eventstream as a datasource.

https://gerrit.wikimedia.org/r/1077395

Change #1077397 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Add basic config for irc[12]004

https://gerrit.wikimedia.org/r/1077397

Change #1077397 merged by Elukey:

[operations/puppet@production] Add basic config for irc[12]004

https://gerrit.wikimedia.org/r/1077397

Third day :)

  • irc2003 is now configured in MediaWiki and it is getting UDP traffic.
  • The new patched version of ircstream was deployed to all nodes and it seems to work correctly.
  • Upstream/Faidon created a branch for preliminary EventStreams support that matches what Simon worked on, so we are in line with the next steps.
  • Moritz packaged the dependencies needed to make the new branch work on Bookworm, and uploaded them to our APT repo.
  • irc1004 was created to host the ircstream code running with EventStreams support.
  • We reached out to Timo to ask if the CVNBot could be migrated to ircstream.wikimedia.org to test the new UDP-based version, before we switch all other bots.
  • Prometheus support was enabled and a dashboard was created: https://grafana.wikimedia.org/d/eb101795-c69e-4b9c-b848-f042d604f234/ircstream?orgId=1
  • Various improvements to the puppet code were made (templates, etc.).

Next steps:

  • Hopefully Timo will help us and the CVNBot will migrate to the new endpoint.
  • We'll test ircstream with ES support on irc1004
  • Plan for the production rollout of ircstream with UDP support.

Change #1077395 merged by Slyngshede:

[operations/puppet@production] P:ircstream Allow enabling eventstream as a datasource.

https://gerrit.wikimedia.org/r/1077395

Change #1077657 had a related patch set uploaded (by Slyngshede; author: Slyngshede):

[operations/puppet@production] R:ircstream_sse Enable eventstream source for irc1004.

https://gerrit.wikimedia.org/r/1077657

Change #1077657 merged by Slyngshede:

[operations/puppet@production] R:ircstream_sse Enable eventstream source for irc1004.

https://gerrit.wikimedia.org/r/1077657

Change #1077724 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/dns@master] Add ircstream-sse.wikimedia.org

https://gerrit.wikimedia.org/r/1077724

Change #1077724 merged by Elukey:

[operations/dns@master] Add ircstream-sse.wikimedia.org

https://gerrit.wikimedia.org/r/1077724

Fourth day's summary:

  • Created irc2004.codfw.wmnet so we have a failover for SSE as well.
  • Created ircstream-sse.wikimedia.org pointing to irc1004, which is serving data from EventStreams.
  • Tested a failover of ircstream.wikimedia.org to irc2003, and then a failback to irc1003. Multiple clients re-connected as expected.
  • Also tested with Python code that simulates a bot.
  • Contacted the owner of https://eyeinthesky.im, they are going to test ircstream.wikimedia.org during the next couple of days and report back.
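The Python code used for that bot simulation is not included in this task. A minimal sketch of what such a test client could look like is shown below, using only the standard library. The function names, the nick, and the choice to join after the 001 welcome numeric are illustrative assumptions, not the actual test code.

```python
import socket


def registration_lines(nick: str) -> list[bytes]:
    """IRC commands a client sends to register with the server."""
    return [
        f"NICK {nick}\r\n".encode(),
        f"USER {nick} 0 * :{nick}\r\n".encode(),
    ]


def parse_privmsg(line: str):
    """Return (channel, text) for a PRIVMSG line, otherwise None."""
    parts = line.split(" ", 3)
    if len(parts) == 4 and parts[1] == "PRIVMSG" and parts[3].startswith(":"):
        return parts[2], parts[3][1:]
    return None


def run_bot(host: str, nick: str, channel: str, port: int = 6667) -> None:
    """Connect, join one channel, and print every message seen there."""
    with socket.create_connection((host, port)) as sock:
        for command in registration_lines(nick):
            sock.sendall(command)
        buffer = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break  # server closed the connection
            buffer += chunk
            *lines, buffer = buffer.split(b"\r\n")
            for raw in lines:
                line = raw.decode("utf-8", errors="replace")
                fields = line.split(" ")
                if line.startswith("PING"):
                    # Answer keepalives so the server does not drop us.
                    sock.sendall(line.replace("PING", "PONG", 1).encode() + b"\r\n")
                elif len(fields) > 1 and fields[1] == "001":
                    # Registration accepted: now join the channel.
                    sock.sendall(f"JOIN {channel}\r\n".encode())
                elif (message := parse_privmsg(line)) is not None:
                    print(message)
```

Running something like `run_bot("ircstream.wikimedia.org", "testbot123", "#en.wikipedia")` before and after a failover would show whether a client that reconnects keeps receiving events.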

Change #1077909 had a related patch set uploaded (by Slyngshede; author: Slyngshede):

[operations/puppet@production] C:ircstream add blackbox monitoring.

https://gerrit.wikimedia.org/r/1077909

Change #1077909 merged by Slyngshede:

[operations/puppet@production] C:ircstream add blackbox monitoring.

https://gerrit.wikimedia.org/r/1077909

Mentioned in SAL (#wikimedia-operations) [2024-10-04T09:35:11Z] <moritzm> upload ircstream 0.13.0+wmf12u1 to apt.wikimedia.org T376014

Mentioned in SAL (#wikimedia-operations) [2024-10-04T10:07:30Z] <moritzm> upload ircstream 0.13.0+sse12u1 to apt.wikimedia.org bookworm/ircstream-sse component (separate build using the experimental eventstream feature branch of ircstream) T376014

Change #1077932 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] ircstream: No longer install python3-aiohttp-sse-client

https://gerrit.wikimedia.org/r/1077932

Change #1077932 merged by Muehlenhoff:

[operations/puppet@production] ircstream: No longer install python3-aiohttp-sse-client

https://gerrit.wikimedia.org/r/1077932

Change #1077941 had a related patch set uploaded (by Slyngshede; author: Slyngshede):

[operations/puppet@production] C:ircstream move SSE hosts to internal endpoint.

https://gerrit.wikimedia.org/r/1077941

Change #1077941 merged by Slyngshede:

[operations/puppet@production] C:ircstream move SSE hosts to internal endpoint.

https://gerrit.wikimedia.org/r/1077941

Last day of hackathon (5th day):

  • irc[12]004 have been updated with a new version of ircstream-sse that upstream released, together with updated deps etc.
  • The two SSE VMs are now using the eventstreams-internal endpoint, that should give us more stability (the external one goes through the CDN and forces clients to disconnect every 15 mins).
  • Created a list of known bots that would have a wide impact if down and that may have complex codebases to migrate to EventStreams: https://wikitech.wikimedia.org/wiki/Ircstream#Bots_still_using_the_legacy_setup. It would be nice to follow up with the bot owners, as WMF, to ease the process of their migration; otherwise we'll do another hackathon 5 years from now for the same reason :D
  • Added basic monitoring for ircstream via blackbox-tcp (not paging, but alerting if the service is down).
  • The two bot owners that we contacted didn't come back with testing results, so we didn't send any email announcing the migration to the new stack yet. We are targeting next Thursday, ideally, but we can do it anytime (it is just a DNS change).
  • Provided upstream with one-hour long logs related to the UDP stream and the recentchanges/pagechange/revision-create streams (from ES internal). The traces (all public data) will be used by upstream to extensively test ircstream (on top of the testing that we already did).

Overall results achieved:

  • We now have two brand new ircstream stacks: ircstream.wikimedia.org and ircstream-sse.wikimedia.org. The former still relies on UDP messages from MediaWiki, but it is based on a far more modern and maintainable stack. The latter is still more experimental, but ideally we'll want to use it in the future.
  • Reached out to some Bot owners to get their feedback, plus tested extensively on our side.
  • With one DNS change we'll be able to move the current stack to ircstream.wikimedia.org next week. We are basically ready to go, everything done except announcement and actual DNS change. This was the main goal and it has been achieved :)

Future steps: keep improving EventStreams support in ircstream, and hopefully move bots away from IRC as much as possible (in favor of contacting the ES service directly).

Change #1077386 abandoned by Slyngshede:

[operations/puppet@production] P:ircstream allow config to switch between UDP and SSE.

Reason:

Template switched to EPP.

https://gerrit.wikimedia.org/r/1077386

Change #1078665 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/dns@master] Point irc.w.o to irc1003

https://gerrit.wikimedia.org/r/1078665

Change #1078665 merged by Muehlenhoff:

[operations/dns@master] Point irc.w.o to irc1003

https://gerrit.wikimedia.org/r/1078665

Mentioned in SAL (#wikimedia-operations) [2024-10-10T08:02:39Z] <moritzm> irc.wikimedia.org now directs to the ircstream implementation on irc1003.wikimedia.org T376014

We tried to move irc.wikimedia.org to irc1003, but we noticed some issues in messages relayed to bots, so we rolled back.

Mentioned in SAL (#wikimedia-operations) [2024-10-11T08:00:04Z] <moritzm> upload ircstream 0.13.0+wmf12u2 to apt.wikimedia.org (sync to latest git and the async_broadcast feature branch) T376014

elukey triaged this task as Medium priority.Oct 14 2024, 2:46 PM

I've been following this work from the trenches. Really great stuff!

Future steps: keep improving EventStreams support in ircstream, and hopefully move bots away from IRC as much as possible (in favor of contacting the ES service directly).

Modulo fixing the root causes of the rollback: are {T240182: Create EventStream's equivalent to irc.wikimedia.org's #central channel} and {T234234: Port architecture of irc-recentchanges to Kafka} still relevant, or do we need some grooming / reboot of those tasks?

Change #1082129 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/dns@master] Point irc.w.o to irc1003, take two

https://gerrit.wikimedia.org/r/1082129

Change #1082129 merged by Muehlenhoff:

[operations/dns@master] Point irc.w.o to irc1003, take two

https://gerrit.wikimedia.org/r/1082129

Change #1082154 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/dns@master] Drop the ircstream CNAME

https://gerrit.wikimedia.org/r/1082154

Mentioned in SAL (#wikimedia-operations) [2024-10-22T08:24:41Z] <moritzm> irc.wikimedia.org has been switched to ircstream T376014

Future steps: keep improving EventStreams support in ircstream, and hopefully move bots away from IRC as much as possible (in favor of contacting the ES service directly).

Modulo fixing the root causes of the rollback: are {T240182: Create EventStream's equivalent to irc.wikimedia.org's #central channel} and {T234234: Port architecture of irc-recentchanges to Kafka} still relevant, or do we need some grooming / reboot of those tasks?

Upstream (Faidon :D) fixed the root causes of the rollback, we are now serving irc.wikimedia.org via ircstream :)
Re: T240182 and T234234, I just closed them as Declined. I think that the major concerns that we had years ago have been addressed (namely, critical bots no longer need to use #central) and we have only 3 bots connected (with no contact information and with names that seem to suggest some test left running). I would personally just announce the unavailability of #central if/when we'll decide to move to Eventstreams.

Change #1082154 merged by Muehlenhoff:

[operations/dns@master] Drop the ircstream CNAME

https://gerrit.wikimedia.org/r/1082154

if/when we'll decide to move to Eventstreams

Are you sure you want to move to EventStreams? Would consuming directly from Kafka not be better?

if/when we'll decide to move to Eventstreams

Are you sure you want to move to EventStreams? Would consuming directly from Kafka not be better?

@Ottomata Hi! Good timing since we have some questions/doubts :)

Faidon used EventStreams since it was available from "outside", but its support in ircstream is still experimental and can be changed. Nobody outside the WMF is likely to run ircstream to offer the same service; ideally people should use EventStreams directly. It would be nice to "drink-our-own-champagne" / "eat-our-own-dogfood" and make sure that EventStreams, which we ask everybody to move to, is indeed supported and free of issues.

After a chat with Faidon I had collected the following doubts about Eventstreams:

  1. Every 5 minutes the connection client/event-streams is dropped by the CDN, and reconnecting using the last event id's timestamp may be lossy (as opposed to using an offset).
  2. The event's data payload carries meta->offset, that Faidon is using to have a more precise resumption logic (see here and here). From https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams_HTTP_Service#Historical_Consumption it seems the right way to go, but (see next point).
  3. What happens when a MW switchover is performed? The client should be consuming data from the previously-active topic until the 5 mins timeout happens; then, if it is not using a timestamp-based event id, it will not be able to use an offset, right? (Because of the change in topics and probably in the consumer group used by eventstreams, etc.) Is a timestamp-based id the only solution? If so, do we have to specify that events may be lost while reconnecting?
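To make the offset-based resumption in points 2 and 3 more concrete, here is a small sketch of a client-side tracker that remembers the last offset seen per topic and partition and turns it into a value for the SSE Last-Event-ID header. The `meta` keys (`topic`, `partition`, `offset`) and the exact JSON shape of the header are assumptions for illustration; this is not ircstream's implementation, and whether the service expects the last consumed offset or the next one is not confirmed here.

```python
import json


class ResumePosition:
    """Remember the last offset seen per (topic, partition)."""

    def __init__(self) -> None:
        self._offsets: dict[tuple[str, int], int] = {}

    def record(self, event: dict) -> None:
        """Update the position from one decoded event payload."""
        meta = event["meta"]  # assumed keys: topic, partition, offset
        self._offsets[(meta["topic"], meta["partition"])] = meta["offset"]

    def last_event_id(self) -> str:
        """JSON value to send as Last-Event-ID when reconnecting."""
        entries = [
            {"topic": topic, "partition": partition, "offset": offset}
            for (topic, partition), offset in sorted(self._offsets.items())
        ]
        return json.dumps(entries)
```

Because positions are kept per topic, a client that has seen events from both the eqiad and codfw topics would carry an entry for each of them across a reconnect.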

And now Kafka :)

At the moment we have two ircstream VMs, irc1003 and irc2003. Only one is active at a time, but MW streams UDP traffic to both of them. If we move to Kafka, we could have both VMs pulling data (say recentchanges or page change) from both the eqiad and codfw topics, merging those into one stream (since only one is active/producing data at any given time), and in theory point 3) should be better handled. But how would we handle consumer groups?

  1. Single consumer group for all bots? Simple and probably similar to what we have now with UDP.
  2. Specific consumer group for each bot? Probably more complicated, and then each bot would need to be able to signal if the event consumption should start from latest or from the last offset used (breaking the API, probably we don't want it).

reconnecting using the last event id's timestamp may be lossy

I think it shouldn't be lossy, but it will result in re-consuming messages that have already been consumed.

Everything else you wrote about EventStreams is correct. :)

how would we handle consumer groups?

What happens with the old IRC server behavior during a DC switch? I'd assume that there is no guarantee of seeing a duplicate, or not missing messages? Whenever the client is connected to the switched server, they will just see messages as they come in?

If you just want to support the existing behavior, you might not even need consumer groups. Just always start consuming from latest? If people want more guarantees they can use EventStreams :)

Or, I guess you could have a consumer group for each IRC server? On restarts the server just starts from its latest consumed offset. Oh, this is I think what you mean by "single consumer group for all bots".

Specific consumer group for each bot?

This sounds too fancy for IRC, and would mean that you'd have a Kafka consumer connection for each bot nick, meaning you'd consume the same topic data repeatedly for each bot.
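The two options discussed here (no consumer group and always start from latest, or one consumer group per IRC server that resumes from its committed offset) could be expressed roughly as follows. This is a sketch assuming the kafka-python client; the group naming scheme and the function names are invented for the example, and nothing like this exists in ircstream as far as this task shows.

```python
def consumer_settings(server_name: str, resume: bool = True) -> dict:
    """Build consumer options for one IRC server.

    resume=True  -> one consumer group per server; after a restart the
                    server continues from its last committed offset.
    resume=False -> no group at all; always start from the latest event,
                    which mirrors the current UDP behaviour.
    """
    return {
        "group_id": f"ircstream-{server_name}" if resume else None,
        "auto_offset_reset": "latest",
        "enable_auto_commit": resume,
    }


def make_consumer(topics, bootstrap_servers, server_name, resume=True):
    """Create one consumer subscribed to all given topics."""
    # Imported lazily so the settings helper works without the library.
    from kafka import KafkaConsumer  # assumes kafka-python is installed

    return KafkaConsumer(
        *topics,
        bootstrap_servers=bootstrap_servers,
        **consumer_settings(server_name, resume),
    )
```

Either way there is a single Kafka connection per IRC server, and the server fans the events out to all connected bots.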

reconnecting using the last event id's timestamp may be lossy

I think it shouldn't be lossy, but it will result in re-consuming messages that have already been consumed.

Everything else you wrote about EventStreams is correct. :)

Okok got it thanks! I thought that the last event timestamp wasn't super precise, and hence resuming from it may have missed some events. For example, say the last-event-timestamp value is slightly after the last-event-consumed, and when the client resumes from last-event-timestamp whatever came between the last-consumed-event and the first event recovered by Eventstreams with that timestamp is lost. Could it happen?

My doubt comes from: https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams_HTTP_Service#Historical_Consumption

When given a timestamp, EventStreams will ask Kafka for the message offset in the stream(s) that most closely match the timestamp.

how would we handle consumer groups?

What happens with the old IRC server behavior during a DC switch? I'd assume that there is no guarantee of seeing a duplicate, or not missing messages? Whenever the client is connected to the switched server, they will just see messages as they come in?

The irc.wikimedia.org domain is a CNAME to irc1003.wikimedia.org, with no CDN / lvs / etc. in the middle, and it receives traffic from both eqiad and codfw via UDP. We have the possibility of doing a failover to irc2003 if needed (during maintenance, for example); whatever UDP message arrives between the client disconnecting and reconnecting gets lost, yes.

If you just want to support the existing behavior, you might not even need consumer groups. Just always start consuming from latest? If people want more guarantees they can use EventStreams :)

Yes sure but if we migrate to Kafka/ES we'd like to have something more resilient than UDP :D

Or, I guess you could have a consumer group for each IRC server? On restarts the server just starts from its latest consumed offset. Oh, this is I think what you mean by "single consumer group for all bots".

Exactly yes, I didn't explain myself clearly. But at this point, if I need to subscribe to two topics (eqiad/codfw) for each irc server, it is more convenient to just use eventstreams internal, no?

when the client resumes from last-event-timestamp whatever came between the last-consumed-event and the first event recovered by Eventstreams with that timestamp is lost. Could it happen?

Hm, if the provided last-event-timestamp is after the last event you consumed, yes it could happen. But why would that happen? The SSE Last-Event-ID for each event should have each topic, partition, and timestamp. The timestamp should be the event time of the last event consumed in each topic-partition.

https://gitlab.wikimedia.org/repos/data-engineering/eventstreams#historical-consumption--offsets

Also, I was trying to find some nice Kafka docs to explain how it uses timestamps to find offsets, but maybe the offsetsForTimes API docs explain it best?

Look up the offsets for the given partitions by timestamp. The returned offset for each partition is the earliest offset whose timestamp is greater than or equal to the given timestamp in the corresponding partition.

So, given a timestamp, you will get the earliest offset kafka has for that timestamp.
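The lookup rule quoted above can be modelled in a few lines. The sketch below is a toy model of one partition, assuming message timestamps are non-decreasing with the offset; it is not Kafka code, and the function name is made up.

```python
from bisect import bisect_left


def offset_for_time(timestamps: list[int], target: int):
    """Return the earliest offset whose timestamp is >= target.

    timestamps[i] is the timestamp of the message at offset i and the
    list is assumed to be sorted. Returns None when no message is that new.
    """
    index = bisect_left(timestamps, target)
    return index if index < len(timestamps) else None


# One partition with four messages at offsets 0..3.
partition = [100, 100, 105, 110]

# The client last consumed offset 1 (timestamp 100) and resumes from 100:
# it restarts at offset 0, so offsets 0 and 1 are replayed and none is lost.
replay_start = offset_for_time(partition, 100)

# If the client instead sent a timestamp later than its last consumed
# event (say 106), the lookup lands on offset 3 and offset 2 is skipped.
lossy_start = offset_for_time(partition, 106)
```

This matches both statements in the thread: resuming from the timestamp of the last consumed event duplicates events, while resuming from a later timestamp can lose them.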

Exactly yes, I didn't explain myself clearly. But at this point, if I need to subscribe to two topics (eqiad/codfw) for each irc server, it is more convenient to just use eventstreams internal, no?

Hm, I don't think so? You can subscribe to multiple topics with one kafka consumer. What EventStreams is getting you is automated mapping from stream name to composite topics. But that just comes from EventStreamConfig. To know the topics to read for e.g. mediawiki.recentchange:

curl -s 'https://meta.wikimedia.org/w/api.php?action=streamconfigs&streams=mediawiki.recentchange'  | jq '.streams."mediawiki.recentchange".topics'
[
  "eqiad.mediawiki.recentchange",
  "codfw.mediawiki.recentchange"
]
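For a service written in Python, the same lookup could be done without curl and jq. The following is a sketch using only the standard library; the explicit `format=json` parameter, the User-Agent string, and the function names are assumptions added for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://meta.wikimedia.org/w/api.php"


def topics_from_response(body: dict, stream: str) -> list[str]:
    """Extract the Kafka topic names for one stream from the API reply."""
    return body["streams"][stream]["topics"]


def fetch_topics(stream: str) -> list[str]:
    """Ask the streamconfigs API which topics make up a stream."""
    query = urlencode(
        {"action": "streamconfigs", "streams": stream, "format": "json"}
    )
    request = Request(
        f"{API_URL}?{query}",
        headers={"User-Agent": "ircstream-topic-lookup-example/0.1"},
    )
    with urlopen(request) as response:
        return topics_from_response(json.load(response), stream)
```

Calling `fetch_topics("mediawiki.recentchange")` should return the same two topic names as the curl command above, which could then be passed to a single Kafka consumer.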

Mentioned in SAL (#wikimedia-operations) [2024-10-28T12:12:00Z] <moritzm> upgrade irc.wikimedia.org to ircstream 0.13.0+wmf12u3 T376014

Additional update: In the Grafana dashboard we saw a recurring pattern of client disconnects, which correlated with the CVNBot. Faidon tracked down the underlying issue to a bug in the C# library used by that bot and committed a fix (https://github.com/paravoid/ircstream/commit/719e628a31b757a72b12858f71bcda3f23776f41). Today I've updated our Debian package with that fix and the recurring client disconnects have now stopped.

Change #1083971 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/dns@master] Remove ircstream-ssh CNAME

https://gerrit.wikimedia.org/r/1083971

Change #1083971 merged by Muehlenhoff:

[operations/dns@master] Remove ircstream-ssh CNAME

https://gerrit.wikimedia.org/r/1083971

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: irc1004.wikimedia.org

  • irc1004.wikimedia.org (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster eqiad to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster eqiad to Netbox

Change #1084026 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Remove support for using ircstream with eventstream

https://gerrit.wikimedia.org/r/1084026

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: irc2004.wikimedia.org

  • irc2004.wikimedia.org (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster codfw to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster codfw to Netbox

Mentioned in SAL (#wikimedia-operations) [2024-10-29T08:51:08Z] <moritzm> uploaded ircstream 1.0+wmf12u1 to apt.wikimedia.org T376014

Mentioned in SAL (#wikimedia-operations) [2024-10-29T08:55:17Z] <moritzm> upgrade irc.wikimedia.org to ircstream 1.0+wmf12u1 T376014

Change #1084026 merged by Muehlenhoff:

[operations/puppet@production] Remove support for using ircstream with eventstream

https://gerrit.wikimedia.org/r/1084026

Change #1084048 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] profile::prometheus::ops: remove event-related config for ircstream

https://gerrit.wikimedia.org/r/1084048

Change #1084048 merged by Elukey:

[operations/puppet@production] profile::prometheus::ops: remove event-related config for ircstream

https://gerrit.wikimedia.org/r/1084048

MoritzMuehlenhoff claimed this task.

irc.wikimedia.org is powered by ircstream 1.0 with no known bugs, marking this as resolved. The old VMs will be removed in two weeks.

Change #1088482 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Remove mw_rc_irc role from irc1002/2002 for decom of the legacy service

https://gerrit.wikimedia.org/r/1088482

Change #1088482 merged by Muehlenhoff:

[operations/puppet@production] Remove mw_rc_irc role from irc1002/2002 for decom of the legacy service

https://gerrit.wikimedia.org/r/1088482

Please note I have filed the task T378406 - summary: channels keep closing a lot more often, and when they do so, they are not always restarting with all the others.

Change #1089652 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Remove Puppet code for legacy udpmixecho/ircecho setup

https://gerrit.wikimedia.org/r/1089652

Change #1089714 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/alerts@master] team-sre: move irc-echo alerts to ircstream

https://gerrit.wikimedia.org/r/1089714

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: irc1002.wikimedia.org

  • irc1002.wikimedia.org (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster eqiad to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster eqiad to Netbox

Change #1089751 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/deployment-charts@master] deployment-charts: Remove irc1002/irc2002

https://gerrit.wikimedia.org/r/1089751

Change #1089752 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/mediawiki-config@master] Remove irc1002/irc2002 from wmf-config

https://gerrit.wikimedia.org/r/1089752

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: irc2002.wikimedia.org

  • irc2002.wikimedia.org (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster codfw to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster codfw to Netbox

Change #1089714 merged by Elukey:

[operations/alerts@master] team-sre: move irc-echo alerts to ircstream

https://gerrit.wikimedia.org/r/1089714

Change #1089652 merged by Muehlenhoff:

[operations/puppet@production] Remove Puppet code for legacy udpmixecho/ircecho setup

https://gerrit.wikimedia.org/r/1089652

Change #1089752 merged by jenkins-bot:

[operations/mediawiki-config@master] Remove irc1002/irc2002 from wmf-config

https://gerrit.wikimedia.org/r/1089752

Change #1089751 merged by jenkins-bot:

[operations/deployment-charts@master] deployment-charts: Remove irc1002/irc2002

https://gerrit.wikimedia.org/r/1089751

Final status update: The VMs with the legacy setup have been removed, along with the obsolete Puppet code.