irc.wikimedia.org (argon) still runs on Ubuntu Precise. Migrate to Debian Jessie. It uses a custom build of ircd-ratbox, which will need to be rebuilt for Jessie.
Description
Details
Status | Assigned | Task
---|---|---
Resolved | Dzahn | T123525 reduce amount of remaining Ubuntu 12.04 (precise) systems in production
Resolved | Dzahn | T134223 decom argon
Resolved | Legoktm | T128592 Add redundancy to IRC recent changes service
Resolved | Dzahn | T123729 Migrate irc.wikimedia.org to Jessie
Resolved | MoritzMuehlenhoff | T132427 Build ircd-ratbox for jessie
Resolved | MoritzMuehlenhoff | T133101 build python-irclib for jessie
Resolved | Dzahn | T122933 Remove the "HTTPS to HTTP" url filter in the IRC feed
Resolved | Dzahn | T105422 enable IPv6 on irc.wikimedia.org
Declined | None | T105804 schedule maintenance for IRC server
Event Timeline
Downtime on this system is rather problematic for anti-vandalism because it hosts the IRC RC feed. Could probably be replaced (temporarily?) by a VM - MW has been able to send that data to multiple systems for a couple of years now
a VM seems like a good use case for this, we have VMs with public ip addresses already (e.g. lists) so it could be permanent
also note the udp echo bot doesn't seem to have been restarted in a while, still posts as rc-pmtpa
argon:~$ grep rc- /etc/init/udpmxircecho
exec /usr/local/bin/udpmxircecho.py rc-eqiad argon.wikimedia.org
argon:~$ ps fwaux | grep rc-
irc 8210 0.2 0.1 118984 9784 ? Ssl 2015 1617:03 python /usr/local/bin/udpmxircecho.py rc-pmtpa localhost
ah, thanks @Krenair , also I was mistaken, the version in puppet is the correct one despite having two on argon
/etc/init/udpmxircecho:exec /usr/local/bin/udpmxircecho.py rc-eqiad argon.wikimedia.org
/etc/init/ircecho.conf:exec /usr/local/bin/udpmxircecho.py rc-pmtpa localhost
Change 282997 had a related patch set uploaded (by Dzahn):
introduce kraz.codfw.wmnet
Change 283064 had a related patch set uploaded (by Dzahn):
site/install_server: add kraz.codfw.wmnet
next we need systemd unit files for ircd and ircecho:
Error: /Stage[main]/Mw_rc_irc::Ircserver/Service[ircd]: Provider upstart is not functional on this host
Error: /Stage[main]/Mw_rc_irc::Irc_echo/Service[ircecho]: Provider upstart is not functional on this host
Change 284115 had a related patch set uploaded (by Dzahn):
kraz.codfw.wmnet -> kraz.wm.org, needs public IP
Change 284116 had a related patch set uploaded (by Dzahn):
kraz.codfw.wmnet -> kraz.wikimedia.org
Proposed migration plan after discussing with @Dzahn and @ori on IRC:
- Set up kraz (Jessie; VM) to be a replacement for argon (Precise; metal).
- Update MediaWiki wmf-config to broadcast events to both.
- Verify that it works as intended: manually connect to kraz with IRC, /join #en.wikipedia and look for events, then /join #test.wikipedia and verify that making an edit on test.wikipedia.org shows up there.
- Update DNS for irc.wikimedia.org to point to kraz. (then X=$(date), assert X < May 2nd)
- On May 2nd, argon will be shut down. If and when it comes back up after the Jessie upgrade, it'll be without the MW-IRC service.
Starting on date X, DNS caches slowly roll over and new connections will use kraz. Existing sessions on argon and clients that hardcoded the argon IP won't be affected yet.
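The manual verification step in the plan could also be scripted. Below is a minimal Python 3 sketch, not the tooling actually used here: it registers with the server, joins one channel and returns the first message seen in it. The function names and the nick `kraz-check` are hypothetical; port 6667 is the one shown in the irssi test in this task.

```python
import socket


def registration_lines(nick):
    """Return the raw IRC lines that register a client (NICK and USER)."""
    return ["NICK %s" % nick, "USER %s 0 * :%s" % (nick, nick)]


def is_channel_message(line, channel):
    """True if a raw IRC line is a PRIVMSG addressed to the given channel."""
    parts = line.split(" ", 3)
    return (len(parts) == 4
            and parts[0].startswith(":")
            and parts[1] == "PRIVMSG"
            and parts[2].lower() == channel.lower())


def wait_for_event(host, channel, nick="kraz-check", port=6667, timeout=120):
    """Join `channel` on `host`; return the first channel message, or None.

    `timeout` applies to each read, so None means the server stayed
    silent for that long or closed the connection.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        def send(line):
            sock.sendall((line + "\r\n").encode("utf-8"))

        for line in registration_lines(nick):
            send(line)
        joined = False
        buf = b""
        while True:
            try:
                data = sock.recv(4096)
            except socket.timeout:
                return None
            if not data:
                return None
            buf += data
            # Keep the trailing partial line in buf for the next read.
            *lines, buf = buf.split(b"\r\n")
            for raw in lines:
                line = raw.decode("utf-8", "replace")
                if line.startswith("PING"):
                    send("PONG" + line[4:])
                elif not joined and line.split(" ")[1:2] == ["001"]:
                    # 001 is the welcome numeric: registration is complete.
                    send("JOIN " + channel)
                    joined = True
                elif is_channel_message(line, channel):
                    return line
```

Calling `wait_for_event("kraz.wikimedia.org", "#test.wikipedia")` and then making an edit on test.wikipedia.org would be one way to run the check before the DNS switch.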
Reminder: Announce the service change on Tech News and wikitech-l.
Draft notes:
- irc.wikimedia.org will be migrated to a new host internally. The final part of this migration will happen on May 2nd. No action is required if your bot automatically reconnects. To avoid a forced reconnect on May 2nd, manually restart your client any time between date X and May 2nd. New connections after date X will remain uninterrupted on May 2nd. Bot owners should ensure no IP addresses are hardcoded (see T123729 for details.)
Details:
- If you hardcode IP addresses anywhere, be sure to update them between date X and May 2nd.
- On date X, the IP address of irc.wikimedia.org will change to point to kraz. At this point, new sessions will start on kraz. IRC sessions on argon will also continue to work.
- On May 2nd, argon will be shut down and the old IP will stop working.
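For bot owners unsure whether a configured address is stale, a check along the following lines may help. It is a sketch under assumptions: the function names are hypothetical, and the resolver is passed in as a parameter only so the logic can be exercised without real DNS.

```python
import socket


def resolve_addresses(hostname, port=6667, getaddrinfo=socket.getaddrinfo):
    """Return the set of IP addresses the hostname currently resolves to."""
    infos = getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address for both IPv4 and IPv6.
    return {info[4][0] for info in infos}


def stale_addresses(configured, hostname, port=6667,
                    getaddrinfo=socket.getaddrinfo):
    """Return the configured IPs that the hostname no longer resolves to."""
    current = resolve_addresses(hostname, port, getaddrinfo)
    return sorted(set(configured) - current)
```

After date X, `stale_addresses(my_configured_ips, "irc.wikimedia.org")` should list any address that still points at argon, where `my_configured_ips` stands for whatever addresses the bot has hardcoded.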
Note that I don't think you will be able to join the channel for a given wiki until after the first edit on that wiki since the MW config change. More likely to be an issue for testwiki than enwiki :)
Change 284259 had a related patch set uploaded (by Dzahn):
install: update MAC address of kraz
Change 284273 had a related patch set uploaded (by Dzahn):
ircserver/irc_echo: use systemd provider if on jessie
Change 284293 had a related patch set uploaded (by Dzahn):
ircserver: add systemd unit file and conditionals
Change 284343 had a related patch set uploaded (by Dzahn):
ircserver: fix dependencies for running on jessie
The IRCd service could be starting on jessie now, the unit file is there, the dependencies are adjusted if on jessie, but the next problem is that the package python-irclib exists on precise but not on jessie, apparently.
E: Package 'python-irclib' has no installation candidate
now:
[kraz:~] $ dpkg -l | grep python-irc
ii python-irc 8.5.3+dfsg-2 all Internet Relay Chat (IRC) protocol client library for Python
next up:
Error: Could not set 'file' on ensure: No such file or directory @ dir_s_rmdir - /usr/etc/ircd.conf20160421-7253-53vhh.lock at 19:/etc/puppet/modules/mw_rc_irc/manifests/ircserver.pp
Wrapped exception:
No such file or directory @ dir_s_rmdir - /usr/etc/ircd.conf20160421-7253-53vhh.lock
Error: /Stage[main]/Mw_rc_irc::Ircserver/File[/usr/etc/ircd.conf]/ensure: change from absent to file failed: Could not set 'file' on ensure: No such file or directory @ dir_s_rmdir - /usr/etc/ircd.conf20160421-7253-53vhh.lock at 19:/etc/puppet/modules/mw_rc_irc/manifests/ircserver.pp
Change 285561 had a related patch set uploaded (by Dzahn):
ircecho: make it start on systemd, add unit file
Change 285568 had a related patch set uploaded (by Dzahn):
ircecho: fix init file dependency for service on systemd
Change 285568 merged by Dzahn:
ircecho: fix init file dependency for service on systemd
now:
service ircecho status
● ircecho.service - IRC bot for the MW RC IRCD
Loaded: loaded (/etc/systemd/system/ircecho.service; disabled)
Active: active (running)
next:
/etc/systemd/system# service ircd status
● ircd.service - IRCd for Mediawiki RecentChanges feed
Loaded: loaded (/etc/systemd/system/ircd.service; disabled)
Active: failed (Result: exit-code)
Change 285569 had a related patch set uploaded (by Dzahn):
ircserver: puppetize install of ircd-ratbox
< icinga-wm> RECOVERY - puppet last run on kraz is OK: OK:
● ircd.service - IRCd for Mediawiki RecentChanges feed
Loaded: loaded (/etc/systemd/system/ircd.service; disabled)
Active: active (running)
01:43 -!- Irssi: Looking up localhost
01:43 -!- Irssi: Connecting to localhost [127.0.0.1] port 6667
01:43 -!- Irssi: Connection to localhost established
01:43 !localhost * Processing connection to irc.pmtpa.wikimedia.org
....
01:43 !localhost * Found your hostname
01:43 !irc.pmtpa.wikimedia.org *** Spoofing your IP. congrats.
Change 285570 had a related patch set uploaded (by Dzahn):
ircserver: add irssi on irc server for testing
The bot connects to the IRC server but does not join any channels because it does not get input on port 9390 from the appservers.
compare to root@argon:~# tcpdump port 9390
if we could get some of this over to kraz to confirm ?
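As an alternative to tcpdump, a throwaway UDP listener can confirm whether the appservers' datagrams reach the new host. The sketch below assumes the port is free, so the echo bot would have to be stopped while it runs; the function names are hypothetical, and only port 9390 comes from this task.

```python
import socket


def open_listener(port=9390, host="0.0.0.0"):
    """Bind a UDP socket on the given port and return it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_datagrams(sock, count=5, timeout=10.0):
    """Collect up to `count` datagrams as (sender_ip, payload) pairs.

    Stops early if `timeout` seconds pass without a datagram arriving.
    """
    sock.settimeout(timeout)
    seen = []
    try:
        while len(seen) < count:
            data, (addr, _port) = sock.recvfrom(65535)
            seen.append((addr, data))
    except socket.timeout:
        pass
    return seen
```

An empty result from `receive_datagrams(open_listener())` would suggest the packets are not arriving at all, which points at the sender configuration or the network path rather than at the bot.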
Change 286509 had a related patch set uploaded (by Dzahn):
switch irc.wm.org from argon to kraz
Change 286544 had a related patch set uploaded (by Dzahn):
udpmxircecho: remove newlines from RC data
Change 286546 had a related patch set uploaded (by Dzahn):
udpmxircecho: fix utf-8 encoding issue
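The two patches above are not reproduced in this task, so the following is only a guess at the kind of cleanup they describe, not the merged code. IRC is a line-based protocol, so a newline inside a payload would end the message early, and bytes that are not valid UTF-8 would raise an error on a strict decode. The function name `clean_rc_payload` is hypothetical.

```python
def clean_rc_payload(raw):
    """Turn a received datagram into a single line of text safe to send to IRC.

    Invalid UTF-8 bytes are replaced with U+FFFD instead of raising,
    and carriage returns and newlines are replaced with spaces.
    """
    text = raw.decode("utf-8", errors="replace")
    return text.replace("\r", " ").replace("\n", " ").strip()
```

Replacing invalid bytes rather than dropping the whole message keeps the feed flowing when a single edit summary is malformed.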
Mentioned in SAL [2016-05-03T01:40:01Z] <mutante> irc.wm.org - see T123729 if any questions
18:44 < mutante> !log switching irc.wikimedia.org from old server argon to new server kraz. old server still running untouched as argon.wikimedia.org. no clients are kicked. appservers are sending RC to both.
done
- Update MediaWiki wmf-config to broadcast events to both.
done
- Verify that it works as intended: manually connect to kraz with IRC, /join #en.wikipedia and look for events, then /join #test.wikipedia and verify that making an edit on test.wikipedia.org shows up there.
done
- Update DNS for irc.wikimedia.org to point to kraz. (then X=$(date), assert X < May 2nd)
done
- On May 2nd, argon will be shut down. If and when it comes back up after the Jessie upgrade, it'll be without the MW-IRC service.
not done, argon is still up and reachable as of right now, just in case
Starting on date X, DNS caches slowly roll over and new connections will use kraz. Existing sessions on argon and clients that hardcoded the argon IP won't be affected yet.
yes, from now
Reminder: Announce the service change on Tech News and wikitech-l.
Done on wikitech-l, was on Tech News by Johan (the URL changing part at least)
Draft notes:
used some of these in the wikitech-l mail. thanks