
Migrate irc.wikimedia.org to Jessie
Closed, Resolved · Public

Description

irc.wikimedia.org (argon) still runs on Ubuntu Precise. Migrate it to Debian Jessie. It uses a custom build of ircd-ratbox, which will need to be rebuilt for Jessie.

Event Timeline

MoritzMuehlenhoff updated the task description.
MoritzMuehlenhoff raised the priority of this task to Needs Triage.
MoritzMuehlenhoff added a project: Operations.
Restricted Application added subscribers: StudiesWorld, Aklapper. Jan 15 2016, 12:41 PM

Downtime on this system is problematic for anti-vandalism work because it hosts the IRC recent changes (RC) feed. It could probably be replaced (temporarily?) by a VM; MediaWiki has been able to send that data to multiple systems for a couple of years now.

a VM seems like a good use case for this, we have VMs with public ip addresses already (e.g. lists) so it could be permanent
also note that the UDP echo bot doesn't seem to have been restarted in a while; it still posts as rc-pmtpa:

argon:~$ grep rc- /etc/init/udpmxircecho
exec /usr/local/bin/udpmxircecho.py rc-eqiad argon.wikimedia.org
argon:~$ ps fwaux | grep rc-
irc       8210  0.2  0.1 118984  9784 ?        Ssl   2015 1617:03 python /usr/local/bin/udpmxircecho.py rc-pmtpa localhost

it still posts as that because changing it might be a breaking change for some bots

ah, thanks @Krenair. Also, I was mistaken: the version in puppet is the correct one, even though there are two on argon:

/etc/init/udpmxircecho:exec /usr/local/bin/udpmxircecho.py rc-eqiad argon.wikimedia.org
/etc/init/ircecho.conf:exec /usr/local/bin/udpmxircecho.py rc-pmtpa localhost
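A small sketch for spotting this kind of drift between what the init files configure and what is actually running. The helper names `echo_nick` and `find_drift` are hypothetical, and the sketch assumes, based only on the lines above, that the nick is the first argument after `udpmxircecho.py`.

```python
import shlex
from typing import Dict, Optional


def echo_nick(command: str) -> Optional[str]:
    """Return the argument right after udpmxircecho.py, or None if absent."""
    words = shlex.split(command)
    for i, word in enumerate(words):
        if word.endswith("udpmxircecho.py") and i + 1 < len(words):
            return words[i + 1]
    return None


def find_drift(init_lines: Dict[str, str], running: str) -> Dict[str, str]:
    """Map each init file to its configured nick when it differs from the running one."""
    running_nick = echo_nick(running)
    drift = {}
    for path, line in init_lines.items():
        nick = echo_nick(line)
        if nick is not None and nick != running_nick:
            drift[path] = nick
    return drift


# Values copied from the grep/ps output in this task.
init = {
    "/etc/init/udpmxircecho": "exec /usr/local/bin/udpmxircecho.py rc-eqiad argon.wikimedia.org",
    "/etc/init/ircecho.conf": "exec /usr/local/bin/udpmxircecho.py rc-pmtpa localhost",
}
ps_cmd = "python /usr/local/bin/udpmxircecho.py rc-pmtpa localhost"
print(find_drift(init, ps_cmd))  # {'/etc/init/udpmxircecho': 'rc-eqiad'}
```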

a VM seems like a good use case for this, we have VMs with public ip addresses already (e.g. lists) so it could be permanent

agree, let's move ahead by creating a VM for this

Change 282997 had a related patch set uploaded (by Dzahn):
introduce kraz.codfw.wmnet

https://gerrit.wikimedia.org/r/282997

Change 282997 merged by Dzahn:
introduce kraz.codfw.wmnet

https://gerrit.wikimedia.org/r/282997

Change 283064 had a related patch set uploaded (by Dzahn):
site/install_server: add kraz.codfw.wmnet

https://gerrit.wikimedia.org/r/283064

Change 283064 merged by Dzahn:
site/install_server: add kraz.codfw.wmnet

https://gerrit.wikimedia.org/r/283064

installed kraz.codfw.wmnet, added it to puppet, salt and icinga, and applied the mw-rc role

Dzahn added a comment (edited). Apr 19 2016, 12:47 AM

next we need systemd unit files for ircd and ircecho:

Error: /Stage[main]/Mw_rc_irc::Ircserver/Service[ircd]: Provider upstart is not functional on this host
Error: /Stage[main]/Mw_rc_irc::Irc_echo/Service[ircecho]: Provider upstart is not functional on this host
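For reference, a minimal sketch of what a systemd service unit for ircecho might contain, rendered from Python so it can be checked. Only the `Description` text (from the `service ircecho status` output later in this task) and the `irc` user (from the `ps` output above) come from the task; the `After=`, `Restart=` and `WantedBy=` values and the `ExecStart` command line are assumptions, not the unit file that was actually merged.

```python
from typing import Optional


def render_unit(description: str, exec_start: str, user: str = "irc",
                after: Optional[str] = "network.target") -> str:
    """Render a minimal systemd service unit as INI-style text."""
    lines = ["[Unit]", f"Description={description}"]
    if after:
        lines.append(f"After={after}")
    lines += [
        "",
        "[Service]",
        f"User={user}",
        f"ExecStart={exec_start}",
        "Restart=always",  # assumed: restart the bot if it dies
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical ExecStart; the real command line lives in the puppet module.
print(render_unit(
    "IRC bot for the MW RC IRCD",
    "/usr/bin/python /usr/local/bin/udpmxircecho.py rc-pmtpa localhost",
))
```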

eh, and this needs a public IP, unlike antimony

Change 284115 had a related patch set uploaded (by Dzahn):
kraz.codfw.wmnet -> kraz.wm.org, needs public IP

https://gerrit.wikimedia.org/r/284115

Change 284116 had a related patch set uploaded (by Dzahn):
kraz.codfw.wmnet -> kraz.wikimedia.org

https://gerrit.wikimedia.org/r/284116

Dzahn removed Dzahn as the assignee of this task. Apr 19 2016, 1:41 AM

Proposed migration plan after discussing with @Dzahn and @ori on IRC:

  • Set up kraz (Jessie; VM) to be a replacement for argon (Precise; metal).
  • Update MediaWiki wmf-config to broadcast events to both.
  • Verify that it works as intended (manually connect to kraz with IRC and verify, e.g., /join #en.wikipedia and look for events; /join #test.wikipedia and verify that making an edit on test.wikipedia.org results in it showing up).
  • Update DNS for irc.wikimedia.org to point to kraz. (then X=$(date), assert X < May 2nd)
  • On May 2nd, argon will be shut down. If and when it comes back up after the Jessie upgrade, it'll be without the MW-IRC service.
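The "broadcast to both" step means every RC event goes out as a UDP datagram to each IRC host. MediaWiki does this in PHP through its RC feed configuration (`$wgRCFeeds`); the Python below only illustrates the fan-out idea. Port 9390 comes from later in this task; the `#channel<TAB>message` payload format and the `broadcast` helper are assumptions.

```python
import socket
from typing import Iterable, Tuple


def broadcast(payload: bytes, targets: Iterable[Tuple[str, int]]) -> int:
    """Send the same UDP datagram to every (host, port) target; return the count sent."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for host, port in targets:
            sock.sendto(payload, (host, port))
            sent += 1
    return sent


# Illustrative call during the migration window (not executed here):
# broadcast(b"#test.wikipedia\tExample edit",
#           [("argon.wikimedia.org", 9390), ("kraz.wikimedia.org", 9390)])
```

UDP is fire-and-forget, so one unreachable host does not block delivery to the other; the trade-off is that nothing reports lost datagrams.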

Starting on date X, DNS caches slowly roll over and new connections will use kraz. Existing sessions on argon and clients that hardcoded the argon IP won't be affected yet.
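A worked version of the "assert X < May 2nd" check, extended with the DNS record's TTL: for cached answers to expire before argon goes away, the switch has to happen at least one TTL (plus some margin) before the deadline. The one-hour TTL and 24-hour margin below are hypothetical values, not the real zone settings, and open sessions are unaffected until the client reconnects.

```python
from datetime import datetime, timedelta


def latest_safe_switch(deadline: datetime, ttl_seconds: int,
                       margin: timedelta = timedelta(hours=24)) -> datetime:
    """Latest time the record can change so cached answers expire before the deadline."""
    return deadline - timedelta(seconds=ttl_seconds) - margin


def switch_is_safe(switch_at: datetime, deadline: datetime, ttl_seconds: int,
                   margin: timedelta = timedelta(hours=24)) -> bool:
    return switch_at <= latest_safe_switch(deadline, ttl_seconds, margin)


# Hypothetical TTL of 3600 s against a May 2nd 2016 deadline.
print(latest_safe_switch(datetime(2016, 5, 2), 3600))  # 2016-04-30 23:00:00
```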

Reminder: Announce the service change on Tech News and wikitech-l.

Draft notes:

  • irc.wikimedia.org will be migrated to a new host internally. The final part of this migration will happen on May 2nd. No action is required if your bot automatically reconnects. To avoid a forced reconnect on May 2nd, manually restart your client any time between date X and May 2nd. Connections made after date X will remain uninterrupted on May 2nd. Bot owners should ensure no IP addresses are hardcoded (see T123729 for details).

Details:

  • If you hardcode IP addresses anywhere, be sure to update them between date X and May 2nd.
  • On date X, the IP address of irc.wikimedia.org will change to point to kraz. At this point, new sessions will start on kraz. IRC sessions on argon will also continue to work.
  • On May 2nd, argon will be shut down and the old IP will stop working.
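For bot owners, a sketch of a reconnect loop that always connects by hostname, so the DNS change is picked up on the next reconnect instead of staying pinned to argon's IP. The function name, retry count and backoff values are illustrative; `connect` and `sleep` are injectable only so the loop can be exercised without a network.

```python
import socket
import time
from typing import Callable


def connect_with_retry(host: str = "irc.wikimedia.org", port: int = 6667,
                       attempts: int = 5, base_delay: float = 2.0,
                       connect: Callable = socket.create_connection,
                       sleep: Callable[[float], None] = time.sleep):
    """Connect by hostname with exponential backoff; re-raise after the last failure."""
    for attempt in range(attempts):
        try:
            # create_connection passes the hostname to getaddrinfo on every
            # call, so a changed A record is used on the next attempt.
            return connect((host, port), timeout=30)
        except OSError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```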

Krinkle renamed this task from "Migrate argon to jessie" to "Migrate argon (irc.wikimedia.org) to Jessie". Apr 19 2016, 2:43 AM
Krinkle updated the task description.
Krinkle set Security to None.
Krinkle added a project: Developer-notice.
Krinkle added a project: Notice.

manually connect to kraz with IRC and verify e.g. /join #en.wikipedia and look for events. /join #test.wikipedia and verify making an edit on test.wikipedia.org results in it showing up.

Note that I don't think you will be able to join the channel for a given wiki until after the first edit on that wiki since the MW config change. More likely to be an issue for testwiki than enwiki :)
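The note above is consistent with an echo bot that creates channels on demand. A minimal sketch of that behaviour, assuming (not confirmed in this task) that each datagram has the form `#channel<TAB>message` and that the bot joins a channel the first time it sees a message for it; `LazyJoiner` is a hypothetical name and it only returns the IRC commands it would send.

```python
from typing import List, Set


class LazyJoiner:
    """Join an IRC channel only when the first RC message for it arrives."""

    def __init__(self) -> None:
        self.joined: Set[str] = set()

    def handle(self, datagram: str) -> List[str]:
        # Assumed payload format: "<#channel>\t<message>".
        channel, sep, text = datagram.partition("\t")
        if not sep or not channel.startswith("#"):
            return []  # malformed datagram: ignore it
        commands = []
        if channel not in self.joined:
            self.joined.add(channel)
            commands.append(f"JOIN {channel}")
        commands.append(f"PRIVMSG {channel} :{text}")
        return commands
```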

Change 284116 merged by Dzahn:
kraz.codfw.wmnet -> kraz.wikimedia.org

https://gerrit.wikimedia.org/r/284116

Restricted Application added a subscriber: TerraCodes. Apr 19 2016, 6:17 PM

Change 284115 merged by Dzahn:
kraz.codfw.wmnet -> kraz.wm.org, needs public IP

https://gerrit.wikimedia.org/r/284115

Change 284259 had a related patch set uploaded (by Dzahn):
install: update MAC address of kraz

https://gerrit.wikimedia.org/r/284259

Change 284259 merged by Dzahn:
install: update MAC address of kraz

https://gerrit.wikimedia.org/r/284259

reinstalled with a public IP as kraz.wikimedia.org; it is in puppet and up and running.

Change 284273 had a related patch set uploaded (by Dzahn):
ircserver/irc_echo: use systemd provider if on jessie

https://gerrit.wikimedia.org/r/284273

Change 284273 merged by Dzahn:
ircserver/irc_echo: use systemd provider if on jessie

https://gerrit.wikimedia.org/r/284273
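This change picks the service provider based on the OS release. A Python sketch of that decision; in Puppet such a conditional would typically key on a fact like `lsbdistcodename`, but which fact the actual patch tests is not shown here, and the release sets below are assumptions.

```python
SYSTEMD_RELEASES = {"jessie"}
UPSTART_RELEASES = {"precise", "trusty"}


def service_provider(codename: str) -> str:
    """Return the init provider to use for a distribution codename."""
    codename = codename.lower()
    if codename in SYSTEMD_RELEASES:
        return "systemd"
    if codename in UPSTART_RELEASES:
        return "upstart"
    # Failing loudly beats silently picking a provider that "is not functional".
    raise ValueError(f"no init provider known for {codename!r}")
```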

Change 284293 had a related patch set uploaded (by Dzahn):
ircserver: add systemd unit file and conditionals

https://gerrit.wikimedia.org/r/284293

Change 284293 merged by Dzahn:
ircserver: add systemd unit file and conditionals

https://gerrit.wikimedia.org/r/284293

Change 284343 had a related patch set uploaded (by Dzahn):
ircserver: fix dependencies for running on jessie

https://gerrit.wikimedia.org/r/284343

Change 284343 merged by Dzahn:
ircserver: fix dependencies for running on jessie

https://gerrit.wikimedia.org/r/284343

The IRCd service should be able to start on jessie now: the unit file is there and the dependencies are adjusted for jessie. The next problem is that the package python-irclib apparently exists on precise but not on jessie:

E: Package 'python-irclib' has no installation candidate
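The jessie replacement is the python-irc package, which provides the `irc` package (e.g. `irc.client`) rather than the old `irclib` module. A hedged sketch of a fallback import that works on either release; `import_first` is a hypothetical helper, and because the two libraries' APIs are not identical, calling code may still need changes.

```python
import importlib
from types import ModuleType
from typing import Sequence


def import_first(candidates: Sequence[str]) -> ModuleType:
    """Return the first module in candidates that imports successfully."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append(f"{name}: {exc}")
    raise ImportError("none of the candidates imported: " + "; ".join(errors))


# python-irc (jessie) ships "irc.client"; python-irclib (precise) shipped "irclib".
# irc_client = import_first(["irc.client", "irclib"])
```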

akosiaris triaged this task as Normal priority. Apr 20 2016, 11:13 AM

now:

[kraz:~] $ dpkg -l | grep python-irc
ii python-irc 8.5.3+dfsg-2 all Internet Relay Chat (IRC) protocol client library for Python

next up:

Error: Could not set 'file' on ensure: No such file or directory @ dir_s_rmdir - /usr/etc/ircd.conf20160421-7253-53vhh.lock at 19:/etc/puppet/modules/mw_rc_irc/manifests/ircserver.pp
Wrapped exception:
No such file or directory @ dir_s_rmdir - /usr/etc/ircd.conf20160421-7253-53vhh.lock
Error: /Stage[main]/Mw_rc_irc::Ircserver/File[/usr/etc/ircd.conf]/ensure: change from absent to file failed: Could not set 'file' on ensure: No such file or directory @ dir_s_rmdir - /usr/etc/ircd.conf20160421-7253-53vhh.lock at 19:/etc/puppet/modules/mw_rc_irc/manifests/ircserver.pp
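The lock file in this error sits under `/usr/etc`, which suggests the parent directory did not exist on the fresh jessie host (an inference, not confirmed in this task). A Python sketch of a write-to-temp-then-rename that creates the parent directory first; in Puppet the equivalent would be managing the directory as its own `file` resource with `ensure => directory`. The helper name is hypothetical.

```python
import os
import tempfile


def write_config_atomically(path: str, content: str, mode: int = 0o644) -> None:
    """Write content to path via a temp file in the same directory, then rename."""
    directory = os.path.dirname(path) or "."
    # Create the missing parent (e.g. /usr/etc) instead of failing.
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
        os.chmod(tmp_path, mode)
        os.replace(tmp_path, path)  # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```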

Change 285561 had a related patch set uploaded (by Dzahn):
ircecho: make it start on systemd, add unit file

https://gerrit.wikimedia.org/r/285561

Change 285561 merged by Dzahn:
ircecho: make it start on systemd, add unit file

https://gerrit.wikimedia.org/r/285561

Change 285568 had a related patch set uploaded (by Dzahn):
ircecho: fix init file dependency for service on systemd

https://gerrit.wikimedia.org/r/285568

Change 285568 merged by Dzahn:
ircecho: fix init file dependency for service on systemd

https://gerrit.wikimedia.org/r/285568

now:

service ircecho status
● ircecho.service - IRC bot for the MW RC IRCD
   Loaded: loaded (/etc/systemd/system/ircecho.service; disabled)
   Active: active (running)

next:

/etc/systemd/system# service ircd status
● ircd.service - IRCd for Mediawiki RecentChanges feed
   Loaded: loaded (/etc/systemd/system/ircd.service; disabled)
   Active: failed (Result: exit-code)

Change 285569 had a related patch set uploaded (by Dzahn):
ircserver: puppetize install of ircd-ratbox

https://gerrit.wikimedia.org/r/285569

Change 285569 merged by Dzahn:
ircserver: puppetize install of ircd-ratbox

https://gerrit.wikimedia.org/r/285569

< icinga-wm> RECOVERY - puppet last run on kraz is OK: OK:

● ircd.service - IRCd for Mediawiki RecentChanges feed
   Loaded: loaded (/etc/systemd/system/ircd.service; disabled)
   Active: active (running)

01:43 -!- Irssi: Looking up localhost
01:43 -!- Irssi: Connecting to localhost [127.0.0.1] port 6667
01:43 -!- Irssi: Connection to localhost established
01:43 !localhost * Processing connection to irc.pmtpa.wikimedia.org
....
01:43 !localhost * Found your hostname
01:43 !irc.pmtpa.wikimedia.org *** Spoofing your IP. congrats.
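To script this irssi smoke test, the raw server lines have to be split into prefix, command and parameters. A minimal parser following the RFC 1459 message layout; the raw line in the usage example is reconstructed from the irssi output above and is an assumption about what the server actually sent.

```python
from typing import List, NamedTuple, Optional


class IrcMessage(NamedTuple):
    prefix: Optional[str]
    command: str
    params: List[str]


def parse_irc_line(line: str) -> IrcMessage:
    """Split a raw IRC line into prefix, command and parameters."""
    line = line.rstrip("\r\n")
    prefix: Optional[str] = None
    if line.startswith(":"):
        prefix, _, line = line[1:].partition(" ")
    trailing: Optional[str] = None
    if " :" in line:
        # Everything after the first " :" is one parameter, spaces included.
        line, _, trailing = line.partition(" :")
    parts = line.split()
    if not parts:
        raise ValueError("IRC line has no command")
    params = parts[1:]
    if trailing is not None:
        params.append(trailing)
    return IrcMessage(prefix, parts[0].upper(), params)


print(parse_irc_line(":irc.pmtpa.wikimedia.org NOTICE * :*** Found your hostname"))
# IrcMessage(prefix='irc.pmtpa.wikimedia.org', command='NOTICE', params=['*', '*** Found your hostname'])
```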

Change 285570 had a related patch set uploaded (by Dzahn):
ircserver: add irssi on irc server for testing

https://gerrit.wikimedia.org/r/285570

Change 285570 merged by Dzahn:
ircserver: add irssi on irc server for testing

https://gerrit.wikimedia.org/r/285570

The bot connects to the IRC server but does not join any channels because it does not get input on port 9390 from the appservers.

compare to root@argon:~# tcpdump port 9390

Could we get some of this traffic sent over to kraz to confirm?
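Where tcpdump is not convenient, a rough alternative is to count datagrams on the port for a few seconds. This only works if nothing else is bound to the port, so on kraz the echo bot would have to be stopped first; `sample_datagrams` is a hypothetical helper, and tcpdump on both hosts remains the simpler check.

```python
import socket
import time
from typing import List, Tuple


def sample_datagrams(sock: socket.socket, duration: float = 5.0,
                     max_samples: int = 5) -> Tuple[int, List[bytes]]:
    """Count datagrams arriving on a bound UDP socket for `duration` seconds."""
    count, samples = 0, []
    deadline = time.monotonic() + duration
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        sock.settimeout(remaining)
        try:
            data, _addr = sock.recvfrom(65535)
        except socket.timeout:
            break
        count += 1
        if len(samples) < max_samples:
            samples.append(data)  # keep a few payloads to eyeball the format
    return count, samples


# Illustrative use on the IRC host (port from this task):
# rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# rx.bind(("0.0.0.0", 9390))
# print(sample_datagrams(rx, duration=10))
```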

Change 286509 had a related patch set uploaded (by Dzahn):
switch irc.wm.org from argon to kraz

https://gerrit.wikimedia.org/r/286509

Change 286544 had a related patch set uploaded (by Dzahn):
udpmxircecho: remove newlines from RC data

https://gerrit.wikimedia.org/r/286544

Change 286546 had a related patch set uploaded (by Dzahn):
udpmxircecho: fix utf-8 encoding issue

https://gerrit.wikimedia.org/r/286546

Change 286544 merged by Dzahn:
udpmxircecho: remove newlines from RC data

https://gerrit.wikimedia.org/r/286544

Change 286546 merged by Dzahn:
udpmxircecho: fix utf-8 encoding issue

https://gerrit.wikimedia.org/r/286546
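Both udpmxircecho fixes deal with input that an IRC server cannot relay as-is. A sketch of equivalent sanitising in Python 3, assuming the datagrams arrive as raw bytes; this is not the code from changes 286544 or 286546, only an illustration of the same two ideas.

```python
def sanitize_rc_message(raw: bytes) -> str:
    """Decode one RC datagram as UTF-8 and flatten it to a single IRC-safe line."""
    # Invalid byte sequences become U+FFFD instead of crashing the bot.
    text = raw.decode("utf-8", errors="replace")
    # A CR or LF would end the IRC line early and let the remainder be parsed
    # as a separate command, so fold line breaks into spaces.
    for ch in ("\r\n", "\r", "\n"):
        text = text.replace(ch, " ")
    return text.strip()
```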

Dzahn renamed this task from "Migrate argon (irc.wikimedia.org) to Jessie" to "Migrate irc.wikimedia.org to Jessie". May 3 2016, 1:31 AM
Dzahn mentioned this in T134223: decom argon.
Dzahn added a parent task: T134223: decom argon.

Mentioned in SAL [2016-05-03T01:40:01Z] <mutante> irc.wm.org - see T123729 if any questions

Change 286509 merged by Dzahn:
switch irc.wm.org from argon to kraz

https://gerrit.wikimedia.org/r/286509

Dzahn claimed this task. May 3 2016, 1:55 AM
Dzahn removed a project: Patch-For-Review.
Dzahn closed this task as Resolved.

18:44 < mutante> !log switching irc.wikimedia.org from old server argon to new server kraz. old server still running untouched as argon.wikimedia.org. no clients are kicked. appservers are sending RC to both.

  • Set up kraz (Jessie; VM) to be a replacement for argon (Precise; metal).

done

  • Update MediaWiki wmf-config to broadcast events to both.

done

  • Verify that it works as intended (manually connect to kraz with IRC and verify, e.g., /join #en.wikipedia and look for events; /join #test.wikipedia and verify that making an edit on test.wikipedia.org results in it showing up).

done

  • Update DNS for irc.wikimedia.org to point to kraz. (then X=$(date), assert X < May 2nd)

done

  • On May 2nd, argon will be shut down. If and when it comes back up after the Jessie upgrade, it'll be without the MW-IRC service.

not done, argon is still up and reachable as of right now, just in case

Starting on date X, DNS caches slowly roll over and new connections will use kraz. Existing sessions on argon and clients that hardcoded the argon IP won't be affected yet.

yes, from now

Reminder: Announce the service change on Tech News and wikitech-l.

Done on wikitech-l; it was also in Tech News, thanks to Johan (at least the part about the URL change)

Draft notes:

used some of these in the wikitech-l mail. thanks