Imported from https://github.com/wikimedia/countervandalism-CVNBot/issues/73.
@Krinkle wrote on 22 June 2021: Rewrites are often a terrible idea for large projects, but I think it might be called for here. Firstly, it's a fairly small and simple project. It's only a few hundred lines of code, and the minimal required complexity is fairly low. Basically all we do is:
- Connect to the event source (currently irc.wikimedia.org; perhaps worth moving to EventStreams as part of the rewrite, or shortly thereafter. This is something I would know how to do in Python, but not in C#).
- Connect to the main server (Libera) and channels (feed channel + control channel).
- For each incoming message:
  - Apply a few simple boolean filters to the metadata.
  - Run 1 database query to determine whether the page title, edit summary, or username matches a watchlist.
  - If accepted, format a string and send it to the feed channel.
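The per-message steps above can be sketched roughly as follows. This is a minimal illustration, not the bot's actual logic: the event field names (`type`, `bot`, `title`, `user`, `comment`), the `watchlist` table layout, the filter rules, and the output format are all assumptions made up for the example.

```python
import sqlite3
from typing import Callable, Optional


def passes_filters(event: dict) -> bool:
    """Cheap boolean checks on the event metadata (hypothetical rules)."""
    if event.get("bot"):
        return False
    return event.get("type") in {"edit", "new", "log"}


def watchlist_match(db: sqlite3.Connection, event: dict) -> Optional[str]:
    """Run one query; return the kind of watchlist entry that matched, if any.

    Assumes a hypothetical table: watchlist(kind TEXT, value TEXT), where
    kind is 'title', 'user', or 'summary'.
    """
    row = db.execute(
        "SELECT kind FROM watchlist"
        " WHERE (kind = 'title' AND value = :title)"
        " OR (kind = 'user' AND value = :user)"
        " OR (kind = 'summary' AND instr(:comment, value) > 0)"
        " ORDER BY kind LIMIT 1",
        {
            "title": event.get("title", ""),
            "user": event.get("user", ""),
            "comment": event.get("comment", ""),
        },
    ).fetchone()
    return row[0] if row else None


def format_line(event: dict, kind: str) -> str:
    """Build the string sent to the feed channel (format is made up)."""
    return "[[{title}]] by {user}: {comment} (watched {kind})".format(
        title=event.get("title", ""),
        user=event.get("user", ""),
        comment=event.get("comment", ""),
        kind=kind,
    )


def handle_event(
    db: sqlite3.Connection, event: dict, send: Callable[[str], None]
) -> bool:
    """Filter, look up, and forward one event. Returns True if it was sent."""
    if not passes_filters(event):
        return False
    kind = watchlist_match(db, event)
    if kind is None:
        return False
    send(format_line(event, kind))
    return True
```

Passing `send` in as a callable keeps the pipeline independent of whichever IRC library ends up being used; in production it would be whatever posts to the feed channel.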
Apart from that, we have a few basic control commands for restarting the bot and for adding/removing entries on the watchlists (documentation). These perform some additional maintenance tasks, such as querying the MediaWiki API once for namespace prefixes, a list of known bots and admins, and some interface messages that help determine whether something is an "automatic edit summary" with special meaning (blanked page, replaced contents) and that are used to parse parameters for log events (block, move, protection, etc.). The interface messages would no longer be needed if we use EventStreams.
The database is currently SQLite, and migrating it to a shared MySQL database (T327128) has long been blocked on familiarity with C# libraries and confidence in checking in additional DLL dependencies.
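As an illustration of the one-time namespace lookup, here is a minimal sketch using only the standard library. It assumes the MediaWiki `action=query&meta=siteinfo&siprop=namespaces` API with `formatversion=2`, in which each namespace entry carries a `name` field; the function names and the User-Agent string are hypothetical.

```python
import json
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def siteinfo_url(api_base: str) -> str:
    """Build the siteinfo query URL for a wiki's api.php endpoint."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "namespaces",
        "format": "json",
        "formatversion": "2",
    }
    return api_base + "?" + urlencode(params)


def namespace_prefixes(response: dict) -> Dict[int, str]:
    """Map namespace ID to local prefix from a decoded siteinfo response.

    Assumes the formatversion=2 shape, where each entry has a "name" key.
    """
    namespaces = response["query"]["namespaces"]
    return {int(ns_id): info["name"] for ns_id, info in namespaces.items()}


def fetch_namespace_prefixes(api_base: str) -> Dict[int, str]:
    """Fetch and parse the namespace list once (performs a network request)."""
    request = Request(
        siteinfo_url(api_base),
        # Hypothetical User-Agent; a real bot should identify itself properly.
        headers={"User-Agent": "cvn-rewrite-sketch/0.0"},
    )
    with urlopen(request, timeout=10) as resp:
        return namespace_prefixes(json.load(resp))
```

Keeping the parsing separate from the HTTP call means the result can be cached per wiki and the parser can be exercised without network access.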
We currently have significant problems with the bot simply staying online, such as:
- T327135: the source connection to irc.wikimedia.org frequently ends up lost in mysterious ways, despite auto-reconnect and auto-rejoin being enabled in the IRC library that we use.
- T327134: the destination connection to Libera Chat often gets into a confused state after netsplits in which it is not authenticated with NickServ, and it doesn't recover from this on its own, requiring a restart.
I'm hoping that the Python libraries for IRC are more mature and have this part just solved without requiring any attention. The cvn-clerkbot by comparison, which uses python-twisted, does not appear to have suffered from any connection problems. Having said that, it doesn't send many messages, and we don't pay close attention to it, so this is something we'll have to see.
More broadly, I personally will feel much more motivated to fix bugs and make improvements in a language I actually understand, and for which I have good resources (and people) to lean on for anything I don't know. I have absolutely no desire to learn more than the most basic C#, as I simply have no other outlet for applying that knowledge within my current job or the range of other open-source projects I maintain or contribute to. I also hope that by using Python, we'll have more people in our community able to contribute.