
Rewrite CVNBot in another language
Open, Needs Triage, Public

Description

Imported from https://github.com/wikimedia/countervandalism-CVNBot/issues/73.

@Krinkle wrote on 22 June 2021:

Rewrites are often a terrible idea for large projects, but I think it might be called for here. Firstly, it's a fairly small and simple project. It's only a few hundred lines of code, and the minimal required complexity is fairly low. Basically all we do is:

  • Connect to the event source (currently irc.wikimedia.org; it may be worth moving to EventStreams as part of the rewrite, or shortly thereafter. This is something I would know how to do in Python, but not in C#).
  • Connect to the main server (Libera) and channels (feed channel + control channel).
  • For each incoming message:
    • Apply a few simple boolean filters to the metadata.
    • Run 1 database query to determine whether the page title, edit summary, or username match a watch list.
    • If accepted, format a string, and send it to the feed channel.
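The per-message steps above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the `Change` fields, the `watch` table schema, the bot filter, and the output format are all hypothetical and are not taken from the existing CVNBot code.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    # Hypothetical shape of one incoming event.
    wiki: str
    title: str
    user: str
    summary: str
    is_bot: bool = False


def open_watchlist() -> sqlite3.Connection:
    # Hypothetical schema: one table of (kind, value) watch entries.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE watch (kind TEXT, value TEXT)")
    return db


def passes_filters(change: Change) -> bool:
    # Example boolean filter on the metadata; the real rules may differ.
    return not change.is_bot


def watch_match(db: sqlite3.Connection, change: Change) -> Optional[str]:
    # One query that checks title, username, and edit summary together.
    row = db.execute(
        "SELECT kind FROM watch WHERE "
        "(kind = 'title' AND value = ?) OR "
        "(kind = 'user' AND value = ?) OR "
        "(kind = 'summary' AND instr(lower(?), lower(value)) > 0) "
        "LIMIT 1",
        (change.title, change.user, change.summary),
    ).fetchone()
    return row[0] if row else None


def handle(db: sqlite3.Connection, change: Change) -> Optional[str]:
    # Returns the line for the feed channel, or None if the event is dropped.
    if not passes_filters(change):
        return None
    kind = watch_match(db, change)
    if kind is None:
        return None
    return (
        f"[{change.wiki}] {change.user} edited [[{change.title}]] "
        f"({kind} watched): {change.summary}"
    )
```

Keeping `handle` free of any IRC code means the filtering and matching logic can be tested without a network connection.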

Apart from that, we have a few basic control commands for restarting and for adding/removing entries on the watchlists (documentation). These perform some additional maintenance tasks, such as querying the MediaWiki API once for namespace prefixes, a list of known bots and admins, and some interface messages that help determine whether something is an "automatic edit summary" with a special meaning (blanked page, replaced contents, parsing parameters for log events such as blocks/moves/protections, etc.). The interface messages would no longer be needed if we use EventStreams.

The database is currently SQLite, and migrating it to a shared MySQL database (T327128) has long been blocked on a lack of familiarity with C# libraries and a lack of confidence in checking in additional DLL dependencies.

We currently have significant problems with the bot simply staying online, such as:

  • T327135: the source connection to irc.wikimedia.org frequently ends up lost in mysterious ways, despite auto-reconnect and auto-rejoin being enabled in the IRC library that we use.
  • T327134: the destination connection to Libera Chat often ends up in a confused state after netsplits, where it is not authenticated with NickServ. It doesn't recover from this on its own and requires a restart.

I'm hoping that the Python libraries for IRC are more mature and have this part solved without requiring any attention. By comparison, cvn-clerkbot, which uses python-twisted, does not appear to have suffered from any connection problems. That said, it doesn't send many messages and we don't pay close attention to it, so this is something we'll have to see.

More broadly, I personally will feel much more motivated to fix bugs and make improvements in a language I actually understand, and for which I have good resources (and people) to lean on for anything I don't know. I have absolutely no desire to learn more than the most basic C#, as I simply have no other outlet for applying that knowledge within my current job or the range of other open-source projects I maintain or contribute to. I also hope that by using Python, we'll have more people in our community able to contribute.

Event Timeline

Krinkle added a subscriber: Legoktm.
@Krinkle wrote on 10 July 2021:
# Monday, July 5th, 2021

06:06 ⇐ •cvn-clerkbot quit (~cvn-clerk@cvn/bot/cvn-clerkbot) *.net *.split
06:06 → cvn-clerkbot joined (~cvn-clerk@185.15.56.20)

06:11 cvn-clerkbot → Guest7260

19:35 ⇐ Guest7260 quit (~cvn-clerk@185.15.56.20) *.net *.split
19:39 → Guest7260 joined (~cvn-clerk@185.15.56.20)

# Saturday, July 10th, 2021

16:08 Krinkle: !quit
16:08 ⇐ Guest7260 quit (~cvn-clerk@185.15.56.20) Quit: Ordered by Krinkle
16:10 → cvn-clerkbot joined (~cvn-clerk@cvn/bot/cvn-clerkbot)

cvn-clerkbot, which uses python-twisted for IRC, lost its nickname again and did not self-correct in any way. Alternatives to consider:

Things to consider:

  • Message buffering to avoid flood kick.
  • Message splitting against maxlength.
  • Connect and authenticate with NickServ, then join channels.
  • Automatic re-authentication and nick regaining/ghosting as needed to deal with netsplits, plus re-joining of channels to deal with restricted channels that can only be joined when authenticated.

See also EventSource as used by Pywikibot, which has an example of good error handling as part of the loop.

Ref https://github.com/wikimedia/pywikibot/blob/2b8402a66e28ae4be30f74deb9e4e72ac529ef69/pywikibot/comms/eventstreams.py#L288-L303
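As an illustration of that kind of loop (not a copy of the Pywikibot code), here is a minimal sketch that parses Server-Sent Events lines and reconnects with exponential backoff. The names `parse_sse`, `consume`, `connect`, and `handle`, and the retry limits, are assumptions made for this example.

```python
import json
import time
from typing import Callable, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from Server-Sent Events lines."""
    data = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("data:"):
            # Leading spaces are not significant for JSON payloads.
            data.append(line[5:].lstrip(" "))
        elif line == "" and data:
            # A blank line terminates one event.
            payload = "\n".join(data)
            data = []
            try:
                yield json.loads(payload)
            except ValueError:
                continue  # skip a malformed event, keep the stream alive


def consume(
    connect: Callable[[], Iterable[str]],
    handle: Callable[[dict], None],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Read events, reconnecting with backoff until max_retries is exceeded."""
    failures = 0
    while failures <= max_retries:
        try:
            for event in parse_sse(connect()):
                failures = 0  # a successful event resets the backoff
                handle(event)
        except (OSError, EOFError):
            pass  # network error: fall through to the backoff below
        # A stream that simply ends is treated like a dropped connection.
        failures += 1
        sleep(min(2 ** failures, 60))
```

In practice `connect` would open the recent changes stream (for example https://stream.wikimedia.org/v2/stream/recentchange) and return an iterator of decoded text lines. Injecting it as a callable keeps the retry logic testable without a network.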

@Legoktm wrote on 11 July 2021:

Message buffering to avoid flood kick.

irc3 doesn't have this; we implemented it manually in wikibugs. I briefly searched the limnoria docs and didn't see anything obvious either (their flood stuff is about users flooding with !commands).

I do wonder if this is something that can be handled on the network side, getting some sort of higher flood limit or exemption.
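For reference, client-side buffering can be fairly small. The following is a minimal token-bucket sketch, not the wikibugs implementation; the class name `FloodBuffer` and the `burst` and `interval` defaults are assumptions, and the real limits would depend on the network's flood settings.

```python
import time
from collections import deque
from typing import Callable, Deque


class FloodBuffer:
    """Queue outgoing lines and release them at a bounded rate."""

    def __init__(
        self,
        send: Callable[[str], None],
        burst: int = 4,
        interval: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._burst = burst          # lines allowed back to back
        self._interval = interval    # seconds needed to earn one more line
        self._clock = clock
        self._tokens = float(burst)
        self._last = clock()
        self._queue: Deque[str] = deque()

    def say(self, line: str) -> None:
        self._queue.append(line)
        self.flush()

    def flush(self) -> int:
        """Send as many queued lines as allowed; return how many remain."""
        now = self._clock()
        earned = (now - self._last) / self._interval
        self._tokens = min(self._burst, self._tokens + earned)
        self._last = now
        while self._queue and self._tokens >= 1:
            self._tokens -= 1
            self._send(self._queue.popleft())
        return len(self._queue)
```

The bot's event loop would need to call `flush()` periodically (for example from a timer) so that queued lines go out once tokens have been earned.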

Message splitting against maxlength.

Isn't this a thing that should be done by the client, so that colors and whatnot are properly split or truncated? wikibugs has manual truncation logic that selects which projects should be listed when announcing a task, cutting off less important ones.
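A generic client-side splitter might look like the sketch below. It only splits on UTF-8 byte length at spaces and does not know about color codes or about which parts of a message matter most, so it is not a replacement for the wikibugs-style truncation. The function name and the 400-byte default are assumptions; the real budget depends on the 512-byte IRC line limit minus the command, target, and hostmask prefix, and `max_bytes` is assumed to be at least 4 so that any single character fits.

```python
from typing import List


def split_message(text: str, max_bytes: int = 400) -> List[str]:
    """Split text into chunks of at most max_bytes UTF-8 bytes, preferring spaces."""
    chunks: List[str] = []
    current = ""
    for word in text.split(" "):
        candidate = word if not current else current + " " + word
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Start a new chunk; a word longer than the limit is cut
        # character by character so multi-byte characters stay intact.
        current = ""
        for ch in word:
            if len((current + ch).encode("utf-8")) > max_bytes:
                chunks.append(current)
                current = ch
            else:
                current += ch
    if current:
        chunks.append(current)
    return chunks
```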

Connect and authenticate with NickServ, then join channels.

All 3 libraries support SASL, so this shouldn't ever be an issue.

Automatic re-authenticate and nick regaining/ghosting as-needed to deal with net splits, plus re-joining of channels to deal with restricted channels that can only be joined when authenticated.

We never did this for wikibugs. ib3 has mixins for this, and it appears limnoria does too. SASL should ensure that you're always authenticated when trying to join channels.