Project title: Rewriting PendingChangesBot from PHP to Python
Description of project: Wikimedia Finland is rewriting PendingChangesBot (i.e. SeulojaBot, which automatically reviews edits in Finnish Wikipedia). The target is to deprecate the old PHP version in 2026.
Background
Some Wikipedias (dewiki, plwiki, huwiki, fiwiki, ruwiki; see the full list) use an extension named mw:FlaggedRevisions for tracking changes to articles. There are two different modes. In the first mode, edits need to be approved before they are shown by default to unregistered users. In the second mode, edits are directly visible to all users, and FlaggedRevs is used for approving changes. In most configurations, edits by regular users are approved automatically, while edits from unregistered and new users are reviewed via FlaggedRevs.
However, FlaggedRevs tends to generate a huge backlog, which is handled in Finnish Wikipedia by SeulojaBot, originally developed as a proof of concept at a hackathon in 2016 using PHP. The world has moved forward, and notably there are now LLMs that can be used for analyzing edits. Therefore, it is time to rewrite the bot in Python, with a proper end-user web interface and support for multiple Wikipedias.
Project's Slack channel
Connection info is in Outreachy's project description. If you have problems joining the Slack channel and do not get an answer via email, please comment here or in Wikimedia's Zulip. (In some cases Gmail delivers emails with a delay and/or puts them in the spam folder, so this is a secondary channel.)
Expected outcomes: In the first quarter of 2026 we can deprecate the old PHP bot.
Required skills and/or preferred skills: Python, Django, Pywikibot, MediaWiki API. Knowledge of open-source LLMs is nice to have.
Mentor(s):
- @Zache (the bot's original developer in 2016 and its maintainer; mentoring the Cat-a-lot Outreachy project in round 30)
- @adiba_anjum (Wikimedia Outreachy intern in round 30; knowledge of LLMs and the MediaWiki API)
- @Ademola (Wikimedia Outreachy, round 30; knowledge of Python)
- @Ipr1 (Finnish Wikipedian and Wikimedia Finland's fiwiki developer/helpdesk person)
Selected intern:
Size of project: 350
Difficulty (easy, medium, or hard): medium
Homepage
Contribution documentation
Microtasks:
Filter rule tasks (easy):
A task is checked and grayed out if someone is working on it. A ticket is struck through if it is marked as done.
Approve article edits if:
- ... edit was made by an autopatrolled or autoreviewed user
- ... edit was made by a bot
- ... edit was made by a former bot (T406445)
- ... edit was made by a global bot or former global bot (T406443)
- ... edit was a revert or was reverted, and the newer version is identical to an already reviewed version (T406450)
- ... edit was patrolled
- ... edit was made to a whitelisted article
- ... edit did not make substantial changes (the old test was whether all of the changed words were already used in the article)
- ... edit changes only references (GitHub issue only)
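The "no substantial changes" test mentioned above (all changed words were already used in the article) could be sketched roughly as follows. This is only an illustration, not the bot's actual implementation: the function names `words` and `adds_only_known_words` are hypothetical, and splitting on `\w+` is an assumed tokenization that ignores wikitext markup.

```python
import re

# Assumption: a "word" is any run of letters, digits or underscores.
WORD_RE = re.compile(r"\w+")


def words(text: str) -> set[str]:
    """Return the set of case-folded word tokens found in text."""
    return {w.casefold() for w in WORD_RE.findall(text)}


def adds_only_known_words(old_text: str, new_text: str) -> bool:
    """True if every word in new_text already occurs somewhere in old_text."""
    return words(new_text) <= words(old_text)


old = "Helsinki is the capital of Finland."
print(adds_only_known_words(old, "Finland: the capital is Helsinki."))
print(adds_only_known_words(old, "Helsinki is the capitol of Finland."))
```

The first call only reorders existing words, so it passes the test; the second introduces the new word "capitol" and fails it.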
Do not approve edit automatically if:
- ... edit was made to an article that is ''important'' (i.e. Featured articles, Good articles, ...)
- ... edit was made by an editor who was blocked after the edit (T406329)
- ... edit changed an existing article into a redirect (T406336)
- ... edit removed all categories from an existing article (T406438)
- ... the rendered HTML contains text with CSS class=error that the old version did not have (i.e. broken wikicode test) (T406440)
- ... edit was previously approved, but the approval was manually removed (T406442)
- ... edit added words to the article that have never been used in this language version before (i.e. likely typo detection)
- ... edit has a high revert risk score from the Multilingual_revert_risk model (T406446)
- ... edit has a high revert risk score from the language-agnostic revertrisk model (GitHub issue only)
- ... edit adds a link to a new domain (GitHub issue only)
- ... edit adds ISBN identifiers whose checksum fails (GitHub issue only)
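As an illustration of the ISBN rule in the list above, the standard ISBN-10 and ISBN-13 check-digit algorithms can be sketched like this. The function names are hypothetical, and how ISBNs are extracted from the wikitext of an edit is left out; this only shows the checksum step.

```python
import re


def _clean(isbn: str) -> str:
    """Remove hyphens and whitespace and uppercase a trailing 'x'."""
    return re.sub(r"[\s-]", "", isbn).upper()


def is_valid_isbn10(isbn: str) -> bool:
    """ISBN-10: digits weighted 10..1 must sum to a multiple of 11."""
    s = _clean(isbn)
    if not re.fullmatch(r"[0-9]{9}[0-9X]", s):
        return False
    total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(s))
    return total % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    """ISBN-13: digits weighted 1,3,1,3,... must sum to a multiple of 10."""
    s = _clean(isbn)
    if not re.fullmatch(r"[0-9]{13}", s):
        return False
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(s))
    return total % 10 == 0


def is_valid_isbn(isbn: str) -> bool:
    """Dispatch on length; anything that is not 10 or 13 characters is invalid."""
    s = _clean(isbn)
    if len(s) == 10:
        return is_valid_isbn10(s)
    return is_valid_isbn13(s)
```

A rule built on this would hold back an edit from automatic approval when any newly added ISBN fails `is_valid_isbn`.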
Missing rules (investigation tasks: never implemented before in any form, suitable for multiple people at the same time)
- ... no content from the edit is in the latest version of the article (T406813)
- Investigate automated detection methods for LLM-generated and machine-translated content (T406818)
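One possible starting point for investigating T406813 is a line-level comparison: collect the lines the edit added and check whether any of them is still present in the latest version. This is a rough sketch under stated assumptions, not an agreed design. The function names are hypothetical, the three texts are assumed to be fetched elsewhere, and a real rule would probably need word- or sentence-level matching rather than whole lines.

```python
import difflib


def added_lines(parent_text: str, revision_text: str) -> list[str]:
    """Return the non-blank lines that the edit added relative to its parent."""
    diff = difflib.ndiff(parent_text.splitlines(), revision_text.splitlines())
    return [line[2:] for line in diff if line.startswith("+ ") and line[2:].strip()]


def edit_fully_superseded(parent_text: str, revision_text: str, latest_text: str) -> bool:
    """True if none of the lines added by the edit survive in the latest version."""
    added = added_lines(parent_text, revision_text)
    if not added:
        # Pure removals add nothing, so this heuristic cannot decide.
        return False
    latest = set(latest_text.splitlines())
    return not any(line in latest for line in added)


parent = "A\nB"
revision = "A\nB\nSpam"
print(edit_fully_superseded(parent, revision, "A\nB\nC"))
print(edit_fully_superseded(parent, revision, "A\nB\nSpam\nC"))
```

In the first call the added line "Spam" is gone from the latest version; in the second it is still there. How such a result should feed into the approval decision is part of the investigation.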
UI tasks
- Polish the user interface (the current UI is ugly; fix the table layouts, etc.)
- Add Wikipedias that use FlaggedRevs to the selection UI (T406295)
- Add FlaggedRevs info links to the UI (T406294)
- Display diffs for individual revision (Github issue only)
- Add Search Box Filter to PendingChangesList (Github issue only)
- bugfix: Refresh button provides no visible feedback when data is loaded (Github issue only)
Other tasks
- Create unique word list from Finnish Wikipedia dump T406447
- Add Test Data Mode with Revision ID List T406665
- Suppress confusing error message in unit tests (Github issue only)
More complex
Complex tasks
PendingChangesBot testing requests