
Outreachy 31: Rewriting PendingChangesBot from PHP to Python
Open, Medium, Public

Description

Project title: Rewriting PendingChangesBot from PHP to Python
Description of project: Wikimedia Finland is rewriting PendingChangesBot (i.e. SeulojaBot), which automatically reviews edits in Finnish Wikipedia. The target is that we can deprecate the old PHP version in 2026.

Background

Some Wikipedias (dewiki, plwiki, huwiki, fiwiki, ruwiki; see the full list) use an extension named mw:FlaggedRevisions for tracking changes to articles. There are two different modes. In the first mode, edits need to be approved before they are shown by default to unregistered users. In the second mode, edits are directly visible to all users, and FlaggedRevs is used for approving changes. In most configurations, regular users are approved automatically, while edits from unregistered and new users are reviewed via FlaggedRevs.

However, FlaggedRevs tends to generate a huge backlog, which in Finnish Wikipedia is handled by SeulojaBot, originally developed as a proof of concept at a hackathon in 2016 using PHP. The world has moved forward, and notably there are now LLMs that can be used for analyzing edits. Therefore, it is time to rewrite the bot in Python, with a proper end-user web interface and support for multiple different Wikipedias.

Project's Slack channel
Connection info is in the Outreachy project description. If you have problems joining the Slack channel and don't get an answer via email, please comment here or in Wikimedia's Zulip. (In some cases Gmail delivers emails with a delay and/or sends them to the spam folder, so this is a secondary channel.)


Expected outcomes:
In the first quarter of 2026 we can deprecate the old PHP bot
Required skills and/or preferred skills:
Python, Django, Pywikibot, and the MediaWiki API; knowledge of open source LLM models is nice to have
Mentor(s):

  • @Zache (the bot's original developer in 2016 and its maintainer; mentored the Cat-a-lot Outreachy project in round 30)
  • @adiba_anjum (Wikimedia Outreachy intern in round 30; experience with LLMs and the MediaWiki API)
  • @Ademola (Wikimedia Outreachy intern in round 30; experience with Python)
  • @Ipr1 (Finnish Wikipedian and Wikimedia Finland's fiwiki developer/helpdesk person)

Selected intern:

Size of project: 350
Difficulty rating (easy, medium, or hard): medium

Homepage

Contribution documentation

Microtasks:

Filter rule tasks (easy):
A task is checked and grayed out if someone is working on it. A ticket is struck through if it is marked as done.

Approve article edits if:

  • ... the edit was made by an autopatrolled or autoreviewed user
  • ... the edit was made by a bot
  • ... the edit was made by a former bot T406445
  • ... the edit was made by a global bot or former global bot T406443
  • ... the edit was a revert or was reverted, and the newer version is identical to an already reviewed version T406450
  • ... the edit was patrolled
  • ... the edit was made to a whitelisted article
  • ... the edit did not make substantial changes (the old test was whether all of the changed words were already used in the article)
  • ... the edit changes only references (GitHub issue only)
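The "no substantial changes" test can be sketched as a word-set comparison, roughly matching the old test described above. This is an illustrative sketch, not the bot's actual implementation; the function names and the `\w+` tokenization are assumptions:

```python
import re

def words(text):
    """Lowercase word tokens; the regex is a rough stand-in for real tokenization."""
    return set(re.findall(r"\w+", text.lower()))

def is_unsubstantial_change(old_text, new_text):
    """True when the new revision introduces no word that the old revision
    did not already contain: the heuristic described above."""
    return words(new_text) <= words(old_text)
```

Reordering or repeating existing words passes the check; introducing a genuinely new word fails it.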

Do not approve an edit automatically if:

  • ... the edit was made to an ''important'' article (i.e. Featured articles, Good articles, ...)
  • ... the edit was made by an editor who was blocked after the edit (T406329)
  • ... the edit changed an existing article into a redirect (T406336)
  • ... the edit removed all categories from an existing article (T406438)
  • ... the HTML-rendered version contains text with the CSS class=error which the old version did not have (i.e. broken wikicode test) (T406440)
  • ... the edit was previously approved, but the approval was manually removed (T406442)
  • ... the edit added words which have never been used in this language version before (i.e. likely typo detection)
  • ... the edit has a high revert-risk score from the Multilingual_revert_risk model T406446
  • ... the edit has a high revert-risk score from the language-agnostic revert-risk model (GitHub issue only)
  • ... the edit adds a link to a new domain (GitHub issue only)
  • ... the edit adds ISBN identifiers whose checksum fails (GitHub issue only)
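The ISBN rule follows from the standard checksum definitions. A minimal sketch (it validates lone identifiers; extracting them from wikitext is a separate step, and function names are illustrative):

```python
def isbn10_valid(isbn):
    """ISBN-10 checksum: digits weighted 10..1 must sum to a multiple of 11.
    'X' stands for 10 and is only legal as the final check character."""
    chars = [c for c in isbn if c.isdigit() or c in "Xx"]
    if len(chars) != 10 or any(c in "Xx" for c in chars[:-1]):
        return False
    digits = [10 if c in "Xx" else int(c) for c in chars]
    return sum((10 - i) * d for i, d in enumerate(digits)) % 11 == 0

def isbn13_valid(isbn):
    """ISBN-13 checksum: digits weighted alternately 1 and 3 must sum to a
    multiple of 10."""
    digits = [int(c) for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    return sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits)) % 10 == 0
```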

Missing rules (investigation tasks; never implemented before in any form, suitable for multiple people at the same time)

  • ... no content from the edit is in the latest version of the article (T406813)
  • Investigate automated detection methods for LLM-generated and machine-translated content (T406818)

UI tasks

Other tasks

More complex

Complex tasks

PendingChangesBot testing requests

IMPORTANT: GSoC / Outreachy candidates are required to complete micro-tasks during the application period to prove their ability to work on a three-month-long project.

Related Objects

Event Timeline

Zache renamed this task from Rewriting PendingChangesBot from PHP to Python to Outreachy 31: Rewriting PendingChangesBot from PHP to Python.Sep 26 2025, 11:10 AM
Zache updated the task description. (Show Details)
Zache added a subscriber: adiba_anjum.
Zache moved this task from Backlog to Outreachy 31 on the PendingChangesBot board.
Zache added a subscriber: Ipr1.

To the mentors: Has anyone started contributing to this project?

Current stats: 15 people have joined the Slack channel, 7 unique people have claimed tasks (some have claimed multiple), and there are 4 pull requests.

@Zache that's great to hear. I've seen in the past, though, that claiming tasks can make participants shy about contributing to projects. Please let the contributors know that anyone is allowed to contribute to any task, whether it's claimed or not.

We just added a rule that a contributor can pick a new task after submitting a pull request, and if a user wants to switch to another task, they should just unassign the task and say in the GitHub issue that the task is free to work on.

Probably the biggest thing is that I should create some larger tickets for more experienced participants, as now people are starting to understand the codebase and the idea of the tool.

That's okay. Let the newer tasks that you'll create also be open to whoever wants to work on them.

Hello everyone, my name is Agaba Derrick. I'm new and I request to be added to the Slack channel. I sent an email but I haven't been added.

Thank you for the note! To confirm: did you get the invitation email from Slack now? (If not, could you send a new email to info@wikimedia.fi so I will get it?)

Thank you Zache, I just sent a new email.

Thanks, now it got through and the invite was successful :)

@Zache

I also sent an email requesting the Slack invite link, but I haven't gotten any reply yet.

Hi everyone!
My name is Olamiposi and I plan to contribute to this project for Outreachy. I sent a mail to info@wikimedia.fi (I saw it on the thread) to be added to the Slack channel.

Hello @Zache
I still haven't gotten an update concerning the slack invitation. Please kindly assist me at your convenience, thank you :)

@System625: based on the username and name in Phabricator, your email still hasn't come through in Gmail. Based on the last Outreachy round, there can be a delay in when Gmail delivers messages, for reasons only Google knows.

Is it the same email as in your GitHub commits (i.e. the one which starts with th and ends with 624)? If so, I will send the invite to that.

@Zache That is my email but it is not the one I used to sign up for the wikimedia accounts. This is it: tunde-ajayi.olaoluwasijibomi@lmu.edu.ng
Apologies for the confusion

@Zache I've accepted the invitation, thank you!

Hi @Zache, this is David Dhieu. I'm contributing to PendingChangesBot and request to be added to the Slack channel. I sent an email but haven't been added.

Here is my email: dhieumajok211@gmail.com

Hello everyone, I need help to join the Slack channel.
I sent an email to join but haven't gotten any feedback.

Thanks

Thanks for notifying; I found your email and sent an invitation.

I haven't gotten my invitation yet. This is my email: ezrayendau2000@gmail.com

LGoto triaged this task as Medium priority.Nov 5 2025, 9:30 PM

Week 1: December 8, 2025 - December 12, 2025

Overview of Tasks Completed:

  • Completed Django OAuth application setup on Toolforge with working authentication
  • Replaced Pywikibot with mwclient to resolve multi-user caching issues
  • Migrated database backend from SQLite to MariaDB for production deployment
  • Integrated English Wikipedia replica database (enwiki_p) access using Django ORM
  • Created Django models for Wikipedia database tables (WikiPage, WikiRevision, WikiActor)
  • Implemented article search feature with namespace filtering, redirect exclusion, and result limits
  • Configured secure credential storage using Toolforge environment variables
  • Successfully deployed application at https://harshita-wiki-test.toolforge.org/

Challenges Faced:

  • OAuth callback URL misconfiguration - initially pointed to production instead of beta system
  • Wiki replica database selection - beta Commons replica doesn't exist, had to use production English Wikipedia replica instead
  • Username normalization issue - Django was removing spaces from MediaWiki usernames
  • Binary field searching - page_title field queries required proper database specification using .using('wiki_replica')
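Both the spacing and binary-field issues trace back to how MediaWiki stores titles: page_title uses underscores instead of spaces, uppercases the first character under the default configuration, and is a binary column on the Wiki Replicas. A small helper (hypothetical names; assumes the default first-letter-capital behaviour) for producing query-ready values:

```python
def to_db_title(title):
    """Display title -> MediaWiki storage form: spaces become underscores and
    the first character is uppercased (default wiki configuration)."""
    t = title.strip().replace(" ", "_")
    return t[:1].upper() + t[1:]

def to_db_title_bytes(title):
    # page_title is a binary column on the Wiki Replicas, so ORM lookups
    # must compare against bytes, not str.
    return to_db_title(title).encode("utf-8")
```

A lookup then looks like `WikiPage.objects.using('wiki_replica').filter(page_namespace=0, page_title=to_db_title_bytes('Some title'))`, where `.using()` directs the query at the replica connection as noted above.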

Learnings and Skills Gained:

  • OAuth 1.0a authentication flow and secure credential management
  • mwclient library for MediaWiki API interactions without state caching issues
  • Django multi-database configuration and read-only models with managed=False
  • MediaWiki database architecture including namespaces and the actor table pattern
  • Toolforge infrastructure including webservice management, environment variables, and log debugging
  • Wiki Replica database access patterns and naming conventions

Repository: https://github.com/miraclousharshita/django-wiki-oauth
Deployed Application: https://harshita-wiki-test.toolforge.org/

Week 1: December 8, 2025 - December 12, 2025

  1. Overview of Tasks I Completed:
  • Django OAuth Setup: Set up a Django application on Toolforge with Wikimedia OAuth 1.0a authentication
  • Pywikibot Integration: Implemented OAuth credential passing from Django to Pywikibot for wiki operations
  • Database Configuration: Migrated to MariaDB using read_default_file for secure credential management, configured the private tool database
  • Wiki Replica Integration: Created Django ORM models for the Meta-Wiki replica database (metawiki_p), including Page, Revision, User, Actor, RecentChanges, and Logging tables
  • Search Feature: Implemented wiki page search functionality querying the replica database with title filtering and metadata display

  2. Key Accomplishments:
  • Deployment: Successfully deployed a working application at https://django-oauth-erik.toolforge.org/
  • Multi-User Issue Resolution: Discovered a critical Pywikibot credential caching issue (credit to Zache) and migrated to mwclient for thread-safe OAuth operations in multi-user environments
  • API Optimization: Implemented a hybrid approach using the public API for read operations and OAuth only for authenticated actions, improving performance and avoiding permission issues
  • Complete Feature Set: Delivered a fully functional app with OAuth login, a user profile display with edit count/contributions, and wiki replica search
  3. Challenges Faced:
  • Pywikibot Multi-User Bug: Discovered that Pywikibot caches OAuth credentials globally, causing authentication conflicts when multiple users log in simultaneously. Solution: Migrated to mwclient, which creates isolated client instances per request, making it safe for concurrent users.
  • OAuth Permission Scope: The initial OAuth consumer was registered with "User identity verification only", which blocked API read access and caused readapidenied errors. Solution: Switched to the public MediaWiki API for reading public data (edit counts, contributions), reserving OAuth only for operations requiring it.
  • mwclient API Path Configuration: The initial configuration used the wrong path (/) instead of the MediaWiki API path (/w/), causing 404 errors. Solution: Corrected the path configuration in the mwclient Site initialization.
  • Database Router Setup: Queries to wiki_replica models were hitting the wrong database. Solution: Implemented a custom database router to direct wiki_replica app queries to the replica database.
  4. Learnings and Skills Gained:
  • OAuth 1.0a Authentication: Understanding the authentication flow, token management, and the difference between identity verification grants and API access grants
  • Concurrent Web Application Architecture: Learned why global state (like Pywikibot's credential caching) breaks in multi-user environments, and the importance of thread-safe libraries
  • mwclient Library: A thread-safe MediaWiki API client suitable for web applications with multiple concurrent users
  • Django Multi-Database Configuration: Setting up database routers, read-only models with managed=False, and connecting to external databases
  • MediaWiki Database Schema: Understanding namespaces, the actor table pattern, the page/revision relationship, and wiki replica naming conventions (_p suffix for public replicas)
  • Toolforge Infrastructure: Webservice deployment, environment variables, the become command for tool accounts, git-based deployment workflows, and debugging with logs
  • API Usage Optimization: Learned to use the public API for publicly available data and OAuth only when necessary, to minimize permission requirements
  5. Feedback and Support Needed:
  • Code Review: Would appreciate feedback on the mwclient implementation and database router configuration to ensure best practices for PendingChangesBot
  • Next Steps: Guidance on which features or improvements to prioritize for the PendingChangesBot project
  6. Goals for Next Week:
  • Begin working on PendingChangesBot tasks
  • Study the FlaggedRevs extension and pending changes workflow
  • Explore WikiWho integration for content attribution tracking
  • Review the existing PendingChangesBot codebase and architecture
  7. Additional Notes:

Week 2: December 15, 2025 - December 21, 2025

Overview of Tasks I Completed:

Option 1 - Django OAuth App Feature Expansion:
- REST API Implementation: Built complete REST API using Django REST Framework with endpoints for user profile, contributions, wiki search, and statistics
- Vue.js Frontend Integration: Implemented Vue.js 3 (CDN) with reactive components for dynamic user profile, real-time wiki search with debouncing, and interactive statistics dashboard
- Multi-Language Support: Configured Django i18n framework with English and Finnish translations, implemented language switcher in UI
- Testing & Quality Assurance: Set up pytest for unit testing, bandit for security scanning, safety for dependency vulnerability checks, and mypy for type checking

Option 2 - PendingChangesBot Toolforge Deployment:
- Database Configuration: Adapted settings.py for dual database support (SQLite for local development, MariaDB for Toolforge production using read_default_file)
- Environment-Based Configuration: Implemented TOOLFORGE_DEPLOYMENT flag, made DEBUG and ALLOWED_HOSTS configurable via environment variables
- Toolforge Setup: Configured tool account (tools.pendingchangesbot), set up directory structure (~/www/python/src), created venv in webservice shell
- Deployment: Successfully deployed application with all migrations run, statistics views functional
- Dependencies: Added PyMySQL and cryptography for MariaDB compatibility
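The dual database setup can be sketched as a settings.py fragment like the following. The flag name, database name, and file path mirror the report; treat the concrete values as assumptions:

```python
import os

# True when running on Toolforge; unset for local development.
TOOLFORGE_DEPLOYMENT = os.getenv("TOOLFORGE_DEPLOYMENT", "").lower() == "true"

if TOOLFORGE_DEPLOYMENT:
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.mysql",
            "NAME": "s57224__pendingchangesbot",
            "OPTIONS": {
                # Credentials stay out of the repo: the MySQL client reads them
                # from the tool account's .my.cnf-style file.
                "read_default_file": os.path.expanduser("~/toolsdb.my.cnf"),
                "charset": "utf8mb4",
            },
        }
    }
else:
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": os.path.join(os.path.dirname(__file__), "db.sqlite3"),
        }
    }
```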

Key Accomplishments:

Deployments:
- Option 1 Enhanced: https://django-oauth-erik.toolforge.org (now with API, Vue.js, multi-language, tests)
- Option 2 Deployed: https://pendingchangesbot.toolforge.org (statistics views ready)

Technical Achievements:
- Built production-ready REST API following Django REST Framework best practices
- Implemented client-side rendering with Vue.js to handle US-Europe latency (as per Zache's requirement)
- Created bilingual interface demonstrating i18n best practices
- Established comprehensive testing pipeline (unit tests, security, type checking)
- Successfully adapted existing Django project for Toolforge deployment
- Configured environment-based settings for seamless local/production workflows

Challenges Faced:

Vue.js Learning Curve: More familiar with React, had to learn Vue.js 3 Composition API, reactivity system, and template syntax. Solution: Studied Vue.js documentation, used CDN approach for simplicity, focused on core features (reactive data, computed properties, methods).

Toolforge Deployment Constraints:
- No pip/venv on bastion node, couldn't install packages directly. Solution: Used toolforge webservice shell to access compute node with pip access, created venv there.
- toolforge envvars create syntax different from expected (no --value flag). Solution: Read help docs, used space-separated format instead.
- uwsgi looking for app.py but we had Django WSGI. Solution: Created app.py wrapper importing from reviewer.wsgi.      

Database Configuration: Understanding when Django needs migrations for managed=False models in wiki_replica. Solution: Ran makemigrations to create migration files even though tables aren't managed by Django, satisfying Django's migration checker.

Multi-Language Setup: First time implementing Django i18n, understanding message files, compilemessages command. Solution: Used makemessages/compilemessages workflow, organized translations in locale/ directories per app.

Testing Framework Integration: Setting up pytest with Django, configuring coverage, understanding bandit and mypy output. Solution: Created pytest.ini, configured pytest-django, iteratively fixed type hints and security warnings.

Learnings and Skills Gained:

Django REST Framework: Building serializers, ViewSets, configuring URL routing, handling API authentication, JSON response formatting

Vue.js 3: Reactive data binding, component lifecycle, computed properties, methods, event handling, template syntax, CDN integration with Django templates

Django Internationalization (i18n): Using {% trans %} and {% blocktrans %} template tags, creating message files with makemessages, translating strings, language middleware configuration

Testing & Quality Tools:
- pytest: Django test configuration, fixtures, test database setup
- bandit: Security vulnerability scanning, understanding OWASP risks
- mypy: Type checking for Python, adding type hints, configuring for Django
- safety: Dependency vulnerability checking

Toolforge Advanced Deployment:
- toolforge envvars: Environment variable management for services
- toolforge webservice shell: Accessing compute nodes for package installation
- uwsgi configuration: WSGI app setup, virtualenv detection, worker processes
- Toolforge job system: Running one-off commands like migrations

Environment-Based Configuration: Using os.getenv() for flexible settings, differentiating local vs production, secure credential management with read_default_file

Feedback and Support Needed:

Code Review: Would appreciate feedback on:
- REST API design and endpoint structure
- Vue.js component organization and reactivity patterns
- Multi-language implementation approach
- PendingChangesBot deployment configuration

Next Steps:
- Should I add more languages to the boilerplate app?
- Any additional features needed for PendingChangesBot statistics?
- Guidance on next PendingChangesBot tasks (review workflow implementation?)

Goals for Next Week:

- Explore PendingChangesBot statistics data loading from MediaWiki FlaggedRevs
- Study auto-review system and quality checks architecture
- Investigate integration with ORES/LiftWing for edit quality predictions
- Review Vue.js frontend for pending changes review interface
- Understand WikiConfiguration model and per-wiki settings

Additional Notes:

Repositories:
- Option 1: https://github.com/xenacode-art/django-oauth-wikimedia-task

Live Applications:
- Django OAuth Boilerplate: https://django-oauth-erik.toolforge.org
- PendingChangesBot: https://pendingchangesbot.toolforge.org

Key Files Modified (Option 2):
- app/reviewer/settings.py - Database configuration with TOOLFORGE_DEPLOYMENT flag
- requirements.txt - Added PyMySQL and cryptography
- .gitignore - Added staticfiles/ directory

Week 2: December 15, 2025 - December 19, 2025


1. Overview of Tasks Completed

Task 1: Built REST API

  • Created 3 API endpoints: user info, wiki stats, and article search
  • Used Django REST Framework with proper authentication

Task 2: Added Vue.js Frontend

  • Built a Vue.js app that calls the APIs
  • Uses AJAX to load data, which helps reduce lag for users in Europe
  • Added loading indicators and error messages

Task 3: Multi-language Support

  • Added English and Finnish language options
  • Created a language switcher dropdown on the homepage
  • Translated all menu items and labels

Task 4: Testing and Code Quality

  • Wrote unit tests for all views, APIs, and models
  • Set up automatic code checking (linting, formatting, type checking)
  • Added pre-commit hooks to check code before each commit

2. Key Accomplishments

  • Turned the basic OAuth app into a complete boilerplate that others can reuse
  • Fixed the latency issue by using client-side rendering instead of server-side
  • Created a full testing setup with good code coverage

3. Challenges Faced

Challenge 1: CORS Configuration

  • Problem: API calls failed due to cross-origin restrictions
  • Solution: Set up different CORS rules for local testing and production

Challenge 2: Testing Without Wiki Database

  • Problem: Tests failed when wiki_replica database wasn't available locally
  • Solution: Made the code work gracefully without the database in test mode

4. Learnings and Skills Gained

  • How to build REST APIs with Django REST Framework
  • Vue.js 3 for building interactive frontends
  • Django internationalization for multi-language apps
  • Writing good unit tests with pytest
  • Using modern Python tools (Ruff, mypy) for code quality

5. Additional Notes

Week 3: December 22 - 26, 2025

1. Overview of Tasks I Completed

Task 1: Debugged PendingChangesBot Deployment
- Identified Vue.js templates displaying as raw {{xyz}} instead of rendering
- Diagnosed static files (CSS/JS) not loading on Toolforge
- Fixed 500 Internal Server Error on production deployment

Task 2: Fixed Static Files Serving
- Added WhiteNoise middleware for efficient static file serving in production
- Configured Content Security Policy to allow Vue.js and Chart.js CDNs
- Ran collectstatic to gather all CSS/JS files into proper directory

Task 3: Fixed Database Configuration
- Created new MariaDB database: s57224__pendingchangesbot
- Fixed database credential mismatch (user s57224 vs database s57230)
- Created proper ~/toolsdb.my.cnf credentials file
- Updated Django settings to use correct database and credentials

Task 4: Fixed Application Structure
- Resolved Python import path issues after fresh git clone
- Created proper app.py wrapper file for uWSGI
- Fixed module resolution for new repository structure

2. Key Accomplishments

- Successfully deployed PendingChangesBot to Toolforge at https://pendingchangesbot.toolforge.org
- All migrations run successfully - 38 migrations applied to MariaDB database
- Vue.js rendering properly - No more raw template syntax showing
- Full styling working - CSS loaded, buttons interactive, dropdowns functional
- Statistics views operational - Ready for users to view FlaggedRevs data

3. Challenges Faced

Challenge 1: Static Files Not Loading
- Problem: CSS and JS files returned 404 errors, MIME type errors showed text/html instead of proper types
- Root cause: Django's default static file serving doesn't work well on Toolforge
- Solution: Added WhiteNoise middleware and ran collectstatic command

Challenge 2: Content Security Policy Blocking CDNs
- Problem: Browser blocked Vue.js CDN (unpkg.com) and Chart.js CDN due to strict CSP
- Solution: Installed django-csp and configured CSP headers to allow necessary CDNs while maintaining security
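The WhiteNoise and CSP fixes from Challenges 1 and 2 can be sketched as a settings.py fragment. Directive names follow django-csp's pre-4.0 style; unpkg.com is from the report, while any further CDN hosts would be added the same way:

```python
# settings.py fragment (sketch): serve static files via WhiteNoise and allow
# the frontend CDNs through the Content Security Policy.
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "whitenoise.middleware.WhiteNoiseMiddleware",  # directly after SecurityMiddleware
    "csp.middleware.CSPMiddleware",
    # ... remaining middleware ...
]

# Hashed, compressed static files collected by `collectstatic`.
STATICFILES_STORAGE = "whitenoise.storage.CompressedManifestStaticFilesStorage"

CSP_DEFAULT_SRC = ("'self'",)
# 'unsafe-eval' is required by the in-browser Vue template compiler build.
CSP_SCRIPT_SRC = ("'self'", "'unsafe-eval'", "https://unpkg.com")
CSP_STYLE_SRC = ("'self'", "'unsafe-inline'")
```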

Challenge 3: Database Credential Mismatch
- Problem: User s57224 trying to access database s57230__pendingchangesbot - access denied
- Root cause: Environment variables from different tool accounts mixed up
- Solution: Created new database for correct user, updated credentials file path from replica.my.cnf to toolsdb.my.cnf 

Challenge 4: Python Import Path Issues
- Problem: ModuleNotFoundError: No module named 'reviewer' after cloning fresh repo
- Root cause: New repo structure has app/ directory, uWSGI couldn't find modules
- Solution: Created app.py that adds app/ to Python path before importing WSGI application

Challenge 5: Terminal Heredoc Issues
- Problem: Bash heredoc (cat << EOF) adding unexpected indentation to files
- Root cause: Terminal pasting behavior adding leading spaces
- Solution: Used printf command instead of heredoc for reliable file creation

4. Learnings and Skills Gained

WhiteNoise for Production Static Files
- How WhiteNoise serves static files efficiently without needing separate web server
- CompressedManifestStaticFilesStorage for optimized file serving
- Difference between development and production static file handling

django-csp for Content Security Policy
- Configuring CSP headers to allow specific CDN domains
- Understanding CSP_SCRIPT_SRC, CSP_STYLE_SRC, CSP_CONNECT_SRC directives
- Balancing security with functionality (allowing unsafe-eval for Vue.js)

Toolforge Database Management
- Using toolforge envvars to manage environment variables
- Creating and accessing tool databases on tools.db.svc.wikimedia.cloud
- Understanding Toolforge credential files (toolsdb.my.cnf, replica.my.cnf)
- Difference between wiki replica databases and tool-specific databases

Django Migrations in Production
- Running migrations in toolforge webservice shell environment
- Understanding when migrations are needed even for managed=False models
- Debugging OperationalError database connection issues

Production Debugging Techniques
- Reading uWSGI logs (~/uwsgi.log) to diagnose startup errors
- Using browser DevTools Console and Network tabs for frontend debugging
- Identifying MIME type errors vs 404 vs CSP violations
- Debugging Python import path issues in production environments

Git Repository Management
- Cloning fresh repo vs copying files for deployment
- Managing git branches on Toolforge
- Understanding when backup directories are needed

5. Feedback and Support Needed

Code Review
- Would appreciate feedback on the WhiteNoise and CSP configuration
- Is CompressedManifestStaticFilesStorage the right choice for Toolforge?
- Database naming convention - should we use s57224__pendingchangesbot or request access to s57230__pendingchangesbot? 

Next Steps
- Should we populate the database with initial wiki configuration data?
- Do we need to set up automated data loading from wiki replicas?
- Guidance on enabling the auto-review functionality?

6. Additional Notes

Live Application
- PendingChangesBot: https://pendingchangesbot.toolforge.org
- Statistics views functional and ready for use

Repository
- GitHub: https://github.com/xenacode-art/PendingChangesBot-ng
- Branch: feature/mypy-complete-fix
- Latest commit: Fix static files serving and CSP for Toolforge deployment

Key Files Modified
- requirements.txt - Added whitenoise>=6.6.0, django-csp>=3.8
- app/reviewer/settings.py - Added WhiteNoise middleware, CSP configuration, updated database credentials path
- app.py - Created wrapper file with proper Python path setup
- ~/toolsdb.my.cnf - Created database credentials file

Database
- Name: s57224__pendingchangesbot
- Host: tools.db.svc.wikimedia.cloud
- All 38 migrations applied successfully

Week 3: December 22, 2025 - December 26, 2025

1. Overview of Tasks Completed

Task 1: Bot Deployment Review

  • Reviewed pending bot deployment awaiting fixes
  • Identified issues blocking the deployment
  • Documented required changes for resolution

Task 2: Toolforge Logging System

  • Explored how logs are stored and managed on Toolforge
  • Learned log access methods and best practices
  • Reviewed existing bot logs to understand error patterns

Task 3: Gerrit and Phabricator Workflow

  • Studied the code review process on Gerrit
  • Explored Phabricator for task tracking and project management
  • Understood the contribution workflow for Wikimedia projects

2. Key Accomplishments

  • Gained hands-on understanding of Toolforge infrastructure
  • Familiarized with Wikimedia's development and review processes
  • Identified next steps for bot deployment fixes

3. Challenges Faced

Challenge 1: Understanding Gerrit/Phabricator

  • Problem: Initial learning curve with Wikimedia's development tools
  • Solution: Reviewed documentation and existing contributions to understand workflows

4. Learnings and Skills Gained

  • Toolforge logging architecture and debugging techniques
  • Gerrit code review system and submission process
  • Phabricator task management workflow
  • Wikimedia contribution guidelines and best practices

5. Additional Notes

  • Week had slower progress due to holiday season
  • Focus was on understanding infrastructure and processes rather than feature development
  • Ready to resume full development work in the coming week

Week 4: December 29, 2025 - January 2, 2026

1. Overview of Tasks I Completed

Task 1: Direct SQL Access Implementation
- Implemented direct SQL connection to wiki replica databases following mentor's (Zache) recommendation
- Created connection management system that opens connections when needed and closes immediately after use
- Replaced Pywikibot SupersetQuery approach with direct PyMySQL connections

Task 2: Wiki Replica Connection Module
- Created wiki_replica_connection.py with WikiReplicaConnection class
- Implemented context manager pattern for automatic connection cleanup
- Fixed hostname construction to properly resolve wiki databases (e.g., fiwiki.analytics.db.svc.wikimedia.cloud)
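A minimal sketch of the hostname rule and the open-use-close pattern. Function names are illustrative; `replica_connection` assumes the standard `~/replica.my.cnf` credentials file, and pymysql is imported lazily so the hostname helper works without it:

```python
from contextlib import contextmanager

def replica_host(dbname, cluster="analytics"):
    """Hostname for a wiki replica, e.g. fiwiki -> fiwiki.analytics.db.svc.wikimedia.cloud.
    The full database name (language code plus family suffix) must be used:
    'fi' instead of 'fiwiki' fails DNS resolution."""
    return f"{dbname}.{cluster}.db.svc.wikimedia.cloud"

@contextmanager
def replica_connection(dbname, default_file="~/replica.my.cnf"):
    """Open a short-lived connection to <dbname>_p and close it right after
    use, so the shared replica connection pool is not exhausted."""
    import os
    import pymysql  # imported lazily; only needed when a connection is opened

    conn = pymysql.connect(
        host=replica_host(dbname),
        database=f"{dbname}_p",
        read_default_file=os.path.expanduser(default_file),
        charset="utf8mb4",
    )
    try:
        yield conn
    finally:
        conn.close()
```

Usage: `with replica_connection("fiwiki") as conn: ...` guarantees the connection is closed even if the query raises.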

Task 3: Direct SQL Statistics Service
- Created direct_sql_services.py with DirectSQLStatisticsClient
- Implemented methods: fetch_flaggedrevs_statistics(), fetch_review_activity(), and fetch_review_statistics_from_logging()
- Built complex SQL queries for aggregating statistics data with configurable resolution (daily/monthly/yearly)
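Because MediaWiki stores timestamps as 14-character strings like 20251231235959, a configurable resolution can be reduced to a prefix length on the timestamp column. The query below is an illustrative reconstruction of the idea, not the actual SQL used by DirectSQLStatisticsClient:

```python
# Prefix lengths into the MediaWiki timestamp format YYYYMMDDHHMMSS.
BUCKET_PREFIX = {"yearly": 4, "monthly": 6, "daily": 8}

def review_counts_query(resolution="monthly"):
    """Build an aggregation query over FlaggedRevs review log entries,
    bucketed by the requested resolution."""
    n = BUCKET_PREFIX[resolution]
    return (
        f"SELECT LEFT(log_timestamp, {n}) AS bucket, COUNT(*) AS reviews "
        "FROM logging WHERE log_type = 'review' "
        "GROUP BY bucket ORDER BY bucket"
    )
```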

Task 4: Management Command for Data Loading
- Created load_flaggedrevs_statistics_direct_sql.py Django management command
- Implemented incremental update feature (auto-continues from last month)
- Added options for full refresh, wiki-specific loading, and date range filtering
- Successfully loaded 169 FlaggedRevs statistics and 181 review activity records for fiwiki
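
The incremental-update behaviour (auto-continue from the last loaded month) reduces to a small date calculation; a sketch, with the function name chosen for illustration:

```python
from datetime import date

def next_month_to_load(last_loaded):
    """Return the first month still missing, or None for a full load.

    `last_loaded` is the month of the newest stored record (a date),
    or None when the table is empty and everything must be fetched.
    """
    if last_loaded is None:
        return None  # no data yet: caller performs a full refresh
    if last_loaded.month == 12:
        return date(last_loaded.year + 1, 1, 1)
    return date(last_loaded.year, last_loaded.month + 1, 1)
```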

Task 5: Deployment and Testing
- Fixed database configuration (corrected database name from s57230 to s57224)
- Deployed to Toolforge: https://pendingchangesbot.toolforge.org
- Verified API endpoints returning correct data
- Confirmed UI charts displaying all 8 statistical visualizations correctly

2. Key Accomplishments

- Successfully implemented production-ready direct SQL access to Wikimedia wiki replicas
- Achieved efficient connection management to avoid exhausting connection pools
- Deployed working solution with 350 total statistics records loaded
- All API endpoints and UI visualizations functioning correctly
- Code merged and pushed to GitHub repository

3. Challenges Faced

Challenge 1: DNS Resolution Error
- Problem: Initial connection attempts failed with "Name or service not known" error for fi.analytics.db.svc.wikimedia.cloud
- Solution: Fixed hostname construction to include wiki family (e.g., fiwiki instead of fi) by reading from Wiki model's family field

Challenge 2: Database Name Mismatch
- Problem: Access denied error - user s57224 trying to access s57230 database
- Solution: Updated settings.py to use correct database name (s57224__pendingchangesbot) and environment variables

Challenge 3: Data Visibility
- Problem: Data was loaded into SQLite locally but webservice used MariaDB
- Solution: Re-ran load command on Toolforge after fixing database configuration to populate production MariaDB

4. Learnings and Skills Gained

- Wikimedia Cloud Services infrastructure and wiki replica database architecture
- DNS-based database routing for hundreds of wiki databases
- PyMySQL connection management with context managers
- Complex SQL query construction for time-series aggregation
- Django management command development with argument parsing
- Production deployment and debugging on Toolforge
- Database migration and data loading in production environment

5. Technical Details

Files Created/Modified:
- app/review_statistics/wiki_replica_connection.py (new)
- app/review_statistics/direct_sql_services.py (new)
- app/review_statistics/management/commands/load_flaggedrevs_statistics_direct_sql.py (new)
- app/reviewer/settings.py (modified - database name fix)

Command for Loading Data:
TOOLFORGE_DEPLOYMENT=true python manage.py load_flaggedrevs_statistics_direct_sql --wiki fi

Results:
- API Endpoint 1: /api/flaggedrevs-statistics/?wiki=fi - 169 records
- API Endpoint 2: /api/flaggedrevs-activity/?wiki=fi - 181 records
- All 8 chart visualizations rendering successfully

6. Additional Notes

- Implementation follows best practices recommended by mentor Zache for connection pooling
- Solution is scalable for multiple wiki databases without exhausting connections
- Ready for loading statistics data for additional wikis beyond fiwiki
- Repository: https://github.com/xenacode-art/PendingChangesBot-ng (branch: feature/mypy-complete-fix)

Week 4: December 29, 2025 – January 2, 2026

1. Overview of Tasks Completed

Task 1: Migration from Pywikibot to Direct SQL
  • Reviewed existing implementation using Pywikibot SupersetQuery.
  • Studied the limitations of SupersetQuery, especially connection pool exhaustion on Toolforge.
  • Implemented direct SQL access to wiki replica databases as recommended.
  • Created a reusable connection layer using context managers to ensure connections open only when needed and close immediately after use.
Task 2: Direct SQL Statistics Implementation
  • Implemented services to fetch:
    • Monthly FlaggedRevs statistics
    • Reviewer activity data
    • Individual review records from logging tables
  • Added Django management commands to support incremental data loading.
  • Verified correct database and hostname construction for wiki replicas.
Task 3: Toolforge Deployment & Debugging
  • Deployed changes on Toolforge and ran database migrations.
  • Debugged SQL connection issues during deployment.
  • Identified DNS resolution issues while connecting to wiki replicas.
  • Fixed hostname construction to follow {wiki_code}{family}.analytics.db.svc.wikimedia.cloud.
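
The hostname fix amounts to concatenating the language code and wiki family before the replica domain; a one-line sketch of the corrected construction:

```python
def replica_hostname(code, family):
    """Join language code and family into the replica DNS name.

    "fi" alone does not resolve; the full database name "fiwiki" does.
    """
    return f"{code}{family}.analytics.db.svc.wikimedia.cloud"
```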

2. Key Accomplishments

  • Successfully replaced Pywikibot-based queries with direct SQL access.
  • Fixed Toolforge SQL connection issues caused by incorrect hostname resolution.
  • Loaded FlaggedRevs statistics and review activity data successfully.
  • Verified that all charts and statistics pages are working correctly.
  • Ensured statistics refresh functionality works using direct SQL.

3. Challenges Faced

Challenge 1: SQL Connection Failures on Toolforge
  • Problem: Unable to fetch data from wiki replica databases due to DNS resolution errors.
  • Solution: Discovered that the hostname must be constructed as {wiki_code}{family} (e.g., fiwiki instead of fi) and updated both hostname and database name logic accordingly.

4. Learnings and Skills Gained

  • Direct interaction with Wikimedia wiki replica databases.
  • Toolforge-specific SQL access patterns and DNS conventions.
  • Importance of connection lifecycle management using context managers.
  • Better understanding of Wikimedia infrastructure and deployment workflows.

5. Additional Notes

  • Progress was slightly slower due to the holiday period.
  • Focus this week was on infrastructure stability, deployment, and debugging rather than adding new features.
  • With deployment issues resolved, the project is now well-positioned for faster development in the coming weeks.

Week 5: January 5 – January 10, 2026

1. Overview of Tasks I Completed

Part A: PendingChangesBot-ng (Continuation)

Task 1: Comprehensive Test Suite Development

- Wrote 37 unit tests covering all direct SQL functionality
- Created test files:
  - test_wiki_replica_connection.py (14 tests)
  - test_direct_sql_services.py (12 tests)
  - test_management_commands.py (11 tests)
- All tests passing successfully with proper mocking and fixtures
- Tests cover connection management, query execution, error handling, and incremental loading
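
The mocking approach can be illustrated with a small helper and `unittest.mock`; the function below is a stand-in for the real service code, not taken from the test files themselves.

```python
from unittest import mock

def fetch_one_row(connect, host, query):
    """Open a connection, run one query, and always close it (sketch)."""
    conn = connect(host=host)
    try:
        with conn.cursor() as cur:
            cur.execute(query)
            return cur.fetchone()
    finally:
        conn.close()

# Test pattern: a MagicMock stands in for pymysql.connect, so the
# close-after-use contract can be asserted without any real database.
connect = mock.MagicMock()
fetch_one_row(connect, "fiwiki.analytics.db.svc.wikimedia.cloud", "SELECT 1")
connect.assert_called_once_with(host="fiwiki.analytics.db.svc.wikimedia.cloud")
connect.return_value.close.assert_called_once()
```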

Task 2: Multi-Wiki Support Implementation

- Created add_wiki management command to simplify adding new wikis
- Tested statistics loading across multiple Wikipedia editions:
  - ✅ Finnish Wikipedia (fi): 169 FlaggedRevs statistics
  - ✅ German Wikipedia (de): 172 FlaggedRevs statistics, 193 review activities
  - ❌ Swedish Wikipedia (sv): No FlaggedRevs extension
  - ❌ English Wikipedia (en): No FlaggedRevs extension
- Documented which wikis support FlaggedRevs vs which don't
- Successfully verified multi-wiki architecture works on Toolforge

Task 3: Documentation & Knowledge Transfer

- Created comprehensive DIRECT_SQL_STATISTICS.md documentation:
  - Architecture overview with component descriptions
  - Usage guide with command examples
  - SQL query explanations
  - Troubleshooting guide
  - Migration guide from Pywikibot
- Updated README.md with statistics section
- Documented supported wikis and common error scenarios
- Added examples showing working vs failing wiki codes

Task 4: CI/CD Pipeline Fixes

- Resolved all GitHub Actions CI failures:
  - Fixed Ruff linting errors (21 → 0 errors)
  - Removed unused imports
  - Fixed line length violations in docstrings
  - Disabled false-positive SQL injection warnings (S608)
  - Auto-fixed import sorting issues
- All CI checks now passing (Tests, Linting, Security Scans)

Task 5: Pull Request Submission

- Created PR #163: "Implement direct SQL access for statistics"
- Includes 14 commits with 1,202 additions across 6 files
- All automated checks passing
- Ready for mentor review

Part B: Pywikibot Contributions (New)

Task 6: Optimize Pickle File Storage with Subdirectory Structure (T414087)

- Problem: WikiWho stores 7M+ pickle files in flat directories causing filesystem performance issues
- Solution: Implemented subdirectory structure using floor(page_id/1000)
  - Old: en/100000.p, en/100002.p, ...
  - New: en/100000/100000.p, en/100000/100002.p, ...
- Implementation:
  - Created WikiWhoMixin class in pywikibot/page/_toolforge.py
  - Implemented _get_wikiwho_pickle_path() static method
  - Reduces files per directory from ~7M to ~7K
- Testing:
  - Wrote 7 comprehensive unit tests in tests/wikiwho_tests.py
  - All tests passing (path calculation, subdirectories, edge cases)
  - Passed flake8 code quality checks
- Submission:
  - Created Phabricator ticket T414087
  - Submitted Gerrit patch: https://gerrit.wikimedia.org/r/c/pywikibot/core/+/1224790
  - Addressed code review feedback from Xqt (updated version to 11.0)
  - Patchset 2 uploaded, awaiting approval
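
The bucketing scheme is simple enough to sketch; this mirrors the floor(page_id/1000) idea from the patch, though the real method name and signature in `_toolforge.py` may differ.

```python
def wikiwho_pickle_path(lang, page_id):
    """Place each pickle under a bucket of at most 1000 page ids.

    floor(page_id / 1000) * 1000 names the bucket directory, so
    en/100000.p becomes en/100000/100000.p and neighbouring pages
    share the same subdirectory.
    """
    bucket = (page_id // 1000) * 1000
    return f"{lang}/{bucket}/{page_id}.p"
```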

Task 7: WikiWho Pickle Compression Investigation (T414075)

- Goal: Investigate if compression can reduce pickle file sizes by 5x or more
- Methodology:
  - Created comprehensive test script (investigate_compression.py)
  - Generated sample WikiWho data (10K tokens, ~757KB)
  - Tested 6 compression methods: gzip (levels 1, 6, 9), bz2 (levels 1, 9), LZMA
- Results: ✓ All methods exceeded 5x target
  - LZMA: 13.77x compression (best ratio)
  - bz2-9: 9.90x compression (best balance)
  - gzip-6: 6.18x compression (recommended)
- Impact Analysis:
  - enwiki (7M articles) storage reduction:
    - Current: ~5.3 TB
    - With gzip-6: ~858 GB (saves ~4.4 TB, 84% reduction)
    - With LZMA: ~385 GB (saves ~4.9 TB, 93% reduction)
- Deliverables:
  - Created FINDINGS.md with comprehensive analysis
  - Provided implementation code examples
  - Recommended gzip-6 for production use
  - Posted complete findings to T414075
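
The core of the investigation can be reproduced with the standard library alone. The sample data below is synthetic, so the exact ratios will differ from the 13.77x/9.90x/6.18x figures measured on real WikiWho pickles.

```python
import bz2
import gzip
import lzma
import pickle

def compression_ratios(obj):
    """Pickle an object once, then compare stdlib codecs on the bytes."""
    raw = pickle.dumps(obj)
    codecs = {
        "gzip-6": lambda d: gzip.compress(d, compresslevel=6),
        "bz2-9": lambda d: bz2.compress(d, compresslevel=9),
        "lzma": lzma.compress,
    }
    return {name: len(raw) / len(fn(raw)) for name, fn in codecs.items()}

# Token-style records are highly repetitive, which is why the
# investigation saw ratios well past the 5x target.
sample = [{"token": f"word{i % 50}", "rev": i % 100} for i in range(10_000)]
ratios = compression_ratios(sample)
```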

2. Key Accomplishments

- Test Coverage: Achieved comprehensive test coverage for all new direct SQL functionality (37 tests for PendingChangesBot, 7 tests for Pywikibot)
- Multi-Wiki Validation: Confirmed direct SQL approach works across different Wikipedia editions
- Documentation Quality: Created production-ready documentation for future maintainers (DIRECT_SQL_STATISTICS.md + FINDINGS.md)    
- CI/CD Success: Fixed all automated checks, ensuring code quality standards
- Open Source Contribution: Made first upstream Pywikibot contribution via Gerrit code review
- Research Skills: Conducted thorough compression investigation with quantitative analysis
- Knowledge Discovery:
  - Identified which Wikimedia wikis have FlaggedRevs enabled
  - Confirmed compression feasibility for WikiWho pickle files (13.77x achievable)

3. Challenges Faced

Challenge 1: FlaggedRevs Extension Availability

- Problem: Attempted to load statistics for Swedish and English Wikipedia but received "Table doesn't exist" errors
- Root Cause: Not all Wikipedias have the FlaggedRevs extension enabled
- Solution:
  - Researched FlaggedRevs configuration across Wikimedia projects
  - Documented confirmed working wikis (fi, de) and non-working wikis (en, sv)
  - Added troubleshooting section to guide users
  - Identified other likely-supported wikis (pl, ru, cs) based on MediaWiki documentation

Challenge 2: CI Linting Failures

- Problem: Pull request CI checks failing due to 21 linting errors
- Root Cause: Unused imports, line length violations, and SQL injection warnings (false positives)
- Solution:
  - Used Ruff auto-fix to resolve import and formatting issues
  - Added # ruff: noqa: S608 to suppress false-positive SQL injection warnings
  - Manually reformatted long comment lines
  - Verified all tests still pass after fixes

Challenge 3: Gerrit Workflow Learning Curve

- Problem: First time using Gerrit code review system for Pywikibot contributions
- Root Cause: Different workflow from GitHub (requires Change-Id, git-review, rebase workflow)
- Solution:
  - Studied Pywikibot contribution guide (T407059)
  - Installed commit-msg hook for Change-Id generation
  - Learned amend and rebase workflow for updates
  - Successfully submitted patchset and addressed reviewer feedback
  - Quick turnaround on Xqt's version number feedback (10.0 → 11.0)

Challenge 4: Unicode Encoding in Windows Terminal

- Problem: Compression investigation script failed with UnicodeEncodeError when printing results
- Root Cause: Windows cp1252 codec can't encode Unicode checkmark character
- Solution:
  - Script still completed successfully and generated all results
  - Findings documented in FINDINGS.md without encoding issues
  - Learned Windows terminal encoding limitations
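
One defensive workaround for the cp1252 issue is to degrade gracefully instead of crashing; a sketch (the actual investigation simply wrote its results to FINDINGS.md instead):

```python
import sys

def safe_print(text):
    """Print, replacing characters the console codec cannot encode.

    On Windows, sys.stdout often uses cp1252, which raises
    UnicodeEncodeError for characters such as '\u2713' (checkmark).
    """
    try:
        print(text)
    except UnicodeEncodeError:
        enc = sys.stdout.encoding or "ascii"
        print(text.encode(enc, errors="replace").decode(enc))
```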

4. Learnings and Skills Gained

- Testing Best Practices: Learned Django TestCase patterns, mock usage, and fixture design for database-dependent code
- Wikimedia Architecture: Deepened understanding of FlaggedRevs extension deployment across different Wikipedia editions
- CI/CD Troubleshooting: Gained experience debugging and fixing automated pipeline failures
- Technical Documentation: Practiced writing comprehensive, user-friendly documentation with examples and troubleshooting guides   
- Code Quality Tools: Learned to work with Ruff linter, understand security scanning rules, and make informed decisions about suppressing false positives
- Gerrit Code Review: Mastered Wikimedia's code review workflow (Change-Id, git-review, patchsets, rebase)
- Open Source Collaboration: Learned to respond to maintainer feedback and iterate on contributions
- Research Methodology: Conducted quantitative investigation with controlled experiments and statistical analysis
- Performance Analysis: Learned to benchmark compression algorithms and analyze trade-offs (ratio vs speed)
- Python Standard Library: Deepened knowledge of gzip, bz2, lzma compression modules and pickle protocol

5. Testing & Validation

PendingChangesBot-ng Test Results

✓ WikiReplicaConnection: 14 tests passed
✓ DirectSQLStatisticsClient: 12 tests passed
✓ Management Commands: 11 tests passed
✓ Total: 37 tests, 0 failures

PendingChangesBot-ng Deployment Verification

✓ Finnish Wikipedia (fi): 169 records loaded
✓ German Wikipedia (de): 172 records loaded
✓ Statistics UI displaying correctly
✓ Refresh functionality working

Pywikibot Test Results

✓ WikiWhoMixin pickle path tests: 7 tests passed
✓ All edge cases covered (subdirectory calculation, different languages, path formats)
✓ Code quality checks: flake8 passed (0 errors)
✓ Total: 7 tests, 0 failures

Pywikibot Compression Investigation Results

✓ 6 compression methods tested
✓ All exceeded 5x target (best: 13.77x)
✓ Performance benchmarked (compression + decompression time)
✓ Storage impact calculated for production use

6. Deliverables

PendingChangesBot-ng

- 37 unit tests across 3 test files
- DIRECT_SQL_STATISTICS.md (361 lines)
- Updated README.md
- PR #163 with 1,202 code additions

Pywikibot

- Phabricator T414087 created
- Gerrit patch 1224790 (179 lines: 56 in _toolforge.py, 123 in tests)
- Phabricator T414075 updated with findings
- FINDINGS.md (comprehensive investigation report)
- investigate_compression.py (research script)

7. Links & References

PendingChangesBot-ng:
- Repository: https://github.com/xenacode-art/PendingChangesBot-ng
- PR #163: https://github.com/xenacode-art/PendingChangesBot-ng/pull/163
- Deployment: https://pendingchangesbot.toolforge.org

Pywikibot:
- Gerrit Review: https://gerrit.wikimedia.org/r/c/pywikibot/core/+/1224790
- T414087: https://phabricator.wikimedia.org/T414087
- T414075: https://phabricator.wikimedia.org/T414075
- Repository: https://gerrit.wikimedia.org/r/pywikibot/core

Week 5: January 5, 2026 – January 9, 2026

1. Overview of Tasks Completed

  • Reviewed existing authorship code in Pywikibot (_toolforge.py) to understand how WikiHistory is currently used.
  • Studied the WikiWho Cloud API and identified the Wikipedia languages supported by the public API.
  • Added basic WikiWho API support in Pywikibot with proper checks for site, language, namespace, and page existence.
  • Implemented a minimal helper to fetch raw WikiWho annotations for article revisions without changing existing authorship logic.
  • Refined the implementation to keep the scope limited and suitable for Gerrit review.

2. Key Accomplishments

  • Successfully added foundational WikiWho Cloud API integration to Pywikibot.
  • Ensured WikiWho is only used for supported languages and main namespace pages.
  • Kept the change minimal and non-breaking for existing features.

3. Challenges Faced

  • Scope control: Initial implementation was broader than required; reduced it to match task requirements.
  • Architecture alignment: Ensured all WikiWho logic fits Pywikibot’s Toolforge integration design.

4. Learnings and Skills Gained

  • Better understanding of Pywikibot architecture and Toolforge-based integrations.
  • Experience working with external Wikimedia Cloud APIs.
  • Improved understanding of writing small, review-friendly open-source patches.

5. Additional Notes

  • Focus this week was on correct design and clean integration rather than adding new features.
  • The code is now ready for review under T414071.

Week 6: Mid-Point Progress Report

Outreachy Internship - PendingChangesBot-ng Development

Reporting Period: Weeks 1-5 (December 8, 2025 - January 10, 2026)
Current Status: Week 6 of 13 (January 13-17, 2026)


Executive Summary

This mid-point report evaluates progress against the original internship timeline, documents accomplishments in the first half of the internship, identifies areas where project goals took longer than expected, and proposes a modified plan for the remaining seven weeks.

Key Finding: While significant technical progress has been made across multiple phases, work has been completed out of sequence. Phases 1 (Authentication), 3 (Statistics), and 4 (Pywikibot contributions) are partially or fully complete, while Phase 2 (Bot Control Interface)—the core deliverable—has not been started.


1. Original Internship Project Timeline

As provided by mentor Zache during the contribution period:

Phase Distribution (13 weeks total)

Weeks 1-2: Community Bonding & Onboarding (Dec 2-15, 2025)

  • Understand PendingChangesBot architecture and Django codebase
  • Study Wikimedia OAuth and FlaggedRevs API
  • Set up development and testing environments
  • Deliverable: Fully configured development environment

Weeks 3-5: Phase 1 - Core Authentication & User Management (Dec 16, 2025 - Jan 5, 2026)

  • Implement Wikimedia OAuth 2.0 integration
  • Build user session management and token handling
  • Create permission checking system (admin/reviewer roles)
  • Add audit logging for authentication events
  • Deliverable: Secure OAuth login with wiki usergroup-based permissions

Weeks 6-8: Phase 2 - Bot Control Interface (Jan 6-26, 2026)

  • Build UI for starting/stopping periodic bot reviews
  • Implement manual review triggering with real-time feedback
  • Create settings management interface
  • Add validation and sanity checks
  • Implement audit logging for bot control actions
  • Build dry-run mode and revert functionality
  • Deliverable: Functional control panel for bot operations

Weeks 9-10: Phase 3 - Public Transparency & Statistics (Jan 27 - Feb 9, 2026)

  • Enhance existing statistics pages with improved visualizations
  • Build public logging interface showing bot decisions
  • Create activity timeline and audit trail views
  • Optimize database queries for statistics aggregation
  • Deliverable: Public-facing statistics dashboard

Weeks 11-12: Phase 4 - Upstream Pywikibot Contributions (Feb 10-23, 2026)

  • Work on T408726 (Add FlaggedRevs support to Pywikibot)
  • Implement FlaggedRevs API methods in Pywikibot core
  • Write tests and documentation
  • Refactor PendingChangesBot to use upstream features
  • Deliverable: FlaggedRevs support merged upstream

Week 13: Phase 5 - Deployment & Community Feedback (Feb 24 - Mar 2, 2026)

  • Deploy to production Toolforge
  • Conduct end-to-end testing with real wiki data
  • Create user documentation
  • Gather community feedback
  • Deliverable: Production-ready application with community feedback

2. Goals Met: Actual Progress (Weeks 1-5)

Weeks 1-2: Onboarding - COMPLETED AHEAD OF SCHEDULE

Status: Exceeded expectations

Accomplishments:

  • Development environment configured (local + Toolforge)
  • Built complete Django OAuth boilerplate application
  • Deployed working application: https://django-oauth-erik.toolforge.org
  • Understood Wikimedia OAuth 1.0a authentication flow
  • Discovered and resolved Pywikibot multi-user credential caching issue
  • Migrated to mwclient for thread-safe OAuth operations

Deliverables Beyond Requirements:

  • REST API with Django REST Framework
  • Vue.js frontend with reactive components
  • Multi-language support (English/Finnish i18n)
  • Comprehensive testing pipeline (pytest, bandit, mypy, safety)

Assessment: Original goal was environment setup and understanding. Delivered a production-ready OAuth application with API, frontend, multi-language support, and tests—far exceeding Week 1-2 expectations.


Weeks 3-5: Phase 1 (Authentication) - PARTIALLY COMPLETED

Status: OAuth authentication completed, but permissions/audit logging not implemented

Status by Component:

  • Wikimedia OAuth 1.0a integration working
  • User session management functional
  • Token handling secure
  • Permission checking system NOT implemented (no admin/reviewer role validation)
  • Audit logging NOT implemented
  • User role database models NOT created

PendingChangesBot Deployment Work (Week 2-3):

  • Deployed PendingChangesBot to https://pendingchangesbot.toolforge.org
  • Fixed static files serving (WhiteNoise)
  • Configured Content Security Policy (django-csp)
  • Fixed database configuration (MariaDB on Toolforge)
  • All 38 migrations applied successfully

Assessment: OAuth login works, but permission system (the critical security component) is missing. Phase 1 is incomplete.


Weeks 3-5: Phase 3 Work (Statistics) - COMPLETED 6 WEEKS EARLY

Status: Completed ahead of schedule

Accomplishments (Week 4-5):

  • Implemented direct SQL access to wiki replica databases
  • Created WikiReplicaConnection with context manager pattern
  • Built DirectSQLStatisticsClient with complex aggregation queries
  • Created Django management command for data loading
  • Successfully loaded 350+ statistics records (Finnish + German Wikipedia)
  • API endpoints functional: /api/flaggedrevs-statistics/ and /api/flaggedrevs-activity/
  • UI visualizations displaying 8 chart types correctly

Documentation:

  • Created comprehensive DIRECT_SQL_STATISTICS.md (361 lines)
  • Updated README with statistics section
  • Documented supported wikis and troubleshooting

Testing:

  • Wrote 37 unit tests covering all direct SQL functionality
  • All tests passing with proper mocking and fixtures

Assessment: Phase 3 work (originally scheduled for Weeks 9-10) is essentially complete. Statistics infrastructure is production-ready.


Weeks 3-5: Phase 4 Work (Pywikibot Contributions) - STARTED 6 WEEKS EARLY

Status: Two contributions submitted, one pending review

Accomplishments (Week 5):

Contribution 1: T414087 - Optimize Pickle File Storage

  • Created WikiWhoMixin class for subdirectory-based pickle storage
  • Reduces files per directory from ~7M to ~7K
  • Wrote 7 comprehensive unit tests
  • Submitted to Gerrit: https://gerrit.wikimedia.org/r/c/pywikibot/core/+/1224790
  • Addressed maintainer feedback (Xqt) - Patchset 2 uploaded
  • Status: Awaiting approval

Contribution 2: T414075 - Compression Investigation

  • Investigated 6 compression methods for WikiWho pickle files
  • Achieved 13.77x compression with LZMA (best), 6.18x with gzip-6 (recommended)
  • Calculated storage impact: 5.3 TB → 858 GB (84% reduction with gzip-6)
  • Created comprehensive FINDINGS.md report
  • Posted complete findings to Phabricator

Assessment: Phase 4 work started successfully, but FlaggedRevs API support (T408726) not yet begun.


Weeks 6-8: Phase 2 (Bot Control Interface) - NOT STARTED

Status: No progress

Missing Components:

  • UI for starting/stopping bot reviews
  • Manual review triggering functionality
  • Settings management interface
  • Dry-run mode
  • Revert functionality
  • Bot control audit logging

Assessment: Phase 2—the core deliverable of the internship—has not been started despite being scheduled for the current period (Weeks 6-8).


3. Comprehensive Accomplishments (Weeks 1-5)

Quantitative Summary

Applications Deployed:

  • OAuth boilerplate: https://django-oauth-erik.toolforge.org
  • PendingChangesBot-ng: https://pendingchangesbot.toolforge.org

Code Written:

  • 1,202 lines added in PendingChangesBot PR #163
  • 179 lines in Pywikibot patch (56 production + 123 tests)
  • 44 unit tests total (37 for PendingChangesBot + 7 for Pywikibot)

Documentation:

  • DIRECT_SQL_STATISTICS.md (361 lines)
  • FINDINGS.md (compression investigation)
  • 5 weekly progress reports
  • Updated README files

Systems Mastered:

  • Django REST Framework
  • Vue.js 3 (from React background)
  • Django internationalization (i18n)
  • Toolforge deployment and webservices
  • Wiki replica database architecture
  • PyMySQL connection management
  • Gerrit code review workflow

Data Loaded:

  • 169 FlaggedRevs statistics (Finnish Wikipedia)
  • 172 FlaggedRevs statistics (German Wikipedia)
  • 181 review activity records (Finnish)
  • 193 review activity records (German)
  • Total: 715 statistics records across 2 wikis

CI/CD Pipeline:

  • All GitHub Actions checks passing
  • 0 linting errors (fixed 21 Ruff violations)
  • Security scans passing (bandit)
  • Type checking configured (mypy)

4. Project Goals That Took Longer Than Expected

Challenge 1: Multi-User OAuth Authentication (Week 1)

Estimated Time: 2-3 days
Actual Time: 1 week

Why It Took Longer:

  • Unexpected Issue: Pywikibot caches OAuth credentials globally, causing authentication conflicts in multi-user environments
  • Impact: Users would see other users' profiles after logging in
  • Solution Required: Complete library migration from Pywikibot to mwclient
  • Root Cause: Pywikibot designed for single-user CLI scripts, not concurrent web applications

Lesson Learned: Always verify library thread-safety before using in web applications. Global state is incompatible with multi-user contexts.


Challenge 2: Database Configuration (Weeks 2-4)

Estimated Time: 1 day
Actual Time: Recurring issues across 3 weeks

Why It Took Longer:

  • Week 2: Understanding MariaDB + PyMySQL + read_default_file configuration
  • Week 3: Discovered database name mismatch (s57230 vs s57224)
  • Week 4: Fixed DNS resolution for wiki replica hostnames (missing wiki family)

Root Causes:

  1. Invisible errors: Environment variables don't appear in error messages
  2. Toolforge-specific patterns: Different from local development conventions
  3. Multiple databases: Tool database + wiki replicas require different configuration
  4. Read-only constraints: Wiki replica tables need managed=False, but Django migration system still needs migration files

Lesson Learned: Database configuration touches everything—environment variables, credential files, Django settings, ORM routers, migrations, Toolforge permissions. Create comprehensive checklists and test components in isolation.
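
One way to encode the read-only constraint is a Django database router that routes reads to the replica alias and refuses migrations for replica-backed apps. A sketch, with the app label and alias names invented for illustration:

```python
class ReplicaRouter:
    """Route reads for replica-backed apps; never migrate them.

    Sketch only: "review_statistics" and the "replica" alias are
    illustrative names, not necessarily the project's real ones.
    """

    replica_apps = {"review_statistics"}

    def db_for_read(self, model, **hints):
        if model._meta.app_label in self.replica_apps:
            return "replica"
        return None  # fall through to the default database

    def db_for_write(self, model, **hints):
        # Replica tables are read-only; routing writes to the default
        # database at least keeps them away from the replica.
        return None

    def allow_migrate(self, db, app_label, **hints):
        if app_label in self.replica_apps:
            return False  # managed=False tables must never be migrated
        return None
```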


Challenge 3: Static Files Serving on Toolforge (Week 3)

Estimated Time: "Should just work"
Actual Time: 1 day

Why It Took Longer:

  • Symptom: Vue.js templates showing as raw {{message}} syntax
  • Initial Hypothesis: Vue.js CDN blocked, template syntax error
  • Actual Cause: Django not serving static files at all (CSS/JS returning 404)
  • Solution Required: Install WhiteNoise middleware + configure CSP headers

Root Cause: Django's default static file serving doesn't work on Toolforge. Required production-grade static file handling from Day 1.
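
The fix reduces to a couple of settings changes; a fragment of what the WhiteNoise setup typically looks like (the exact middleware list and paths are project-specific assumptions):

```python
# settings.py fragment (sketch): WhiteNoise sits directly after
# SecurityMiddleware so it can serve files collected into STATIC_ROOT.
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "whitenoise.middleware.WhiteNoiseMiddleware",
    # ... remaining middleware, including django-csp ...
]
STATIC_ROOT = BASE_DIR / "staticfiles"  # BASE_DIR from the settings module
STATICFILES_STORAGE = "whitenoise.storage.CompressedManifestStaticFilesStorage"
```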

Lesson Learned: Test deployment early and often. Development environment behavior != production behavior.


Challenge 4: Understanding Which Wikis Have FlaggedRevs (Week 5)

Estimated Time: "All major Wikipedias"
Actual Time: Trial and error across multiple wikis

Why It Took Longer:

  • Assumption: English and Swedish Wikipedia would have FlaggedRevs (they're major wikis)
  • Reality: English Wikipedia doesn't use FlaggedRevs; Swedish Wikipedia doesn't either
  • Discovery Method: Tried loading statistics, received "Table doesn't exist" errors
  • Solution: Manual testing + reading MediaWiki documentation

Wikis Confirmed:

  • ✅ Finnish Wikipedia (fi) - working
  • ✅ German Wikipedia (de) - working
  • ❌ English Wikipedia (en) - no FlaggedRevs
  • ❌ Swedish Wikipedia (sv) - no FlaggedRevs

Lesson Learned: Don't assume feature availability based on wiki size. Check documentation and test incrementally.


5. What I Would Do Differently

Strategic Mistake: Working Out of Sequence

What I Did:

  1. Authentication (Weeks 1-2) - completed early
  2. ⏭️ Skipped bot control interface (Phase 2)
  3. Statistics and transparency (Weeks 4-5) - completed 6 weeks early
  4. Pywikibot contributions (Week 5) - started 6 weeks early

Why I Did This:

  • Statistics work felt more concrete and achievable
  • Could see immediate results (data loading, visualizations)
  • Bot control interface felt more abstract (UI design, permissions, workflows)
  • Gravitated toward work that felt "safer"

Impact:

  • Now in Week 6 (when Phase 2 should be wrapping up), but haven't started Phase 2 at all
  • Have completed work from Phases 3 and 4, but missing core deliverable (bot control)
  • Timeline is now inverted—built transparency features before building the bot to control

If Starting Over:

  • Force myself to tackle bot control interface first, even if uncomfortable
  • Resist urge to jump ahead to "easier" tasks
  • Follow the timeline sequence for a reason—each phase builds on previous phases
  • Ask for UI/UX guidance early if design decisions feel overwhelming

Technical Mistake: Building Too Much Too Early (Week 2)

What I Did:

  • Added REST API, Vue.js frontend, multi-language support to boilerplate app
  • "Feature creep" on a learning exercise

Impact:

  • Put me behind schedule by ~3 days
  • Added unnecessary complexity to proof-of-concept

Why I Did It:

  • Wanted to make the boilerplate "production-ready"
  • Excited about learning new technologies (Vue.js, i18n)

Defense:

  • These features taught valuable lessons used later (Vue.js for PendingChangesBot frontend, i18n for Django, testing infrastructure)
  • Sometimes "unnecessary" exploration is valuable learning

If Starting Over:

  • Still worth doing, but timebox the exploration
  • Set a hard deadline: "If not working by Friday, ship what works"

Process Mistake: Not Creating Bot Control Mockups Early

What I Missed:

  • Should have sketched UI mockups for bot control interface in Week 3
  • Could have gotten mentor feedback on design before implementation

Impact:

  • Avoided starting Phase 2 because I didn't know what to build
  • Uncertainty led to procrastination, which led to working on "safer" tasks

If Starting Over:

  • Create lo-fi mockups (even hand-drawn) in Week 3
  • Get mentor approval on design before writing code
  • Break down UI into smallest possible increments (one button at a time)

6. Modified Goals and Timeline for Weeks 6-13

Current Reality Check

Where We Are:

  • Week 6 of 13 (January 13-17, 2026)
  • 7 weeks remaining until internship end (March 2, 2026)

What's Complete:

  • Phase 1 (Authentication) - 70% complete (OAuth works, permissions missing)
  • Phase 2 (Bot Control) - 0% complete
  • Phase 3 (Statistics) - 90% complete (infrastructure done, polish needed)
  • Phase 4 (Pywikibot) - 30% complete (pickle optimization submitted, FlaggedRevs API not started)
  • Phase 5 (Deployment) - 40% complete (deployed to Toolforge, but not production-ready)

Modified Timeline: Weeks 6-13

Weeks 6-7: Bot Control Interface (HIGHEST PRIORITY)

Goal: Build the missing core deliverable

Tasks:

  • Design UI mockups for bot control panel (with mentor feedback)
  • Implement "Start Bot" / "Stop Bot" buttons with backend logic
  • Create manual review triggering form (input: page title/revision ID)
  • Build settings management interface (wiki-specific configurations)
  • Add real-time feedback (AJAX/WebSocket status updates)
  • Implement basic validation (sanity checks before starting bot)

Deliverable: Working control panel where authorized users can start/stop bot and trigger manual reviews

Risk Mitigation: This is the core internship deliverable. If this isn't done, the internship fails to meet its primary goal.


Week 8: Permissions, Authorization, and Audit Logging

Goal: Complete Phase 1 and secure Phase 2

Tasks:

  • Implement permission checking system (query wiki usergroups via MediaWiki API)
  • Create user role database models (Admin, Reviewer, Public)
  • Add authorization checks to bot control endpoints
  • Build audit logging system for all bot control actions
  • Implement revert functionality with permission checks
  • Add comprehensive tests for permission system

Deliverable: Secure bot control with proper authorization and complete audit trail

Dependency: Requires Week 6-7 bot control interface to be functional
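
The permission-checking task above can query user groups via the MediaWiki API (`action=query`, `list=users`, `usprop=groups`). A minimal sketch, assuming a hypothetical role mapping (which wiki groups map to Admin/Reviewer is a project decision, and the function names here are illustrative):

```python
import json
import urllib.parse
import urllib.request

# Hypothetical mapping of wiki groups to the app's coarse roles.
ROLE_BY_GROUP = {"sysop": "Admin", "editor": "Reviewer", "autoreview": "Reviewer"}

def fetch_user_groups(username: str,
                      api_url: str = "https://fi.wikipedia.org/w/api.php") -> list:
    """Query the MediaWiki API for a user's groups."""
    params = urllib.parse.urlencode({
        "action": "query", "list": "users", "ususers": username,
        "usprop": "groups", "format": "json",
    })
    with urllib.request.urlopen(f"{api_url}?{params}", timeout=10) as resp:
        data = json.load(resp)
    users = data["query"]["users"]
    # A missing user has no "groups" key; default to an empty list.
    return users[0].get("groups", []) if users else []

def resolve_role(groups: list) -> str:
    """Map wiki groups to Admin/Reviewer, defaulting to Public."""
    for group in groups:
        if group in ROLE_BY_GROUP:
            return ROLE_BY_GROUP[group]
    return "Public"
```

Caching the resolved role (with a short TTL) versus querying per request is exactly the open question listed for the mentors in section 10.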


Weeks 9-10: Integration, Polish, and Optimization

Goal: Integrate all components and improve user experience

Tasks:

  • Integrate bot control interface with statistics dashboard
  • Enhance statistics visualizations (building on Week 4-5 work)
  • Add filtering and search functionality for logs
  • Implement export functionality (JSON/CSV for statistics)
  • Optimize database queries for performance
  • Build dry-run mode UI (preview bot decisions without executing)
  • Add public logging interface (activity timeline, audit trail)

Deliverable: Cohesive application with integrated control panel and statistics


Weeks 11-12: Pywikibot FlaggedRevs Support (T408726)

Goal: Complete upstream contribution work

Tasks:

  • Study FlaggedRevs API documentation
  • Implement FlaggedRevs API methods in Pywikibot core:
    • Page.flaggedrevs_status() - get pending changes status
    • Page.review() - review a revision
    • FlaggedRevs class for configuration queries
  • Write comprehensive tests following Pywikibot guidelines
  • Submit patches to Gerrit
  • Iterate based on maintainer feedback
  • Refactor PendingChangesBot to use upstream Pywikibot features (if merged)
  • Follow up on T414087 (pickle optimization) merge status

Deliverable: FlaggedRevs support submitted to Pywikibot upstream (merged or pending review with mentor approval)

Alternative Plan: If maintainer review is slow, focus on refactoring PendingChangesBot to use a clean FlaggedRevs abstraction layer that *could* become upstream later.
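
The planned Page.review() wrapper would ultimately call the FlaggedRevs extension's `action=review` API module. As a hedged sketch of the parameter shape only (the helper name `build_review_params` is hypothetical; the real Pywikibot patch would also handle token fetching and the POST itself):

```python
def build_review_params(revid: int, token: str, comment: str = "",
                        unapprove: bool = False) -> dict:
    """Build POST parameters for FlaggedRevs' action=review module.

    The token is a CSRF token obtained separately via
    action=query&meta=tokens.
    """
    params = {
        "action": "review",
        "revid": str(revid),
        "token": token,
        "format": "json",
    }
    if comment:
        params["comment"] = comment
    if unapprove:
        # Withdraw a previous review instead of approving.
        params["unapprove"] = "1"
    return params
```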


Week 13: Testing, Documentation, and Community Feedback

Goal: Deliver production-ready application

Tasks:

  • Conduct comprehensive end-to-end testing with real wiki data
  • Test with multiple user roles (admin, reviewer, public)
  • Create user documentation (how to use the control panel)
  • Write operational guide (how to deploy, configure, maintain)
  • Create handoff documentation for future maintainers
  • Gather feedback from Finnish Wikipedia community
  • Address critical bugs and usability issues
  • Document deployment process and system requirements

Deliverable: Production-ready PendingChangesBot-ng with documentation and community feedback

7. Risk Assessment and Mitigation

High Risk: Bot Control Interface Timeline

Risk: Phase 2 requires 3 weeks of work (original timeline), but only 2 weeks are allocated in the modified plan

Mitigation Strategies:

  1. Scope Reduction: Focus on minimum viable features
    • Start/Stop bot: Required
    • Manual triggering: Required
    • Settings management: MVP only (defer advanced config to post-internship)
    • Dry-run mode: Nice-to-have (defer if time-constrained)
  2. Daily Check-ins: Ask mentor for feedback early and often to avoid rework
  3. Parallel Work: If possible, work on permissions (Week 8 work) in parallel with UI implementation

Medium Risk: Pywikibot Maintainer Review Speed

Risk: T408726 (FlaggedRevs API) requires maintainer review, which may be slow

Mitigation Strategies:

  1. Early Submission: Submit patches by end of Week 11 (not Week 12) to allow time for feedback iteration
  2. Alternative Deliverable: Build clean FlaggedRevs abstraction in PendingChangesBot that could be upstreamed later
  3. Credit for Effort: Outreachy considers submitted patches (pending review) as valid deliverables if mentor approves

Low Risk: Statistics Polish

Risk: Week 9-10 polish work might be lower priority than expected

Mitigation: If running behind on Phase 2, defer statistics polish to post-internship. Statistics infrastructure is already functional.


8. Success Metrics for Second Half

Must-Have (Internship Success Criteria)

  • Bot control interface functional (start/stop/manual trigger)
  • Permissions system working (admin/reviewer roles validated)
  • Audit logging implemented (all bot actions logged)
  • Deployed to Toolforge (production-ready)
  • Documentation complete (user guide + operational guide)

Should-Have (Original Timeline Goals)

  • Dry-run mode working (preview bot decisions)
  • Revert functionality implemented (undo bot actions)
  • Statistics dashboard polished (improved visualizations)
  • Public logging interface (activity timeline for community)

Nice-to-Have (Stretch Goals)

  • FlaggedRevs API merged upstream (T408726 completed)
  • WikiWho pickle optimization merged (T414087 completed)
  • Export functionality (JSON/CSV downloads)
  • Real-time updates (WebSocket status notifications)

9. Lessons for Future Interns

What Worked Well

  1. Building a boilerplate first (Week 1-2) provided valuable hands-on learning
  2. Deploying early (Week 2) revealed production issues before they became critical
  3. Writing comprehensive tests (Week 5) caught bugs and improved code quality
  4. Upstream contributions (Week 5) built credibility and helped the broader community
  5. Detailed weekly reports made it easy to track progress and identify issues

What Didn't Work

  1. Working out of sequence led to missing the core deliverable
  2. Avoiding uncomfortable tasks (UI design) caused procrastination
  3. Not asking for design feedback early led to uncertainty about Phase 2
  4. Assuming timeline was flexible without explicitly discussing with mentor

Advice for Future Interns

  1. Follow the timeline sequence unless you have a good reason not to
  2. When stuck, ask for help immediately (don't spend 3 hours debugging alone)
  3. Deploy early and often (production issues are easier to fix incrementally)
  4. Create mockups before writing UI code (get feedback on design first)
  5. Test on multiple wikis early (don't assume feature availability)
  6. Document as you go (writing docs at the end is harder)

10. Feedback Requested from Mentors

Questions for Zache and Kimmo

  1. Phase 2 Scope:
    • Is the modified 2-week timeline for bot control interface realistic?
    • Should I defer dry-run mode and revert functionality to stretch goals?
    • What's the minimum viable feature set for the control panel?
  2. Permissions System:
    • Should permission checking query MediaWiki API in real-time, or cache usergroups?
    • What happens if a user loses reviewer rights while logged in?
    • Do we need role management UI, or is API-based checking sufficient?
  3. Pywikibot Contributions:
    • Is T414087 (pickle optimization) likely to merge soon?
    • Should I start T408726 (FlaggedRevs API) in Week 11, or focus on PendingChangesBot integration?
    • If maintainer review is slow, is a "submitted patch" sufficient for internship completion?
  4. Modified Timeline Approval:
    • Does the modified plan (Weeks 6-13) align with your expectations?
    • Are there any other priorities I should adjust?

11. Personal Reflection

What I'm Proud Of

Despite working out of order, I've accomplished significant technical work:

  • Two production applications deployed and functional
  • 715 statistics records loaded across 2 wikis
  • 44 unit tests written with comprehensive coverage
  • First upstream Pywikibot contribution submitted to Gerrit
  • Compression investigation with potential 84% storage savings

More importantly, I've learned how to:

  • Navigate unfamiliar codebases and infrastructure
  • Debug production systems remotely
  • Contribute to established open-source projects with maintainers
  • Work through ambiguity and uncertainty
  • Ask for help when stuck

What I've Learned About Myself

  1. I gravitate toward concrete, measurable tasks (SQL queries, data loading) and avoid abstract tasks (UI design, workflow planning)
  2. I need external accountability to tackle uncomfortable work—left to myself, I'll work on "interesting" tasks rather than "required" tasks
  3. I underestimate setup time (database config, deployment, testing) and overestimate implementation time
  4. I learn by breaking things and fixing them, rather than reading documentation upfront

How I'll Improve in the Second Half

  1. Follow the modified timeline religiously (no more jumping ahead to "fun" tasks)
  2. Ask for design feedback before writing code (especially for UI work)
  3. Set daily micro-goals (e.g., "By EOD, start/stop buttons work") to maintain momentum
  4. Communicate blockers immediately rather than working around them
  5. Test incrementally (deploy partial features rather than waiting for "complete" work)

12. Acknowledgments

Mentors:

  • Zache: For patient debugging support, architectural guidance, and identifying the Pywikibot multi-user caching issue
  • Kimmo: For code review feedback and Toolforge deployment assistance

Pywikibot Maintainers:

  • Xqt: For quick feedback on T414087 (pickle optimization patch)

Outreachy Organizers:

  • For the mid-point reflection prompt, which forced me to confront working out of sequence

13. Conclusion

Summary: The first half of the internship produced significant technical accomplishments, but strategic missteps led to working out of sequence. The core deliverable (bot control interface) remains incomplete, while advanced features (statistics, transparency) are mostly done. The modified plan prioritizes completing Phase 2 in Weeks 6-7, implementing permissions in Week 8, and polishing/integrating in Weeks 9-10, with Pywikibot contributions and final deployment in Weeks 11-13.

Repositories:

Live Applications:

Pull Requests:

  • PR #163: Direct SQL Statistics Implementation

Pywikibot Contributions:

Report Prepared By: Erik (Xinacod)
Date: January 15, 2026
Reporting Period: Weeks 1-5 (December 8, 2025 - January 10, 2026)
Next Report: Week 7 (January 20-24, 2026)

Week 6: January 12, 2026 – January 16, 2026

1. Overview of Tasks Completed
  • Tested the newly added WikiWho API integration under different scenarios and inputs.
  • Tried multiple approaches to fetch and validate WikiWho data to ensure reliability.
  • Reviewed Pywikibot source code on GitHub to better understand existing architecture and workflows.
  • Traced how Toolforge-related code is structured and how new helpers fit into the codebase.
2. Key Accomplishments
  • Verified that WikiWho API calls work correctly for supported languages and namespaces.
  • Confirmed that the implementation behaves safely without affecting existing authorship features.
  • Gained clarity on how the new code interacts with Pywikibot’s internal components.
3. Challenges Faced
  • Understanding a large and mature codebase required careful reading and experimentation.
  • Identifying the correct integration points within Pywikibot took time and iteration.
4. Learnings and Skills Gained
  • Deeper understanding of Pywikibot’s code structure and Toolforge integration patterns.
  • Improved ability to read, test, and reason about existing open-source code.
  • Better confidence in validating changes before review.
5. Additional Notes
  • This week focused mainly on testing, exploration, and code understanding rather than new feature development.
  • The goal was to ensure stability and readiness for review.
Week 7: January 18, 2026 – January 22, 2026

1. Overview of Tasks Completed

- Implemented Phase 2 (Bot Control Interface) from the internship timeline.
- Created new bot_control Django app with BotStatus model for tracking bot state.
- Built API endpoints for starting, stopping, checking status, and manually triggering reviews.
- Developed background bot runner process that executes review cycles every 30 seconds.
- Designed and implemented Vue.js frontend control panel with real-time status updates.
- Added manual review form with page title, revision ID, and wiki selector inputs.
- Integrated process management with proper logging to stdout/stderr files.

2. Key Accomplishments
- Successfully deployed bot control functionality.
- Built real process control system that actually starts/stops background Python processes (not just database flags).
- Created production-ready UI with auto-refresh every 5 seconds showing live bot status.
- Implemented manual review trigger allowing users to review specific pages on-demand.
- Committed work to GitHub and created PR #165 for upstream review.
- Collaborated with Harshita to debug Toolforge deployment issues (502 Bad Gateway with Superset).

3. Challenges Faced
- Process management on Windows vs Linux: Initial bot runner had platform-specific signal handling issues; resolved by using platform-appropriate process termination methods.
- Subprocess stdout/stderr capture: Background processes weren't logging initially; fixed by redirecting output to dedicated log files.
- Frontend state management: Needed to handle loading states, error messages, and auto-refresh without race conditions; solved with Vue.js reactive data and proper action flags.
- Deployment blockers: Discovered Harshita's 502 errors stem from deprecated Pywikibot SupersetQuery; identified direct SQL as solution (reusing Week 4-5 approach).

4. Learnings and Skills Gained
- Process control in Django: Learned subprocess.Popen for background process management, including platform-specific flags (CREATE_NEW_PROCESS_GROUP on Windows).
- Vue.js 3 Composition API: Built reactive frontend from scratch despite React background, including computed properties, methods, and template syntax.
- API design: Created RESTful endpoints with proper HTTP status codes, error handling, and JSON responses.
- Real-time UI updates: Implemented auto-refresh polling pattern with loading state management.
- Cross-platform development: Handled Windows/Linux differences in process signals (CTRL_BREAK_EVENT vs SIGTERM).
- Collaborative debugging: Analyzed production errors remotely and identified architectural solutions.
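
The cross-platform process handling described above can be sketched roughly as follows; the function names and log-file path are illustrative, not the PR's actual code:

```python
import signal
import subprocess
import sys

def start_bot_runner(cmd, log_path="bot_runner.log"):
    """Launch the background bot runner, redirecting output to a log file."""
    log = open(log_path, "a")
    kwargs = {"stdout": log, "stderr": subprocess.STDOUT}
    if sys.platform == "win32":
        # New process group so CTRL_BREAK_EVENT can target it later.
        kwargs["creationflags"] = subprocess.CREATE_NEW_PROCESS_GROUP
    else:
        # New session detaches the runner from the web worker's group.
        kwargs["start_new_session"] = True
    return subprocess.Popen(cmd, **kwargs)

def stop_bot_runner(proc, timeout=10):
    """Stop the runner with a platform-appropriate signal."""
    if sys.platform == "win32":
        proc.send_signal(signal.CTRL_BREAK_EVENT)
    else:
        proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # escalate if the runner ignores the signal
        proc.wait()
```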

5. Feedback and Support Needed
- Awaiting code review on PR #165 from mentors (Zache/Kimmo).
- Need to be added as maintainer to pendingchangesbot tool account on Toolforge to help with deployment.
- Should I proceed with creating a PR to fix the Superset → Direct SQL issue for Harshita?

6. Goals for Next Week
- Address PR #165 review feedback if any.
- Implement fix for Superset connection issues (replace with direct SQL approach).
- Deploy bot control interface to Toolforge once Superset issue is resolved.
- Begin implementing actual FlaggedRevs review logic in bot runner.
- Add permission checking based on wiki usergroups.

7. Additional Notes
- Repository: https://github.com/xenacode-art/PendingChangesBot-ng
- Pull Request: #165 (feature/bot-control-interface)
- Lines of Code: 946 additions across 13 files
- Live Demo: Tested locally at http://localhost:8000/bot-control/
- Focus this week was on building production-ready infrastructure (the Phase 2 deliverable) rather than on implementing review algorithms: establishing the control system that will enable future review logic.

Week 7: January 19, 2026 – January 23, 2026

1. Overview of Tasks Completed
  • Investigated and fixed authentication issues that were breaking the bot.
  • Replaced Superset-based queries with direct SQL access to wiki replica databases.
  • Set up bot password authentication for MediaWiki API on Toolforge.
  • Deployed and tested recent updates in the production environment.
  • Fixed runtime errors caused by missing imports.
2. Key Accomplishments
  • Restored the bot’s core functionality by switching to a reliable direct SQL approach.
  • Successfully deployed fixes and verified them in production.
  • Confirmed that bot authentication is working correctly.
3. Challenges Faced
  • Superset authentication is not suitable for automated bots.
  • Production code paths and reload behavior required careful verification.
  • Minor runtime errors appeared during deployment and were resolved.
4. Learnings and Skills Gained
  • Better understanding of Toolforge deployment and debugging.
  • Improved handling of authentication and production fixes.
  • Increased confidence in debugging live systems.
5. Additional Notes
  • This week focused on stability and bug fixing rather than new feature work.
  • The bot is now working reliably without depending on Superset.

Week 8: January 26, 2026 – January 30, 2026

1. Overview of Tasks Completed

Pywikibot Contributions:
- Resolved merge conflict on Gerrit patch 1224790 (Optimize pickle file storage with subdirectory structure).
- Rebased patch on latest master after upstream WikiWho API changes merged.
- Addressed code review feedback from maintainer Xqt.
- Refactored WikiWho implementation by moving methods from WikiBlameMixin to WikiWhoMixin.
- Added WikiWhoMixin to Page class inheritance.
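
The pickle-storage patch spreads cache files across subdirectories so a large cache does not pile thousands of files into one flat directory. A generic sketch of the idea (the patch's actual naming scheme may differ; `cache_path` is a hypothetical helper):

```python
import hashlib
from pathlib import Path

def cache_path(base_dir: str, key: str, depth: int = 1, width: int = 2) -> Path:
    """Place a pickle cache file in a hash-prefix subdirectory.

    With depth=1 and width=2, files are spread over up to 256
    subdirectories, which keeps directory listings fast when the
    cache holds many thousands of entries.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(depth)]
    path = Path(base_dir, *parts, digest + ".pkl")
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```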

PendingChangesBot-ng — Phase 1 Permissions System:
- Implemented MediaWiki API integration to query user groups.
- Created role-based permission system (Admin, Reviewer, Public).
- Enforced permissions on bot start/stop, manual review, and settings changes.
- Built permissions API endpoint.
- Deployed to production on Toolforge.

2. Key Accomplishments
- Pywikibot patch ready for final review (patchset 4 uploaded).
- Permissions system fully implemented and deployed.
- Production API live at: https://pendingchangesbot.toolforge.org/bot-control/api/user/permissions/
- PR #166 submitted with complete permissions implementation.
- Collaborated with Harshita on Phase 1 task division (permissions vs audit logging).

3. Challenges Faced
- Merge conflict resolution required understanding upstream WikiWho changes.
- Integrating MediaWiki user groups API with Django permission system.
- Ensuring role enforcement works correctly across all bot control endpoints.

4. Learnings and Skills Gained
- Git rebase workflow for Gerrit patches.
- Python mixin patterns and class inheritance.
- MediaWiki API for user group queries.
- Role-based access control implementation in Django.
- Team coordination and task distribution.

5. Additional Notes
- Pywikibot Gerrit: https://gerrit.wikimedia.org/r/c/pywikibot/core/+/1224790
- PendingChangesBot PR #166: https://github.com/xenacode-art/PendingChangesBot-ng/pull/166
- Harshita is implementing the Audit Logging section to complete Phase 1.
- Will be unavailable Wednesday–Thursday; completed work early.

Week 8: January 26, 2026 – January 30, 2026

1. Overview of Tasks Completed
  • Pulled and integrated pr-166 branch with bot control interface updates.
  • Implemented audit logging middleware for tracking all HTTP requests.
  • Added logging of user identity, role, API endpoint, and request metadata.
  • Ensured middleware integrates with existing permission system.
2. Key Accomplishments
  • Created AuditLogMiddleware that logs every request with timestamp, username, role, method, path, status, and IP.
  • Middleware registered in Django settings and works project-wide.
  • Clean implementation with minimal code (~40 lines).
3. Challenges Faced
  • Initially designed a complex signal-based audit system (tracking database changes).
  • Simplified after feedback to focus on HTTP request logging only.
  • Ensured role detection works correctly for both authenticated and anonymous users.
4. Learnings and Skills Gained
  • Better understanding of Django middleware architecture.
  • Experience with thread-safe request context handling.
  • Learned patterns from production audit logging systems.
5. Additional Notes
  • Audit logs use Python's logging module (logger: audit_log).
  • Log format: AUDIT: timestamp=X | user=X | role=X | method=X | path=X | status=X | ip=X
  • No database storage required - logs go to application logs.
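
A minimal sketch of such a middleware, written in the Django callable shape (a class wrapping `get_response`) but using only duck typing so it runs with any request object exposing `.user`, `.method`, `.path`, and `.META`; the role lookup is stubbed, since the real resolution is app-specific:

```python
import logging
from datetime import datetime, timezone

audit_logger = logging.getLogger("audit_log")

class AuditLogMiddleware:
    """Log one line per HTTP request in the AUDIT format above."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        user = getattr(request, "user", None)
        username = getattr(user, "username", "") or "anonymous"
        # Role resolution is app-specific; a plain attribute stands in here.
        role = getattr(user, "role", "Public")
        ip = request.META.get("REMOTE_ADDR", "-")
        audit_logger.info(
            "AUDIT: timestamp=%s | user=%s | role=%s | method=%s | "
            "path=%s | status=%s | ip=%s",
            datetime.now(timezone.utc).isoformat(), username, role,
            request.method, request.path, response.status_code, ip,
        )
        return response
```

Registering it in Django's `MIDDLEWARE` setting makes it project-wide, matching the ~40-line implementation described above.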

Week 9: February 2, 2026 – February 6, 2026

1. Overview of Tasks Completed

- Implemented Phase 3 (Public Transparency & Statistics) from the internship timeline.
- Enhanced existing statistics pages with improved visualizations.
- Added export functionality to public statistics dashboard (JSON/CSV formats).
- Created comprehensive tests for statistics export functionality.
- Ensured statistics dashboard is publicly accessible without authentication.

2. Key Accomplishments

- Successfully implemented exportable reports for community analysis.
- Added JSON and CSV export options for FlaggedRevs statistics data.
- Created test suite covering export functionality.
- Statistics dashboard now provides transparent view of bot operations for public users.
- Commits merged to feature/public-statistics-dashboard branch.

3. Challenges Faced

- Ensuring export formats are compatible with common data analysis tools.
- Optimizing database queries for statistics aggregation to handle large datasets.
- Filtering sensitive data from public-facing statistics views.

4. Learnings and Skills Gained

- Django file response handling for CSV/JSON exports.
- Writing comprehensive tests for export functionality.
- Best practices for public-facing data transparency features.
- Database query optimization for statistics aggregation.

5. Additional Notes
- Pull Request: https://github.com/Wikimedia-Suomi/PendingChangesBot-ng/pull/168
- Branch: feature/public-statistics-dashboard
- Recent commits: ed7b6df (tests), f16c16e (export functionality)

Week 9: February 2, 2026 – February 6, 2026

1. Overview of Tasks Completed

- Built a complete public logging system with database-backed audit trail and bot decision logs.                                                                 
- Created AuditLog and BotDecisionLog models to persist request history and autoreview decisions.                                                                
- Updated AuditLogMiddleware to write to both Python logging and the database.                                                                                   
- Integrated decision persistence into the autoreview pipeline.
- Built public API endpoints with filtering, search, pagination, and CSV/JSON export.                                                                            
- Created a public-facing logs page with a two-tab Vue.js interface (no authentication required).
- Fixed a middleware class name bug in settings.py that prevented the middleware from loading.

2. Key Accomplishments

- Two new models: AuditLog (HTTP request tracking) and BotDecisionLog (autoreview decision records with full check pipeline results stored as JSON).
- Five new API endpoints: Logs page (/bot-control/logs/), audit/decision list APIs with filtering & pagination, and export endpoints (JSON/CSV).
- Frontend: Bulma + Vue.js 3 interface with Activity Log and Bot Decisions tabs, filter controls, color-coded status tags, expandable decision detail rows, and export dropdown.
- Security: IP addresses stored internally but never exposed in any public API response or export. CSV export includes formula injection protection.
- 47 tests all passing covering models, middleware, APIs, export, and IP exclusion.

3. Challenges Faced

- Discovered settings.py referenced AuditLoggingMiddleware instead of AuditLogMiddleware — the Week 8 middleware was never actually running. Fixed as part of this work.
- Found a stale .pyc migration artifact from a previous abandoned attempt that needed cleanup.
- Template tests required @override_settings for WhiteNoise compatibility, consistent with existing codebase patterns.
- Cross-app imports between reviews and bot_control required lazy imports to avoid circular dependencies.

4. Learnings and Skills Gained

- Building paginated public APIs with Django's Paginator (without REST Framework).
- Using StreamingHttpResponse for memory-efficient CSV exports.
- CSV injection prevention techniques for exported data.
- Designing APIs where sensitive data (IP addresses) is stored but never leaked publicly.
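
The formula-injection protection mentioned above typically means neutralizing cells that begin with `=`, `+`, `-`, or `@`, which spreadsheet applications would otherwise evaluate as formulas. A generic sketch (not the PR's actual code; note the known tradeoff that negative numbers also get prefixed):

```python
import csv
import io

# Characters that can trigger formula evaluation in Excel/LibreOffice.
_DANGEROUS_PREFIXES = ("=", "+", "-", "@", "\t", "\r")

def sanitize_cell(value) -> str:
    """Prefix risky cells with a single quote so they render as text."""
    text = str(value)
    if text.startswith(_DANGEROUS_PREFIXES):
        return "'" + text
    return text

def rows_to_csv(rows) -> str:
    """Serialize rows to CSV with every cell sanitized."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow([sanitize_cell(cell) for cell in row])
    return buf.getvalue()
```

For large exports, the same per-cell sanitization can feed the StreamingHttpResponse approach noted above instead of building the whole string in memory.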

5. Additional Notes

- Logs page is publicly accessible at /bot-control/logs/ with no authentication required.
- BotDecisionLog stores full check pipeline output as JSON, enabling drill-down into individual check results.
- Both models registered in Django admin for internal access.
- Pre-existing test failures (WhiteNoise template errors + statistics refresh 502) remain unchanged and are unrelated to this work.

Week 10: February 9, 2026 – February 13, 2026

1. Overview of Tasks Completed
- Implemented Phase 3 (Integration & UX) from the internship timeline.                                                      
- Connected bot control interface to the statistics dashboard.                                                              
- Added bot status widget showing running/stopped state with auto-refresh.
- Created BotActivity model to track and display review decisions.
- Built API endpoints for bot activity summary and individual activity records.
- Integrated autoreview logging to track approved/blocked/manual decisions.
- Improved chart visualizations with better tooltips, colors, and animations.
- Added database indexes for FlaggedRevsStatistics and ReviewActivity tables.
- Optimized export queries using .only() for better performance.

2. Key Accomplishments

- Bot status now visible directly on statistics page with live updates every 10 seconds.
- Bot activity summary displays total reviews, approved, blocked, and manual counts.
- Charts enhanced with smooth animations, improved hover effects, and better legends.
- Database performance improved with new indexes on frequently queried fields.
- All changes committed and pushed to PR #168.

3. Challenges Faced

- Coordinating task division with Harshita (logs vs statistics work).
- Ensuring bot activity logging doesn't break autoreview if database write fails.
- Balancing chart aesthetics with information density.

4. Learnings and Skills Gained

- Django model design for activity tracking with proper indexes.
- Chart.js advanced configuration (tooltips, animations, interaction modes).
- Vue.js state management for real-time status updates.
- Database query optimization techniques in Django ORM.

5. Additional Notes

- Repository: https://github.com/Wikimedia-Suomi/PendingChangesBot-ng
- Pull Request: #168 (feature/public-statistics-dashboard)