Wikifile-Transfer: Batch Upload, History, Metadata Extraction & Testing
Project task: T415562
Organization: Wikimedia Foundation (Indic-TechCom)
Mentors: @ParasharSarthak @Jnanaranjan_sahu
Project Size: Large (~350 hours)

Contact Information
Name: Kaja Obinna Erik
Phabricator: Xinacod
GitHub: xenacode-art
University: Landmark University, Nigeria
Timezone: WAT (UTC+1)
Project Summary
Wikifile-Transfer is a Toolforge application that enables contributors to transfer media files across Wikimedia projects. Currently it supports only single-file transfers and lacks history tracking, metadata consistency, and adequate test coverage.
This project aims to introduce batch upload functionality, a full upload history system with retry capability, improved metadata extraction with category localization, and a comprehensive test suite. These enhancements will make the tool significantly more efficient, reliable, and scalable for the Wikimedia contributor community.
Problem Statement
Wikifile-Transfer currently has several critical limitations:
- Contributors must transfer files one at a time, making large-scale workflows inefficient
- There is no history or audit log; failed uploads must be restarted from scratch
- Categories are not localized, requiring manual cleanup after each transfer
- Missing test coverage makes the codebase risky to extend and hard to maintain
- No CI/CD pipeline means regressions can slip through undetected

These issues are especially impactful for contributors during edit-a-thons, workshops, and large content migration drives.

Proposed Solution
1. Batch Upload System
- Accept multiple file URLs or file IDs in a single workflow
- Use Celery task queues for asynchronous processing
- Expose per-file progress and status via a polling API endpoint
- Support partial failure: completed files are not re-uploaded on retry

2. Upload History System
- Store transfer records in a SQLAlchemy model (file, status, timestamp, error)
- Dashboard UI with filtering by status (pending, success, failed) and date range
- One-click retry for failed uploads, using stored metadata

3. Metadata Extraction & Category Localization
- Extract structured metadata from source file descriptions
- Map categories across Wikimedia language editions via the MediaWiki API
- Gracefully fall back to original category when no mapping exists

4. Testing & CI/CD
- Unit tests with pytest for all new and existing core functions
- Integration tests covering the full upload flow end-to-end
- GitHub Actions CI pipeline running tests on every PR
- Target: 80%+ backend test coverage

Goals / Expected Outcomes
- Enable batch transfer of multiple files in a single workflow
- Provide a complete upload history dashboard with retry support
- Improve metadata consistency through automatic category localization
- Achieve 80%+ backend test coverage with a stable CI/CD pipeline
- Leave a well-documented, maintainable codebase for future contributors

Pre-GSoC Contributions
I have already made three pull requests to Wikifile-Transfer to familiarise myself with the codebase:
- PR #54: Fix bare `except` clauses and file handle resource leaks
- PR #55: Pin dependencies, fix Dockerfile base image, move Celery broker URL to environment variable
- PR #64: Fix URL validation crashes, add request timeouts, return consistent error responses

About Me
I am a software engineering student at Landmark University, Nigeria. I have been contributing to open-source projects for over 2 months, with a focus on Python backend development, REST APIs, and developer tooling. I am familiar with Django, Flask, Celery, and SQLAlchemy. I have studied the Wikifile-Transfer codebase thoroughly and have discussed implementation approaches with the mentors.

Why This Project
I chose Wikifile-Transfer because I believe tools that lower the barrier for Wikimedia content contribution have an outsized impact. The limitations I identified while exploring the codebase (no batch support, no history, no tests) are exactly the kinds of infrastructure gaps I am well-positioned to fix. My pre-GSoC PRs demonstrate that I can navigate the codebase, engage with maintainer feedback, and deliver working code.
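
To make the partial-failure and per-file status ideas from the Proposed Solution more concrete, here is a minimal, framework-free sketch. The names `Status`, `TransferRecord`, `batch_progress`, and `select_for_retry` are hypothetical and do not exist in the current codebase. In the real tool this state would presumably live in the SQLAlchemy history model and be updated by Celery tasks; plain dataclasses are used here only to show the intended behaviour.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(str, Enum):
    """The three states named in the history dashboard filter."""
    PENDING = "pending"
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class TransferRecord:
    """One file in a batch (hypothetical shape, not the project's actual model)."""
    source_url: str
    status: Status = Status.PENDING
    error: Optional[str] = None


def batch_progress(records):
    """Summarise a batch in the form a polling endpoint might return as JSON."""
    counts = {status.value: 0 for status in Status}
    for record in records:
        counts[record.status.value] += 1
    return {
        "total": len(records),
        "counts": counts,
        # The batch is finished once no file is still pending.
        "done": counts["pending"] == 0,
        "files": [
            {"source_url": r.source_url, "status": r.status.value, "error": r.error}
            for r in records
        ],
    }


def select_for_retry(records):
    """Reset failed records to pending and return them.

    Successful records are left untouched, so completed files are
    never re-uploaded when a batch is retried.
    """
    retried = []
    for record in records:
        if record.status is Status.FAILED:
            record.status = Status.PENDING
            record.error = None
            retried.append(record)
    return retried
```

A retry handler could call `select_for_retry` on a batch and enqueue one task per returned record, while the polling endpoint serialises `batch_progress` for the front end.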
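
The category localization step with its fallback rule could look roughly like the sketch below. It assumes the MediaWiki Action API is queried with `prop=langlinks` and `formatversion=2`, and that each entry in the response's `langlinks` list carries `lang` and `title` keys. The function names `langlinks_params` and `localize_categories` are hypothetical, and the HTTP request itself is left out so that the mapping logic can be unit-tested without network access.

```python
def langlinks_params(categories, target_lang):
    """Build query parameters for a MediaWiki Action API langlinks lookup."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "langlinks",
        "titles": "|".join(categories),  # multiple titles are pipe-separated
        "lllang": target_lang,           # only return links for this language
        "lllimit": "max",
    }


def localize_categories(categories, api_response, target_lang):
    """Map each source category to its target-language title.

    Categories with no language link keep their original title, which is
    the graceful fallback described in the proposal.

    Simplification: titles are matched exactly as returned by the API, so
    any title normalisation reported in the response is ignored here.
    """
    mapping = {}
    pages = api_response.get("query", {}).get("pages", [])
    for page in pages:
        for link in page.get("langlinks", []):
            if link.get("lang") == target_lang:
                mapping[page["title"]] = link["title"]
    return [mapping.get(category, category) for category in categories]
```

Keeping the parsing separate from the request makes it straightforward to cover both the mapped and the unmapped case in pytest with a hand-written response dictionary.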
Timeline
- Phase 1 (Weeks 1–4): Batch upload system (Celery queue, per-file status API)
- Phase 2 (Weeks 5–8): Upload history dashboard (models, UI, retry support)
- Phase 3 (Weeks 9–11): Metadata extraction and category localization
- Phase 4 (Weeks 12–14): Testing, CI/CD, documentation, and buffer

Communication
- Active on Phabricator and Wikimedia Zulip
- Weekly written progress updates
- Incremental PRs with thorough commit messages and review responses

Post-GSoC
I plan to continue maintaining Wikifile-Transfer after GSoC, reviewing community PRs and extending features based on contributor feedback.