
GSoC 2026 Proposal — Wikifile-Transfer Enhancement (Kaja Obinna Erik)
Closed, Declined (Public)

Description

Wikifile-Transfer: Batch Upload, History, Metadata Extraction & Testing

Project task: T415562
Organization: Wikimedia Foundation (Indic-TechCom)
Mentors: @ParasharSarthak @Jnanaranjan_sahu
Project Size: Large (~350 hours)

Contact Information

Name: Kaja Obinna Erik
Phabricator: Xinacod
GitHub: xenacode-art
University: Landmark University, Nigeria
Timezone: WAT (UTC+1)

Project Summary
Wikifile-Transfer is a Toolforge application that enables contributors to transfer media files across Wikimedia projects. Currently it supports only single-file transfers and lacks history tracking, metadata consistency, and adequate test coverage.
This project aims to introduce batch upload functionality, a full upload history system with retry capability, improved metadata extraction with category localization, and a comprehensive test suite. These enhancements will make the tool significantly more efficient, reliable, and scalable for the Wikimedia contributor community.

Problem Statement


Wikifile-Transfer currently has several critical limitations:

- Contributors must transfer files one at a time, making large-scale workflows inefficient
- There is no history or audit log — failed uploads must be restarted from scratch
- Categories are not localized, requiring manual cleanup after each transfer
- Missing test coverage makes the codebase risky to extend and hard to maintain
- No CI/CD pipeline means regressions can slip through undetected

These issues are especially impactful for contributors during edit-a-thons, workshops, and large content migration drives.



Proposed Solution

1. Batch Upload System
- Accept multiple file URLs or file IDs in a single workflow
- Use Celery task queues for asynchronous processing
- Expose per-file progress and status via a polling API endpoint
- Support partial failure: completed files are not re-uploaded on retry
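A minimal sketch of the per-file status tracking and the partial-failure rule described above. It makes several assumptions: `BatchJob`, `Status`, and `flaky_transfer` are hypothetical names, and the transfer runs as a plain function call so the example stays dependency-free. In the actual tool each call would presumably be dispatched as a Celery task, and `progress()` would back the polling endpoint.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Status(str, Enum):
    PENDING = "pending"
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class BatchJob:
    """Tracks one status entry per source URL (hypothetical structure)."""
    statuses: Dict[str, Status] = field(default_factory=dict)
    errors: Dict[str, str] = field(default_factory=dict)

    def add(self, urls: List[str]) -> None:
        for url in urls:
            self.statuses.setdefault(url, Status.PENDING)

    def run(self, transfer: Callable[[str], None]) -> None:
        """Process every file that has not already succeeded."""
        for url, status in list(self.statuses.items()):
            if status is Status.SUCCESS:
                continue  # partial-failure rule: finished files are not re-uploaded
            try:
                transfer(url)
            except Exception as exc:  # one bad file must not abort the batch
                self.statuses[url] = Status.FAILED
                self.errors[url] = str(exc)
            else:
                self.statuses[url] = Status.SUCCESS
                self.errors.pop(url, None)

    def progress(self) -> dict:
        """Payload a polling endpoint could return as JSON."""
        done = sum(s is not Status.PENDING for s in self.statuses.values())
        return {
            "total": len(self.statuses),
            "done": done,
            "files": {u: s.value for u, s in self.statuses.items()},
        }


calls: List[str] = []


def flaky_transfer(url: str) -> None:
    """Stand-in for the real upload: b.jpg fails on its first attempt only."""
    calls.append(url)
    if url.endswith("b.jpg") and calls.count(url) == 1:
        raise RuntimeError("timeout")


job = BatchJob()
job.add(["https://example.org/a.jpg", "https://example.org/b.jpg"])
job.run(flaky_transfer)  # a.jpg succeeds, b.jpg fails
job.run(flaky_transfer)  # retry: only b.jpg is attempted again
```

Keeping the status map keyed by source URL is one possible design. It makes the retry pass idempotent, because a file that is already marked as successful is skipped rather than uploaded twice.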

2. Upload History System
- Store transfer records in a SQLAlchemy model (file, status, timestamp, error)
- Dashboard UI with filtering by status (pending, success, failed) and date range
- One-click retry for failed uploads, using stored metadata
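The record layout and the dashboard filters can be sketched as follows. The proposal names a SQLAlchemy model. This sketch swaps in the standard-library `sqlite3` module so it runs without extra dependencies, and the table name, column names, and helper functions (`record`, `history`, `mark_for_retry`) are all assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema mirroring the fields named above: file, status, timestamp, error.
SCHEMA = """
CREATE TABLE transfer (
    id         INTEGER PRIMARY KEY,
    file       TEXT NOT NULL,
    status     TEXT NOT NULL CHECK (status IN ('pending', 'success', 'failed')),
    created_at TEXT NOT NULL,  -- ISO 8601 string in UTC, so text order is time order
    error      TEXT
)
"""


def record(conn, file, status, created_at, error=None):
    """Insert one transfer record and return its row id."""
    cur = conn.execute(
        "INSERT INTO transfer (file, status, created_at, error) VALUES (?, ?, ?, ?)",
        (file, status, created_at.isoformat(), error),
    )
    return cur.lastrowid


def history(conn, status=None, since=None, until=None):
    """Return (id, file, status, error) rows, newest first, with optional filters."""
    clauses, params = [], []
    if status is not None:
        clauses.append("status = ?")
        params.append(status)
    if since is not None:
        clauses.append("created_at >= ?")
        params.append(since.isoformat())
    if until is not None:
        clauses.append("created_at <= ?")
        params.append(until.isoformat())
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = "SELECT id, file, status, error FROM transfer" + where + " ORDER BY created_at DESC"
    return conn.execute(sql, params).fetchall()


def mark_for_retry(conn, transfer_id):
    """Reset a failed record to pending; return True if a row was changed."""
    cur = conn.execute(
        "UPDATE transfer SET status = 'pending', error = NULL "
        "WHERE id = ? AND status = 'failed'",
        (transfer_id,),
    )
    return cur.rowcount == 1


def day(d):
    """Convenience for the example: a UTC timestamp in June 2026."""
    return datetime(2026, 6, d, tzinfo=timezone.utc)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
ok_id = record(conn, "File:A.jpg", "success", day(1))
bad_id = record(conn, "File:B.jpg", "failed", day(2), error="timeout")
failed = history(conn, status="failed")
```

Guarding the retry with `status = 'failed'` means a repeated click on an already retried or already successful record changes nothing, which is one simple way to keep the one-click retry safe.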

3. Metadata Extraction & Category Localization
- Extract structured metadata from source file descriptions
- Map categories across Wikimedia language editions via the MediaWiki API
- Gracefully fall back to original category when no mapping exists
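One way the category mapping with fallback could look, assuming the lookup goes through the MediaWiki Action API's `prop=langlinks` query with the `lllang` filter. The function names and the sample responses are hypothetical, and the network call itself is left out so the sketch only shows how the request is shaped and how a response is interpreted.

```python
def langlinks_params(category_title: str, target_lang: str) -> dict:
    """Query parameters for api.php on the source wiki."""
    return {
        "action": "query",
        "prop": "langlinks",
        "titles": category_title,
        "lllang": target_lang,  # only return links to the target language
        "format": "json",
        "formatversion": "2",
    }


def localized_category(response: dict, original: str, target_lang: str) -> str:
    """Pick the target-language title, or fall back to the original category."""
    pages = response.get("query", {}).get("pages", [])
    if isinstance(pages, dict):  # formatversion=1 keys pages by page id
        pages = list(pages.values())
    for page in pages:
        for link in page.get("langlinks", []):
            if link.get("lang") == target_lang:
                # formatversion=2 uses "title"; formatversion=1 uses "*"
                title = link.get("title") or link.get("*")
                if title:
                    return title
    return original  # no mapping exists: keep the source category


# Hand-written sample responses (not real API output).
mapped = {
    "query": {
        "pages": [
            {
                "title": "Category:Rivers",
                "langlinks": [{"lang": "fr", "title": "Catégorie:Cours d'eau"}],
            }
        ]
    }
}
unmapped = {"query": {"pages": [{"title": "Category:Rivers"}]}}

with_link = localized_category(mapped, "Category:Rivers", "fr")
without_link = localized_category(unmapped, "Category:Rivers", "fr")
```

Here `with_link` is the French title from the sample response, and `without_link` is the unchanged `Category:Rivers`, which matches the fallback behaviour listed above.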

4. Testing & CI/CD
- Unit tests with pytest for all new and existing core functions
- Integration tests covering the full upload flow end-to-end
- GitHub Actions CI pipeline running tests on every PR
- Target: 80%+ backend test coverage
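To illustrate the style of unit test intended here, the sketch below pairs a small helper with pytest-style test functions. `is_valid_source_url` is a hypothetical stand-in and not a function taken from the codebase. The tests are plain functions with `assert`, which pytest collects by their `test_` prefix, so no pytest import is needed in the file itself.

```python
from urllib.parse import urlparse


def is_valid_source_url(url) -> bool:
    """Hypothetical helper: accept only http(s) URLs that include a host."""
    if not isinstance(url, str):
        return False
    try:
        parsed = urlparse(url.strip())
    except ValueError:  # urlparse raises on malformed hosts, e.g. unclosed "["
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def test_accepts_https_file_page():
    assert is_valid_source_url("https://en.wikipedia.org/wiki/File:Example.jpg")


def test_rejects_missing_scheme():
    assert not is_valid_source_url("en.wikipedia.org/wiki/File:Example.jpg")


def test_rejects_non_string_input():
    assert not is_valid_source_url(None)


def test_rejects_malformed_ipv6_host():
    assert not is_valid_source_url("http://[::1")
```

Each test covers one failure mode, so a regression in the validation logic shows up in CI as a single, clearly named failing test.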



Goals / Expected Outcomes

- Enable batch transfer of multiple files in a single workflow
- Provide a complete upload history dashboard with retry support
- Improve metadata consistency through automatic category localization
- Achieve 80%+ backend test coverage with a stable CI/CD pipeline
- Leave a well-documented, maintainable codebase for future contributors



Pre-GSoC Contributions

I have already submitted three pull requests to Wikifile-Transfer to familiarise myself with the codebase:

- PR #54: Fix bare `except` clauses and file handle resource leaks
- PR #55: Pin dependencies, fix Dockerfile base image, move Celery broker URL to environment variable
- PR #64: Fix URL validation crashes, add request timeouts, return consistent error responses

 

About Me

I am a software engineering student at Landmark University, Nigeria. I have been contributing to open-source projects for more than two months, with a focus on Python backend development, REST APIs, and developer tooling. I am familiar with Django, Flask, Celery, and SQLAlchemy. I have studied the Wikifile-Transfer codebase thoroughly and have discussed implementation approaches with the mentors.



Why This Project

I chose Wikifile-Transfer because I believe tools that lower the barrier for Wikimedia content contribution have an outsized impact. The limitations I identified while exploring the codebase (no batch support, no history, no tests) are exactly the kinds of infrastructure gaps I am well-positioned to fix. My pre-GSoC PRs demonstrate that I can navigate the codebase, engage with maintainer feedback, and deliver working code.



Timeline

- Phase 1 (Weeks 1–4): Batch upload system (Celery queue, per-file status API)
- Phase 2 (Weeks 5–8): Upload history dashboard (models, UI, retry support)
- Phase 3 (Weeks 9–11): Metadata extraction and category localization
- Phase 4 (Weeks 12–14): Testing, CI/CD, documentation, and buffer


Communication

- Active on Phabricator and Wikimedia Zulip
- Weekly written progress updates
- Incremental PRs with thorough commit messages and review responses

Post-GSoC

I plan to continue maintaining Wikifile-Transfer after GSoC, reviewing community PRs and extending features based on contributor feedback.

Event Timeline

Gopavasanth subscribed.

Hi, thank you for your submission and the effort you put into your proposal. This year we received over 380 strong applications, and unfortunately we were not able to offer you a slot. This was a very competitive process, and many high-quality proposals could not be selected. We truly encourage you to stay engaged and continue contributing to Wikimedia projects. Over the years, many contributors who were not selected for Google Summer of Code have gone on to make impactful contributions and become long-term members of the community. Please do not see this as a failure, but as a step forward in your journey. We would love to stay in touch and support your continued involvement.

If you would like guidance on how to contribute to our projects outside GSoC, feel free to reach out to any of the mentors or org admins; they will be happy to help you get started.

You can get started or continue contributing here:

We hope to see your contributions in our community soon.