
Wikifile-Transfer Enhancement
Closed, Declined (Public)

Description

Profile Information

Phaneeth Kumar
PhaneethKumar
https://github.com/PhaneethKumar
India (UTC +5:30)
Monday to Friday, 9:00 AM – 5:00 PM IST (UTC +5:30), flexible (I can extend my working hours when needed)

Synopsis

Wikifile-Transfer is a Toolforge web application that enables Wikimedia contributors to transfer media files, particularly non-free and fair-use images, between different wiki projects. While the tool serves an important workflow for contributors, it currently handles only single-file transfers, lacks any history tracking, and has limited metadata handling. These gaps make the tool less practical for contributors who routinely transfer multiple files and need accountability over their past transfers.

This project proposes four focused enhancements:

  • Batch Upload Capability: Allow contributors to queue and transfer multiple files in a single session using a Celery-backed asynchronous task queue with Redis as the message broker. Users will be able to monitor real-time progress through a React-based frontend interface.
  • Upload History System: Implement a persistent history of all file transfers per user, stored in MySQL via SQLAlchemy models. This will enable contributors to review, audit, and reference their past uploads without relying on external records.
  • Improved Metadata Extraction with Category Localization: Enhance the existing metadata extraction pipeline to correctly identify and map categories across different wiki language editions, ensuring that transferred files land in contextually appropriate categories on the destination wiki.
  • Comprehensive Test Coverage: Write unit tests and integration tests for both backend (Flask/Python) and frontend (JavaScript/React) components, establishing a reliable baseline for future contributions to the project.

Deliverables

Community Bonding Period (May 1 – May 26)
The first priority will be getting deeply familiar with the existing codebase before writing a single line of new code. I will set up the full local development environment using Docker, read through the Flask application structure, trace how a single file transfer currently works end-to-end, and identify all the integration points where the new features will need to hook in. I will also establish a regular communication rhythm with my mentors on IRC and Phabricator, agree on a code review workflow, and get feedback on my proposed database schema and architecture before proceeding.
Milestone: Development environment running locally; architecture document and schema draft reviewed by mentors.

Week 1–2 (May 27 – June 7): Investigation & Architecture Design
With the codebase exploration from the bonding period complete, I will spend these two weeks designing the technical architecture for all four features before implementation begins. This includes finalizing the SQLAlchemy models for the upload history table, designing the Celery task structure for batch processing, identifying which metadata fields need localization handling, and planning the React component hierarchy for the new frontend views. I will document all design decisions in a shared draft and seek mentor sign-off before moving to code.
Deliverable: Finalized architecture document, database migration scripts drafted, mentor approval to proceed.

Week 3–4 (June 8 – June 21): Batch Upload — Backend
I will implement the server-side of batch upload during these two weeks. This involves:

  • Integrating Celery with the existing Flask application and configuring Redis as the task broker
  • Writing a Celery task that wraps the existing single-file transfer logic and handles it asynchronously
  • Creating REST API endpoints for submitting a batch job, polling task status, and retrieving results
  • Implementing error handling and retry logic so that a failure on one file in a batch does not abort the remaining transfers

Deliverable: Working backend batch upload API with Celery tasks; tested manually with sample file batches.
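
The per-file error handling and retry behaviour described for these weeks could look roughly like the sketch below. It is plain Python with no Celery or Redis, so it only illustrates the isolation logic that a Celery task might wrap. The names `transfer_batch`, `FileResult`, `summarize`, and `fake_transfer` are hypothetical and do not come from the Wikifile-Transfer codebase.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FileResult:
    """Outcome of one file in a batch (hypothetical structure)."""
    file_name: str
    status: str        # "success" or "failed"
    attempts: int
    error: str = ""


def transfer_batch(
    file_names: List[str],
    transfer_one: Callable[[str], None],
    max_attempts: int = 3,
) -> List[FileResult]:
    """Transfer each file independently, retrying failures.

    A failure on one file is recorded and the loop moves on, so the
    remaining files in the batch are still attempted.
    """
    results: List[FileResult] = []
    for name in file_names:
        last_error = ""
        for attempt in range(1, max_attempts + 1):
            try:
                transfer_one(name)
            except Exception as exc:  # isolate the failure to this file
                last_error = str(exc)
                continue
            results.append(FileResult(name, "success", attempt))
            break
        else:
            # Every attempt raised: mark the file as failed and carry on.
            results.append(FileResult(name, "failed", max_attempts, last_error))
    return results


def summarize(results: List[FileResult]) -> Dict[str, int]:
    """Counts that a status-polling endpoint could return to the frontend."""
    counts = {"success": 0, "failed": 0}
    for result in results:
        counts[result.status] += 1
    counts["total"] = len(results)
    return counts


# Stand-in for the existing single-file transfer logic (assumption).
_calls: Dict[str, int] = {}


def fake_transfer(name: str) -> None:
    _calls[name] = _calls.get(name, 0) + 1
    if name == "broken.png":
        raise RuntimeError("upload rejected")
    if name == "flaky.jpg" and _calls[name] == 1:
        raise RuntimeError("timeout")


demo_results = transfer_batch(["ok.png", "flaky.jpg", "broken.png"], fake_transfer)
demo_summary = summarize(demo_results)
```

In this sketch `flaky.jpg` succeeds on its second attempt and `broken.png` is reported as failed without stopping the batch. In the real backend, the retry policy would more likely be delegated to the task queue than hand-rolled like this.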

Week 5–6 (June 22 – July 5): Batch Upload — Frontend
With the backend in place, I will build the React frontend for batch uploads:

  • A file selection interface allowing contributors to add multiple files to a transfer queue
  • A real-time progress display component that polls the Celery task status endpoint and shows per-file progress
  • Clear error display for partially failed batches, distinguishing which files succeeded and which failed
  • Integration with the existing application UI to ensure a consistent look and feel

Deliverable: Functional end-to-end batch upload flow usable in the local development environment.

Week 7 (July 6 – July 12): Midterm Buffer & Review
This week is reserved for addressing mentor code review feedback from weeks 3–6, fixing any bugs surfaced during testing, and writing initial documentation for the batch upload feature. I will also submit the midterm evaluation during this period.
Deliverable: Batch upload feature in a reviewable, mergeable state; midterm evaluation submitted.

Week 8–9 (July 13 – July 26): Upload History System
I will implement persistent upload history tracking:

  • SQLAlchemy model for a transfer_history table capturing file name, source wiki, destination wiki, transfer status, timestamp, and the Wikimedia username of the contributor
  • Database migration script compatible with the existing MySQL setup on Toolforge
  • Backend API endpoints to retrieve a user's history with basic filtering (by date range, status, destination wiki)
  • A React frontend component displaying the history in a sortable, paginated table view

Deliverable: Upload history feature working end-to-end; user can view their full transfer log from the UI.
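
As a rough illustration of the history table and its filters, the sketch below uses Python's built-in sqlite3 module in place of SQLAlchemy and MySQL so that it runs without extra dependencies. The column names follow the fields listed above, but the exact schema, the `record_transfer` and `get_history` helpers, and the sample usernames and wiki codes are assumptions made for this example.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical schema mirroring the fields named in the proposal.
SCHEMA = """
CREATE TABLE transfer_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT NOT NULL,
    file_name TEXT NOT NULL,
    source_wiki TEXT NOT NULL,
    dest_wiki TEXT NOT NULL,
    status TEXT NOT NULL,
    transferred_at TEXT NOT NULL  -- ISO 8601 text, so string order is time order
)
"""


def record_transfer(conn: sqlite3.Connection, username: str, file_name: str,
                    source_wiki: str, dest_wiki: str, status: str,
                    transferred_at: str) -> None:
    """Insert one history row using bound parameters."""
    conn.execute(
        "INSERT INTO transfer_history "
        "(username, file_name, source_wiki, dest_wiki, status, transferred_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (username, file_name, source_wiki, dest_wiki, status, transferred_at),
    )


def get_history(conn: sqlite3.Connection, username: str,
                status: Optional[str] = None, dest_wiki: Optional[str] = None,
                date_from: Optional[str] = None, date_to: Optional[str] = None,
                page: int = 1, per_page: int = 20) -> List[Tuple]:
    """Return one page of a user's transfers, newest first, with optional filters."""
    clauses = ["username = ?"]
    params: list = [username]
    if status is not None:
        clauses.append("status = ?")
        params.append(status)
    if dest_wiki is not None:
        clauses.append("dest_wiki = ?")
        params.append(dest_wiki)
    if date_from is not None:
        clauses.append("transferred_at >= ?")
        params.append(date_from)
    if date_to is not None:
        clauses.append("transferred_at <= ?")
        params.append(date_to)
    sql = (
        "SELECT file_name, source_wiki, dest_wiki, status, transferred_at "
        "FROM transfer_history WHERE " + " AND ".join(clauses) +
        " ORDER BY transferred_at DESC, id DESC LIMIT ? OFFSET ?"
    )
    params.extend([per_page, (page - 1) * per_page])
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record_transfer(conn, "ExampleUser", "A.png", "enwiki", "tewiki", "success", "2025-07-14T10:00:00")
record_transfer(conn, "ExampleUser", "B.png", "enwiki", "hiwiki", "failed", "2025-07-15T09:30:00")
record_transfer(conn, "OtherUser", "C.png", "enwiki", "tewiki", "success", "2025-07-15T11:00:00")

all_rows = get_history(conn, "ExampleUser")
failed_rows = get_history(conn, "ExampleUser", status="failed")
```

Filtering by username in every query keeps one contributor's log separate from another's, and the LIMIT/OFFSET pair is one simple way to back a paginated table view.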

Week 10–11 (July 27 – August 9): Metadata Extraction & Category Localization
I will enhance the metadata extraction pipeline:

  • Audit current metadata extraction to identify which fields are missing or incorrectly handled for non-English wikis
  • Implement category localization logic that maps categories from the source wiki to their equivalent categories on the destination wiki, using the Wikimedia API where available, and falls back gracefully when no equivalent exists
  • Ensure that transferred files receive accurate, project-appropriate categories on the destination wiki rather than raw source categories

Deliverable: Improved metadata extraction handling multiple language wikis correctly; tested with a representative set of source/destination wiki combinations.
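
The mapping-with-fallback behaviour could be sketched as follows. The lookup is injected as a function so the example runs offline. In practice it might be backed by an interlanguage-link query against the MediaWiki Action API, but that wiring is an assumption and is not shown. `localize_categories`, `demo_lookup`, and all category names here are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# A lookup takes (source category title, destination language code) and
# returns the equivalent category title, or None when there is none.
Lookup = Callable[[str, str], Optional[str]]


def localize_categories(
    categories: Iterable[str],
    dest_lang: str,
    find_equivalent: Lookup,
    fallback_category: Optional[str] = None,
) -> Tuple[List[str], List[str]]:
    """Map source categories to destination equivalents.

    Returns (mapped, unmapped). Raw source categories are never copied to
    the destination. If any category has no equivalent and a fallback is
    given, the fallback is added once so the file is not left uncategorized.
    """
    mapped: List[str] = []
    unmapped: List[str] = []
    for category in categories:
        equivalent = find_equivalent(category, dest_lang)
        if equivalent:
            if equivalent not in mapped:  # avoid duplicate categories
                mapped.append(equivalent)
        else:
            unmapped.append(category)
    if unmapped and fallback_category and fallback_category not in mapped:
        mapped.append(fallback_category)
    return mapped, unmapped


# Made-up mapping table standing in for a real API-backed lookup.
_DEMO_TABLE: Dict[Tuple[str, str], str] = {
    ("Category:Example A", "de"): "Kategorie:Beispiel A",
    ("Category:Example B", "de"): "Kategorie:Beispiel B",
}


def demo_lookup(category: str, dest_lang: str) -> Optional[str]:
    return _DEMO_TABLE.get((category, dest_lang))


mapped, unmapped = localize_categories(
    ["Category:Example A", "Category:No equivalent", "Category:Example B"],
    "de",
    demo_lookup,
    fallback_category="Kategorie:Beispiel Wartung",
)
```

Returning the unmapped categories separately lets the caller log them or show them to the contributor for manual review, which is one possible reading of "falling back gracefully".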

Week 12 (August 10 – August 19): Testing & Documentation
I will write comprehensive tests covering all features built during the program:

  • Python unit tests for Flask routes, SQLAlchemy models, Celery tasks, and metadata extraction functions using pytest
  • JavaScript tests for React components
  • Integration tests covering the full transfer flow (single and batch)
  • Updated README with setup instructions, feature documentation, and contribution guide for future developers

Deliverable: Test suite passing in CI; complete documentation committed to the repository.

Week 13 (August 20 – August 25): Final Submission & Wrap-Up
I will address any remaining mentor feedback, do a final review of all pull requests, and submit the final GSoC evaluation. I will also write a brief project summary blog post on my Wikimedia user page to help future contributors understand the work done.
Deliverable: All PRs merged or in final review; final evaluation submitted.

Participation

Communication: I will post weekly progress updates on the Phabricator task (T415562) every Friday and will be active on the #wikimedia-dev IRC channel on libera.chat during my working hours (9 AM–5 PM IST). For detailed technical discussions I will use Phabricator task comments. I will flag blockers to my mentors within 24 hours of encountering them rather than waiting for the weekly update.
Code: All code will be developed in a public fork of the Wikifile-Transfer repository on GitHub under my account (github.com/PhaneethKumar). I will submit work as individual pull requests mapped to each feature, keeping PRs small and focused to make code review manageable.
Reviews: I will actively seek mentor feedback after each week's work and will not move to the next phase until the current one has been reviewed and approved.

About Me

I am currently pursuing a Bachelor of Technology in Computer Science at Madanapalle Institute of Technology & Science, India. I heard about Google Summer of Code through social media and was drawn to Wikimedia as an organization because of its mission to make knowledge freely accessible to everyone — a mission that resonates with me as someone who has benefited from open knowledge throughout my own learning journey.
This particular project stood out to me because it maps directly onto the skills I have been building — Python backend development, React-based frontends, and working with databases — and represents a genuine, practical improvement to a tool that real contributors use. Beyond the technical work, I see this as an opportunity to establish myself as a long-term contributor to the Wikimedia ecosystem, learn how professional open-source software development works in practice, and grow under the guidance of experienced mentors. I want to demonstrate that I can take ownership of a non-trivial software project from design through to delivery, and this is the right project to do that with.
I have no significant time conflicts during the GSoC period (May–August). My academic semester will have concluded before the program begins, so I will be able to commit to this project full-time. I am applying only to Wikimedia through GSoC and to no other organizations.

Past Experience

Wikimedia Contributions: I have not yet made contributions to Wikimedia repositories during the application phase.

Personal Projects:
E-Commerce Website (HTML, CSS, JavaScript)
A front-end web application demonstrating a simple authentication system paired with a product listing interface. The project includes user signup and login functionality with client-side form validation, and a responsive product grid. User data is persisted via browser localStorage; session state is managed with sessionStorage and automatically cleared on session end. Access control is enforced by checking a loggedInUser key in sessionStorage before rendering protected pages. This project taught me how to think about application state, user flow, and the separation of concerns in a frontend-only context — and it made me aware of the significant limitations of client-side-only storage, which is one reason I am interested in building the server-side upload history system for this project.
Netflix UI Clone (HTML, CSS)
A practice project focused on mastering CSS Grid and Flexbox for building responsive layouts that adapt across screen sizes. Working through this project gave me a solid intuitive grasp of modern CSS layout systems, which informs how I approach frontend component design today.
Open Source Contributions: I have not yet made open-source contributions to other projects either. I chose to be direct about this rather than overstate my background. My goal through GSoC is to establish exactly the kind of sustained open-source contribution record that I currently lack, starting with this project.

Any Other Info

Why I am a viable candidate despite limited prior open-source history: The skills required for this project (Flask, SQLAlchemy, Celery, React, MySQL, Docker, Redis) are ones I have either been actively building through coursework and self-directed projects or intend to learn. I am at a stage in my learning where I have the foundational knowledge to contribute meaningfully but need the structure of a mentored program to make my first substantial open-source contribution. GSoC with Wikimedia is precisely the right environment for that transition.
Planned first steps before coding begins: Before writing any new feature code, I will spend the community bonding period fully reading through the Wikifile-Transfer source, running the existing test suite, understanding the Toolforge deployment environment, and mapping out exactly where each proposed feature will integrate with the existing codebase.

Event Timeline

Gopavasanth subscribed.

Hi, thank you for your submission and the effort you put into your proposal. This year we received over 380 strong applications, and unfortunately we were not able to offer you a slot. This was a very competitive process, and many high-quality proposals could not be selected. We truly encourage you to stay engaged and continue contributing to Wikimedia projects. Over the years, many contributors who were not selected for Google Summer of Code have gone on to make impactful contributions and become long-term members of the community. Please do not see this as a failure, but as a step forward in your journey. We would love to stay in touch and support your continued involvement.

If you would like guidance on how to contribute to our projects outside GSoC, feel free to reach out to any of the mentors or org admins; they will be happy to help you get started.

You can get started or continue contributing here:

We hope to see your contributions in our community soon.