
GSoC 2026: Wikifile-Transfer Enhancement
Open, Needs Triage · Public

Description

Project Title

Wikifile-Transfer: Batch Upload, History, Metadata Extraction & Testing

Brief Summary

Wikifile-Transfer is a Toolforge web application that helps Wikimedia contributors transfer media files (especially non-free/fair-use images) between different wiki projects. This project aims to enhance the tool by adding batch upload capability, implementing an upload history system, improving metadata extraction with category localization, and adding comprehensive test coverage to ensure code quality and reliability.

Expected Outcomes

  • Users can transfer multiple files in a single batch operation instead of one-by-one
  • Complete upload history dashboard with retry functionality for failed transfers
  • Automatic category localization during file transfer (not just templates)
  • 80%+ backend test coverage with pytest and E2E tests with Cypress
  • CI/CD pipeline with GitHub Actions for automated testing

Background

The tool was created in 2019 and upgraded to v2 in November 2024. It currently:

  • Supports multiple Wikimedia sister projects
  • Provides UI in 30+ languages (via i18n)
  • Automatically localizes licensing templates during transfer
  • Reduces manual transfer time from minutes to seconds

Technical Stack

Component    Technology
Backend      Python 3.11, Flask, SQLAlchemy
Frontend     React 18, Material-UI 6
Database     MySQL
Task Queue   Celery + Redis
Testing      pytest, Cypress
CI/CD        GitHub Actions

Skills Required/Preferred

Required:

  • Python (Flask, SQLAlchemy, Celery)
  • JavaScript/React (functional components, hooks)
  • SQL basics (MySQL)
  • Git version control
  • Docker
  • Redis

Preferred:

  • MediaWiki API
  • Cypress testing framework

Phabricator Project Tags: Indic-TechCom

Possible Mentors

Expected Size of the Project

350 hours

Rating

Medium

Additional Information for Contributors

Getting Started:

  1. Try using the live tool at https://wikifile-transfer.toolforge.org/
  2. Set up local development environment using Docker
  3. Read the codebase

Why Are You Proposing This Project?

This project is proposed to address real needs identified by the Wikimedia community, particularly Indian language wiki contributors.

Problems we're solving:

  1. Repetitive manual work: Contributors transferring multiple files must repeat the entire process for each file, wasting significant time
  2. No transfer tracking: Users have no way to see their past uploads or retry failed transfers
  3. Lost metadata: When files are transferred, categories are lost or remain in the source language, requiring manual fixes
  4. Code maintainability: Zero test coverage makes it risky to add new features or fix bugs

Who benefits:

  • Indic language wiki communities (Hindi, Tamil, Bengali, etc.)
  • Any Wikimedia contributor working with non-free media across projects
  • Tool maintainers who need reliable, tested code

What Is the Expected Impact?

Success looks like:

  • Contributors can transfer 10-50 files in the time it currently takes to transfer 1 file
  • Failed transfers can be retried with one click instead of starting over
  • Categories are automatically localized, reducing manual post-transfer cleanup by 80%
  • New features can be added confidently with test coverage preventing regressions

Community impact:

  • Reduces barrier for non-English wiki contributors
  • Saves hundreds of volunteer hours annually
  • Makes the tool more reliable and maintainable for future development

Microtasks

This task is part of Google Summer of Code 2026. Please do not claim, self-assign, or start working on this task before the official GSoC contribution timeline.


IMPORTANT: Please do not submit micro task patches before March 16. The official contribution period starts on March 16. You may prepare and discuss ideas with your mentors until then.

Related Objects

Status     Assigned
Open       LGoto
Open       Anirudh23090
Open       None
Open       None
Declined   Adeel-Tahir-developer
Open       Anirudh23090
Declined   Giggs_Ebuka
Declined   Viserion7
Declined   Avalanche_ag
Declined   Krushna-Pisal
Declined   Ryavrma
Declined   Ankushx01-dev
Declined   Xinacod
Declined   PhaneethKumar
Declined   Rohan_salunke69
Declined   Dev
Declined   Sajaljain0409
Declined   SunkireddyBarath

Event Timeline


@YahiaHamdan1 Feel free to prepare and discuss, but please wait until March 16 to submit any code. @ParasharSarthak and @Jnanaranjan_sahu please comment on this task with any specific preferences for contributions. Thanks!

Hi @ParasharSarthak, @Jnanaranjan_sahu

I’m a Computer Science student interested in contributing to the Wikifile-Transfer project for GSoC 2026.

I’m currently exploring the repository and setting up the local development environment using Docker so I can understand the architecture before the contribution period starts.

Over the next few days I plan to study the backend upload flow and the Celery task handling to prepare for the microtasks when the contribution window opens on March 16.

Please let me know if there are any specific areas of the codebase you would recommend reviewing beforehand.

Looking forward to contributing!

Hi @ParasharSarthak and @Jnanaranjan_sahu,

I’ve successfully set up the local Docker environment and am currently exploring the codebase.

Since we are in the preparation phase, I'd love to discuss the frontend architecture for the Batch Upload feature. Is it okay if I share a quick React/UI mockup here in this thread next week to get your feedback on the user flow before March 16?

Looking forward to it.

Hi everyone,

Yesterday, while setting up this tool locally, I encountered a JSONDecodeError caused by missing User-Agent headers. I have submitted a fix on GitHub that attaches the headers to backend API requests.
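
For readers unfamiliar with this failure mode, here is a minimal sketch of the idea: send a descriptive User-Agent with every API request, and report a clear error when the body is not JSON. It uses only the standard library, and the User-Agent string and helper names are hypothetical; the submitted fix may look quite different.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical identifier for illustration only, not the tool's real User-Agent.
USER_AGENT = "wikifile-transfer-dev/0.0 (local testing; contact: example@example.org)"


def build_api_request(api_url: str, params: dict) -> urllib.request.Request:
    """Return a GET request for a MediaWiki API endpoint with a User-Agent set."""
    query = urllib.parse.urlencode({**params, "format": "json"})
    return urllib.request.Request(
        f"{api_url}?{query}", headers={"User-Agent": USER_AGENT}
    )


def parse_json_body(body: bytes) -> dict:
    """Decode a response body, raising a readable error if it is not JSON."""
    try:
        return json.loads(body.decode("utf-8"))
    except json.JSONDecodeError as exc:
        # A non-JSON body (for example an HTML error page) is reported clearly.
        raise ValueError(f"expected JSON from API, got: {body[:80]!r}") from exc
```

A caller would pass the request to `urllib.request.urlopen` and hand the response bytes to `parse_json_body`.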

Please let me know if you want any changes.

Thank you.

Hello @ParasharSarthak and @Jnanaranjan_sahu. I'm Riya, a first-year MS Computer Science student at USC planning to apply for GSoC 2026 with Wikimedia. I'm currently exploring the repository and setting up the development environment locally with Docker. Over the next few days I’ll focus on understanding the codebase and testing changes locally, and I plan to start contributing PRs once the contribution period begins on March 16.

Hi @ParasharSarthak, @Jnanaranjan_sahu

I’m a CS student and tech enthusiast interested in contributing to the Wikifile-Transfer project for GSoC 2026. I'm also a Wikimedia heritage lens fellow.

I’m currently setting up the project locally and getting hands-on with its structure.
Next, I plan to study the backend flow and task handling to prepare for the microtasks I hope to work on.
I hope to be part of it.

Hi again — quick update on what I've been working on since my last message.

Opened two more PRs today:

- PR #54: fixes bare except: clauses in app.py and utils.py (replaced with except Exception:), and wraps the open() calls in utils.py and tasks.py in with statements so file handles aren't leaked if a request fails:
https://github.com/indictechcom/wikifile-transfer/pull/54
- PR #55: pins all dependencies in requirements.txt to tested versions, switches Dockerfile-api and Dockerfile-worker from python:latest to python:3.12-slim for reproducible builds, and replaces the hardcoded PRODUCTION = False flag in celeryWorker.py with an environment variable: https://github.com/indictechcom/wikifile-transfer/pull/55
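
The patterns named in these two PRs can be sketched roughly as follows. This is an illustration, not the code from the PRs: the helper names and the `PRODUCTION` environment variable name are assumptions.

```python
import os
from typing import Optional

TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean setting from the environment instead of hardcoding it."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def read_text_or_none(path: str) -> Optional[str]:
    """Return the file's text, or None if it cannot be read."""
    try:
        # The with block closes the handle even if read() raises.
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except Exception:
        # Unlike a bare `except:`, this lets KeyboardInterrupt and SystemExit propagate.
        return None


# Assumed variable name; defaults to False when the variable is unset.
PRODUCTION = env_flag("PRODUCTION")
```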

Still keen to work on the test suite and batch upload feature for the GSoC proposal. Happy to start on tests next if that's the most useful direction — just let me know.

Hello @ParasharSarthak @Jnanaranjan_sahu and @LGoto,

I'm Oppong Kwabena, a first year Computer Science student at KNUST, Ghana. I'm interested in learning a lot from my mentors and contributing to the Wikifile-Transfer Enhancement Project for the Google Summer of Code 2026. I'm currently setting up my local development environment using Docker. I'm also going through the repository to get more insight on the architecture, back-end flow and the microtasks.

Hello @ParasharSarthak @Jnanaranjan_sahu
I'm Lovette, a Python, Golang, and TypeScript developer. I've built end-to-end Flask applications using the tools this project requires, and I have a good command of Flask, SQLAlchemy, TypeScript, React, SQL, and writing API tests with pytest. I'm very excited to contribute to this project for GSoC 2026. I'm currently setting up the project locally to explore it, and I look forward to working with you to bring it to completion.

Hi @ParasharSarthak @Jnanaranjan_sahu
I’ve worked on fixing the AxiosError (500) during file uploads and have opened a PR for it. The fix mainly adds proper validation and structured error handling across the upload pipeline (/api/upload, utils, and async task), preventing backend crashes and returning consistent JSON responses instead of raw 500 errors.
I’d really appreciate any feedback or suggestions for improvement. Thanks!
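
As a rough illustration of "validate first, then return a consistent JSON shape", the sketch below returns a `(body, status)` pair that a Flask view could pass to `jsonify`. The field names and the response keys are hypothetical; the real `/api/upload` payload is not shown in this thread.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical field names; the real /api/upload payload may differ.
REQUIRED_FIELDS = ("source_url", "target_project", "target_language")


def error_response(messages: List[str], status: int) -> Tuple[Dict[str, Any], int]:
    """Build the single error shape used by every failure path."""
    return {"success": False, "errors": messages}, status


def validate_upload_payload(payload: Any) -> Tuple[Dict[str, Any], int]:
    """Return (body, status) without raising, so the route never answers with a raw 500."""
    if not isinstance(payload, dict):
        return error_response(["request body must be a JSON object"], 400)
    missing = [f for f in REQUIRED_FIELDS if not str(payload.get(f) or "").strip()]
    if missing:
        return error_response([f"missing field: {name}" for name in missing], 400)
    return {"success": True, "errors": []}, 200
```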

This comment was removed by K719.

Hi @ParasharSarthak @Jnanaranjan_sahu and everyone,
I came across this project under GSoC 2026 yesterday and became really interested in contributing. This is my first time working on an open-source project, but I have experience building web applications through internships and coursework, using a similar tech stack (Flask for the backend and Next.js for the frontend).
I’ve already set up the project locally and am planning to start with some microtasks over the next few days. Looking forward to learning from the community!

Removed the previous comment by mistake

Hi @ParasharSarthak and @Jnanaranjan_sahu,

I have officially submitted my GSoC 2026 proposal for the Wikifile-Transfer Enhancement project via the contributor portal!

My proposal outlines a 350-hour plan to implement:

  • Batch Upload Pipeline (using Celery chords)
  • Upload History Dashboard (with one-click retry logic)
  • Automatic Category Localization (via MediaWiki Langlinks API)
  • Comprehensive Test Coverage (80%+ backend coverage with Pytest and Cypress E2E tests)
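
To make the category localization item concrete, here is a small sketch of how a langlinks lookup could be prepared and parsed. It is not taken from the proposal. The parameters follow the MediaWiki Action API (`action=query`, `prop=langlinks`, `lllang`), while the response shape is my reading of the `formatversion=2` output and should be checked against a live response; the function names are hypothetical.

```python
from typing import Any, Dict, Optional


def langlinks_params(category_title: str, target_lang: str) -> Dict[str, str]:
    """Query parameters for a langlinks lookup on the source wiki."""
    return {
        "action": "query",
        "prop": "langlinks",
        "titles": category_title,
        "lllang": target_lang,  # restrict results to the target language
        "format": "json",
        "formatversion": "2",
    }


def localized_title(response: Dict[str, Any], target_lang: str) -> Optional[str]:
    """Return the linked title in target_lang, or None if no link exists."""
    # Assumes formatversion=2, where "pages" is a list of page objects.
    for page in response.get("query", {}).get("pages", []):
        for link in page.get("langlinks", []):
            if link.get("lang") == target_lang:
                return link.get("title")
    return None
```

When no language link exists, `localized_title` returns `None`, so the caller can decide whether to keep the source category or drop it.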

My prior work on PR #58 (resolving the orphaned file cleanup in T415717) gave me a solid understanding of the current Flask, Celery, and filesystem architecture, and I am excited about the prospect of bringing these enhancements to the Indic-TechCom community.

For easy access outside the GSoC portal, I have also uploaded my proposal PDF here: https://drive.google.com/file/d/1xw13E85eb2SVycsN10YQQV7kUTI8bwVG/view?usp=sharing

@ParasharSarthak @Jnanaranjan_sahu I have opened PR #61 for T415715 and T415717. It covers the backend error-handling improvements and temp-file cleanup work: standardized JSON error responses, removal of silent/broad exception handling, structured logging, and deterministic cleanup for uploaded temporary files. I have explained the details in the PR; please review it and share your feedback.

Hey @ParasharSarthak and @Jnanaranjan_sahu,

Now that the contribution window is officially open, I've gone ahead and submitted my pull requests for the Wikifile-Transfer project!

I've opened a PR for the main Temp File Cleanup microtask (T415717), wrapping the file handling in context managers and try...finally blocks to stop the resource leaks and guarantee the temporary images get deleted properly even if an upload fails.
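
A minimal sketch of the cleanup pattern described above, assuming a hypothetical `upload` callable that receives the temporary file's path; it is not the code from the PR.

```python
import os
import tempfile
from typing import Callable


def transfer_with_cleanup(data: bytes, upload: Callable[[str], None]) -> None:
    """Write data to a temp file, run upload(path), and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=".tmp")
    try:
        # The context manager closes the handle even if write() fails.
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        upload(path)
    finally:
        # Runs on success and on failure, so no temporary file is left behind.
        try:
            os.remove(path)
        except FileNotFoundError:
            pass
```

If `upload` raises, the exception still propagates to the caller, but only after the temporary file has been removed.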

While I was testing the backend locally over the last couple of weeks, I also tracked down the root causes for a couple of older API crashes (GitHub Issues #40 and #42) involving Wikimedia Commons URLs and missing 502/404 target files. I went ahead and bundled those fixes into a second PR as well.

I'd love to get your feedback on the code whenever you have a chance to review it. I'm around and happy to make any tweaks or adjustments you need!

Hi @ParasharSarthak and @Jnanaranjan_sahu, I opened one more PR today:

PR #64 fixes a few reliability issues I spotted while testing locally:
- /api/upload and /api/edit_page were calling match[0][0] on the regex result without checking if it was empty, so any malformed URL would crash with an IndexError instead of returning a proper 400
- download_image() was calling .replace() on content-type without checking if the header existed, causing an AttributeError on certain responses
- Added timeout to every requests call (30s for API calls, 60s for image downloads, 120s for file upload POST). Without these, a slow wiki endpoint can hang a worker thread indefinitely
- Celery task had no error handling or logging around the CSRF fetch and upload POST, so failures were silent. Wrapped both with try/except and logger.error
- Fixed inconsistent "error": [] vs "errors": [] across two API routes
https://github.com/indictechcom/wikifile-transfer/pull/64

Also working on my proposal, planning to cover batch upload, upload history with retry, category localization, and test coverage. Happy to discuss the design before I finalize anything.
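
The first two guards in that list can be sketched as below. The URL pattern and function names are hypothetical (the tool's real regex is not shown in this thread); the timeout values are the ones quoted in the comment.

```python
import re
from typing import Dict, Mapping, Optional

# Timeouts in seconds, as listed in the comment above.
TIMEOUTS = {"api": 30, "download": 60, "upload": 120}

# Hypothetical pattern; the tool's real regex is not shown in this thread.
FILE_URL = re.compile(
    r"^https?://(?P<lang>[a-z-]+)\.(?P<project>[a-z]+)\.org/wiki/(?P<title>.+)$"
)


def parse_file_url(url: str) -> Optional[Dict[str, str]]:
    """Return the URL parts, or None for a malformed URL (caller answers 400)."""
    match = FILE_URL.match(url.strip())
    if match is None:
        return None
    return match.groupdict()


def image_subtype(headers: Mapping[str, str]) -> Optional[str]:
    """Return e.g. 'png' from 'image/png', or None if the header is absent or not an image."""
    content_type = headers.get("Content-Type")
    if not content_type:
        return None
    media_type = content_type.split(";", 1)[0].strip().lower()
    if not media_type.startswith("image/"):
        return None
    return media_type[len("image/"):]
```

With the requests library, the timeouts would be passed per call, for example `requests.get(url, timeout=TIMEOUTS["api"])`.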

Hi @ParasharSarthak and @Jnanaranjan_sahu,

I’ve opened a few PRs to start contributing:

->PR #60 — README Improvements
Improved setup and workflow instructions (general contribution enhancement)

->PR #52 — Temp File Cleanup
Ensured temporary files are always deleted using a finally block, preventing leftover files even on failure

-> PR #53 — Error Handling
Improved API response handling by safely parsing responses and returning structured errors instead of silent failures

#Observations

While working on the backend, I traced the full upload flow:
CSRF → upload → response parsing

I noticed that failures around external API calls can still lead to:

-Hanging requests

-Unclear or inconsistent error responses

#Next Steps

-Add timeouts + exception handling for external requests

-Improve logging in Celery tasks

-Start a basic structure for upload history (to support retry later)

I’ll continue in this direction unless there’s a preferred approach.

Thanks!

Hello,
I have explored the Wikifile-Transfer tool and tested the live application.
I’m currently reviewing the project and planning to contribute.
I would like to start with a small task related to this project.
Any suggestions on where to begin would be appreciated.
Thanks!

> I would like to start with a small task related to this project.
> Any suggestions on where to begin would be appreciated.

Hi, have you read the task description?

Hi @ParasharSarthak and @Jnanaranjan_sahu,

I’m Krishna, a computer engineering student from India, and I’m really interested in working on the Wikifile-Transfer project for GSoC 2026.

Over the past couple of days, I’ve been going through the project details and exploring the repository to understand how the upload flow works. I also tried out the live tool, which gave me a clearer idea of how everything fits together.

My background includes working with Python and Java, along with some experience in backend concepts and application development. I’ve built Android applications during my internship and recently started focusing more on backend development using Spring Boot. I’ve also explored areas like computer networks and basic web technologies, so I’m quite interested in understanding how systems like this work end-to-end.

This project stood out to me because of its real-world use and the backend challenges involved, which aligns well with what I want to learn and work on.

I’d like to start contributing with smaller tasks first so I can properly understand the codebase. Would you recommend beginning with the microtasks listed, or is there any specific area you think would be a good starting point?

Looking forward to contributing and learning from this project.

Thanks!

Hi,

I’ve started contributing to the project and submitted a pull request to improve backend API reliability by adding User-Agent headers to all requests.

PR link: https://github.com/indictechcom/wikifile-transfer/pull/66

I’m continuing to explore the codebase and would love feedback on this approach. Also happy to work on further improvements or other areas where help is needed.

Thanks!

Hi,
I’ve started contributing to the project and recently opened another PR where I worked on improving error handling by adding some basic logging. I also made sure the API requests consistently include the User-Agent header.

Here’s the PR: https://github.com/indictechcom/wikifile-transfer/pull/67

I’m still going through the codebase and would really appreciate any feedback. Happy to make changes or take up any other tasks if needed.

Thanks!

Hi, I am interested in working on this project for GSoC 2026. I have experience with Python/Flask, Redis, JavaScript, React, and MySQL. I would love to discuss the implementation details with the mentors.

@ParasharSarthak @Jnanaranjan_sahu could you please take a look at PR #61 for T415715 and T415717? It covers the backend error-handling improvements and temp-file cleanup work. I have also completed my GSoC proposal. Would you like me to share it here so that we can discuss it?

Hi @ParasharSarthak @Jnanaranjan_sahu

I recently came across this project when looking for organizations for GSoC and found out about Wikimedia, and really liked the mission of Wikimedia to make free knowledge accessible to everyone.

I’m very interested in working on the Wikifile-Transfer project for GSoC. My background includes Python, JavaScript/React, SQL, Redis, and Docker, and I believe these skills align well with the project requirements.

I’m currently setting up the project locally and exploring the codebase. I plan to start with the listed microtasks T415715 and T415717 and will submit PRs soon. I’ll keep you updated on my progress.

Hi,

I’ve opened another PR where I added timeouts to API requests to avoid potential hanging calls and improve reliability.

https://github.com/indictechcom/wikifile-transfer/pull/71

I’m continuing to go through the codebase and understand how different parts are connected. Would appreciate any feedback, and I’m happy to work on further improvements if needed.

Thanks!

Hi,

I’ve opened another PR to improve the Celery upload task by adding timeouts and consistent headers for API requests.

https://github.com/indictechcom/wikifile-transfer/pull/73

This is to ensure background tasks are more reliable and don’t hang during long-running uploads.

Would appreciate any feedback. I’m continuing to explore the codebase and look for further improvements.

Thanks!

Hi,

I’ve submitted another PR improving the Celery upload task by adding retry handling, timeouts, and better error handling to make the process more reliable.
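
The retry idea can be illustrated with a small standard-library stand-in. Celery has its own retry support for tasks, which the PR presumably uses; the helper below is a hypothetical sketch of retrying with exponential backoff, not the PR's code.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # waits 1s, 2s, 4s, ...
    raise AssertionError("unreachable")
```

Only the exception types listed in `retry_on` are retried; any other error fails immediately, so permanent problems such as invalid input are not retried pointlessly.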

PR: https://github.com/indictechcom/wikifile-transfer/pull/75

I’m continuing to explore the codebase and will keep working on further improvements.

Thanks!

Hi,

I’ve submitted a PR adding a new feature to track the status of background upload tasks using a task ID.

This allows users to monitor progress and retrieve results after initiating uploads asynchronously.

PR: https://github.com/indictechcom/wikifile-transfer/pull/76

I’d appreciate any feedback.

Thanks!

Hi

I’ve submitted three PRs for review:

PR1: https://github.com/indictechcom/wikifile-transfer/pull/82 Adds centralized logger integration to the app.
PR2: https://github.com/indictechcom/wikifile-transfer/pull/83 Implements improvements for error and exception handling (T415715).
PR3: https://github.com/indictechcom/wikifile-transfer/pull/84 Fixes temporary file cleanup and file-handle leaks (T415717).

I would greatly appreciate your feedback whenever you have a chance.
Thank you!
@ParasharSarthak @Jnanaranjan_sahu

@LGoto could you please contact the mentors to clarify how I should proceed? I have submitted the proposal and PRs for the microtasks and would like the mentors to review them. What should I do next?

@Anirudh23090 Please ask general GSoC process questions in Zulip instead. Thanks.

Hi,

I’ve submitted a PR improving the reliability of the upload process by enhancing error handling, adding request timeouts, and ensuring temporary files are cleaned up after processing.

This helps prevent hanging requests and improves debugging.

PR: https://github.com/indictechcom/wikifile-transfer/pull/85

I’d appreciate any feedback.

Thanks!

Hi,

I’ve submitted another PR improving the robustness of the upload API by adding input validation and better error handling.

This prevents crashes from invalid inputs and ensures safer execution of the upload process.

PR: https://github.com/indictechcom/wikifile-transfer/pull/86

I’d appreciate any feedback.

Thanks!


Hello @Jnanaranjan_sahu @ParasharSarthak,

As requested, I am sharing my GSoC 2026 proposal for the project “Enhancing Wikifile-Transfer with Batch Upload, Upload History, and Metadata Localization”.
I am genuinely interested in working on this project and have already contributed to the Wikifile-Transfer codebase. I would greatly appreciate your feedback and suggestions.

Thank you!

Hi, I'm Rashi Gupta, a 2nd year B.Tech Computer Science student at SRM Institute of Science and Technology, Delhi-NCR (CGPA: 9.79/10).
I am interested in contributing to the Wikifile-Transfer project for GSoC 2026. My technical background aligns well with this project:

Built Nexus Trade — a full-stack stock trading simulator using Flask, MySQL, React.js, and JWT authentication with REST APIs
Built Reputation Forge — a DApp using React.js, Node.js, Express.js with REST APIs
Experience with Python, Flask, React.js, MySQL, PostgreSQL, REST APIs, Git

I have explored the live tool at wikifile-transfer.toolforge.org and the source code at github.com/indictechcom/wikifile-transfer. I am currently preparing my proposal and would love guidance from mentors on the expected scope and any preferred approach for the batch upload and upload history features.
GitHub: https://github.com/Rashi1005
Looking forward to contributing!

Hi @ParasharSarthak and @Jnanaranjan_sahu,

I've been working on the Wikifile-Transfer project and wanted to share a summary of my contributions so far.

  1. Investigations:

a. Issue #39 (Add Translated Languages)
Investigated why only 3 languages appeared on the production site despite 34 language files existing in the codebase. Traced the issue using git history and confirmed the production site is running a stale build from Version 2 that predates 31 language additions. Posted findings on the issue.

  2. Bug Fixes:

a. Issue #46 (Missing User-Agent Headers) - https://github.com/indictechcom/wikifile-transfer/pull/69
All existing PRs (#47, #66) fixed utils.py but missed tasks.py, where the actual Celery upload task makes its own API calls without headers. My PR covers both files completely, also adding request timeouts, logging, and context managers.
b. Issue #78 (Language Column Too Short) - https://github.com/indictechcom/wikifile-transfer/pull/79
Discovered an unreported bug while investigating issue #39. The user_language and pref_language columns in model.py are String(4), which truncates multi-part language codes like zh-hans, pt-br, and sr-ec. Confirmed the bug on production (500 error when saving zh-hans). Fixed by increasing both columns to String(20) with a database migration.
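
The column-width problem described in item b can be checked with a few lines of Python. The helper name is hypothetical; the language codes are the ones quoted above, plus two short ones for contrast.

```python
from typing import Iterable, List


def codes_exceeding(codes: Iterable[str], max_len: int) -> List[str]:
    """Return the language codes that would not fit in a String(max_len) column."""
    return [code for code in codes if len(code) > max_len]


CODES = ["hi", "ta", "zh-hans", "pt-br", "sr-ec"]

too_long_for_4 = codes_exceeding(CODES, 4)    # codes that do not fit in String(4)
too_long_for_20 = codes_exceeding(CODES, 20)  # empty: all of them fit in String(20)
```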

  3. Improvements:

a. Issue #48 (Mobile Header Responsiveness) - https://github.com/indictechcom/wikifile-transfer/pull/74
Tested PR #49 locally and found it removed the language selector and login button on mobile. My PR adds a hamburger menu that keeps all functionality accessible on mobile including navigation, language selector, and login/logout. Also fixed an api.js bug where the built app always pointed to the production URL regardless of environment.

  4. Temp File Cleanup (T415717) - https://github.com/indictechcom/wikifile-transfer/pull/81

Reviewed all existing temp file cleanup PRs (#58, #62, #65) and identified gaps (absence of clean up of partial downloads when download_image() fails midway). Added a dedicated cleanup_temp_file() utility with logging, cleanup via finally blocks in both tasks.py and app.py, and proper handling of all failure scenarios.

Looking forward to any feedback on the PRs.

Thanks

@ParasharSarthak @Jnanaranjan_sahu

Hi, I’ve submitted my GSoC 2026 proposal for the Wikifile-Transfer enhancement project.

I’m very interested in contributing and have started exploring the project and understanding its current workflow. I would really appreciate any feedback or suggestions on how I can further improve or get involved.

Looking forward to your guidance.

Thank you!

Hi everyone,

Thank you all for your contributions and for completing the microtasks. Today is the last day for proposal submission.
The final deadline is March 31, 2026 11:30 PM (Asia/Calcutta timezone).

Please refer to this link for the exact timeline: https://summerofcode.withgoogle.com/programs/2026

I request everyone who has completed their contributions and microtasks to prepare and submit their proposal with all the required details before the deadline.

Please do not submit your proposals directly in this task thread. Instead, create a separate subtask for your proposal and attach all the required details there so it can be properly reviewed and discussed.

Also attach your PRs (GitHub pull requests) or any work you have created in your proposal.

Selection will be based on the quality of your PRs and the detailing of your proposal. It is recommended to document everything properly.

Also, please avoid using AI tools to create proposals or PRs. If AI-generated content is found, your application may be rejected.

Before submitting, please make sure to review everything carefully and check that all required details are included. If any changes are needed, please make them before submission, as no changes will be accepted after the deadline.

Please try to submit your application at least 30 minutes before the final deadline to avoid a last-minute rush.

All the best
Regards

This comment was removed by Xinacod.