Profile Information
Name: Anushka Dasgupta
GitHub: https://github.com/anushkadasgupta
Location: India
Timezone: UTC+5:30 (IST)
Availability: 30–40 hours/week
I am an Electrical Engineering undergraduate and a frontend-focused developer, experienced in building interactive web applications using JavaScript. I enjoy working on user interfaces, handling dynamic data, and creating smooth user experiences.
During the Outreachy contribution period, I completed microtasks (T418285, T418286), where I worked on handling real-world data and request processing. This helped me understand the Wikimedia ecosystem and how its development workflow operates.
Synopsis
The Lusophone Technological Wishlist aims to improve the experience of contributors across Wikimedia projects by addressing real community needs.
This proposal focuses on:
- Wishlist #3: Detecting duplicate references in the Visual Editor
- Wishlist #8: Supporting Wikidata-based scoring for edit-a-thons and contests
Both features reduce manual effort: the first for editors adding references, the second for organizers and participants of contests.
Problem Statement
In the Visual Editor, contributors often add references without knowing whether the same source already exists. This leads to duplicate references, which reduce readability and make articles harder to maintain.
A key challenge is that the same reference can appear in different formats:
- URLs may differ in protocol (http/https), “www”, or trailing slashes
- DOIs may appear as a bare identifier, with a “doi:” prefix, or as a doi.org URL
- ISBNs may include or exclude hyphens
Because of these variations, a plain string comparison is not enough; references must be normalized before they are compared.
For wishlist #8, scoring contributions in Wikidata-based contests is often done manually, which is time-consuming and inefficient. Automating this process can help both organizers and participants track contributions more effectively.
Technical Approach
The solution will focus on normalization, efficient matching, and smooth integration.
1. Normalization
References will be converted into a consistent format:
- URLs → remove protocol, “www”, trailing slashes
- DOIs → extract and standardize identifier
- ISBNs → remove hyphens and spaces
This ensures similar references can be compared reliably.
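A minimal sketch of how these normalization rules could look in JavaScript. The function names (normalizeUrl, normalizeDoi, normalizeIsbn) are hypothetical and are not part of the Visual Editor codebase; the DOI pattern is a common heuristic, not an official grammar.

```javascript
// Hypothetical helpers for illustration only.

// Drop the protocol, a leading "www.", and trailing slashes; lowercase the host.
function normalizeUrl(raw) {
  const stripped = raw.trim()
    .replace(/^https?:\/\//i, '')
    .replace(/^www\./i, '')
    .replace(/\/+$/, '');
  const slash = stripped.indexOf('/');
  if (slash === -1) {
    return stripped.toLowerCase();
  }
  // Only the host is lowercased; paths can be case-sensitive.
  return stripped.slice(0, slash).toLowerCase() + stripped.slice(slash);
}

// Extract the "10.xxxx/suffix" part from a bare DOI, a "doi:" form, or a doi.org URL.
// Returns null when no DOI-like pattern is found.
function normalizeDoi(raw) {
  const match = raw.trim().match(/10\.\d{4,9}\/\S+/);
  return match ? match[0].toLowerCase() : null;
}

// Remove an optional "ISBN" prefix, hyphens, and spaces; uppercase a trailing "x".
function normalizeIsbn(raw) {
  return raw.trim()
    .replace(/^isbn[:\s]*/i, '')
    .replace(/[\s-]/g, '')
    .toUpperCase();
}

console.log(normalizeUrl('https://www.Example.org/page/'));  // example.org/page
console.log(normalizeDoi('https://doi.org/10.1000/XYZ123')); // 10.1000/xyz123
console.log(normalizeIsbn('ISBN 978-0-306-40615-7'));        // 9780306406157
```

This sketch does not convert between ISBN-10 and ISBN-13, so the same book cited in both forms would still need an extra step to match.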
2. Duplicate Detection
- Store normalized references in a structured format
- Compare new references with existing ones
- Detect duplicates efficiently and accurately
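One way the lookup could work is an index keyed by normalized identifier, so each check is a map lookup instead of a scan over every reference. The sketch below is an assumption about the design, not the final implementation: the ReferenceIndex class, the "type:value" key format, and the reference names are all hypothetical.

```javascript
// Hypothetical index of references already present in the article.
// Keys are normalized identifiers such as "url:example.org/page" or "doi:10.1000/xyz123".
class ReferenceIndex {
  constructor() {
    this.byKey = new Map();
  }

  // Register an existing reference under each of its normalized keys.
  // The first reference seen for a key is kept as the canonical one.
  add(refName, keys) {
    for (const key of keys) {
      if (!this.byKey.has(key)) {
        this.byKey.set(key, refName);
      }
    }
  }

  // Return the name of an existing reference that shares any key, or null.
  findDuplicate(keys) {
    for (const key of keys) {
      if (this.byKey.has(key)) {
        return this.byKey.get(key);
      }
    }
    return null;
  }
}

const index = new ReferenceIndex();
index.add('ref-1', ['doi:10.1000/xyz123', 'url:example.org/page']);

console.log(index.findDuplicate(['url:example.org/page'])); // ref-1
console.log(index.findDuplicate(['isbn:9780306406157']));   // null
```

Prefixing each key with its identifier type keeps a URL and an ISBN from colliding even if their normalized strings happened to be equal.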
3. Wikidata Scoring (Wishlist #8)
- Process contribution data from Wikidata
- Build logic to calculate scores automatically
- Ensure results are accurate and scalable for contests
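As a rough illustration of the scoring logic, the sketch below totals points per participant from a list of edits. Everything specific here is an assumption made for the example: the edit shape ({ user, type }), the edit types, and the point values would in practice come from the contest rules and from how the contribution data is actually structured.

```javascript
// Hypothetical point values per edit type; real rules would be set by organizers.
const POINTS = new Map([
  ['label', 1],
  ['statement', 2],
  ['reference', 3],
]);

// Sum points per user and return a ranking, highest score first.
// Edit types without a configured value score zero.
function scoreContributions(edits, points) {
  const totals = new Map();
  for (const edit of edits) {
    const value = points.get(edit.type) ?? 0;
    totals.set(edit.user, (totals.get(edit.user) ?? 0) + value);
  }
  return [...totals.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([user, score]) => ({ user, score }));
}

// Made-up sample data, for illustration only.
const sampleEdits = [
  { user: 'UserA', type: 'statement' },
  { user: 'UserB', type: 'label' },
  { user: 'UserA', type: 'reference' },
];

console.log(scoreContributions(sampleEdits, POINTS));
// [ { user: 'UserA', score: 5 }, { user: 'UserB', score: 1 } ]
```

Keeping the point table separate from the scoring function means organizers could change the rules per contest without touching the calculation.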
4. User Experience
- Provide simple and non-intrusive feedback
- Suggest reuse of existing references
- Keep interactions smooth within the editor
Prototype / Prior Work
I have worked on JavaScript-based projects involving dynamic data handling, form processing, and DOM manipulation. These experiences helped me understand how to manage user input, process data efficiently, and build responsive interfaces.
I have also explored Python through small scripts for file processing and validation, which helped me understand structured logic and error handling.
Timeline
The internship runs for 12 weeks, and I will follow a structured plan:
Week 1
I will focus on understanding the project requirements, exploring the codebase, and familiarizing myself with the relevant technologies, such as the MediaWiki APIs, the Wikidata APIs, and the JavaScript/Python workflows used in the Wikimedia ecosystem.
Weeks 2–4
I will begin implementing the core logic for wishlist #3 by developing functions to identify and compare references using identifiers like ISBN, DOI, and URLs, ensuring a reliable method for detecting duplicates.
Weeks 5–7
I will enhance this functionality by integrating API-based data validation, improving accuracy, and refining the user interaction for reusing existing references. In parallel, I will start working on wishlist #8 by understanding how contribution data is structured and building initial logic to calculate and track scores for Wikidata edits.
Weeks 8–10
I will expand the scoring system to handle multiple contribution types, improve performance, and ensure real-time or near real-time updates, while also optimizing the duplicate reference feature to handle edge cases effectively.
Weeks 11–12
In the final phase, I will focus on thorough testing, debugging, performance improvements, and writing clear documentation for both features.
Throughout the internship, I will maintain consistent communication with mentors, actively incorporate feedback, and adapt my approach to ensure steady progress and meaningful contributions aligned with the project goals.
Why I am a Strong Fit
I focus on building practical and user-friendly solutions. My experience with JavaScript and frontend development helps me understand how features should behave from a user’s perspective, especially in interactive environments like the Visual Editor.
I am comfortable working with dynamic data, handling edge cases, and improving user experience. I am also eager to learn and adapt, especially when working with APIs and new technologies.
I am confident in my ability to contribute effectively, learn quickly, and deliver a well-integrated solution.