
Outreachy Internship Proposal: Addressing the Lusophone Technological Wishlist Proposals
Closed, Declined (Public)

Description

Profile Information

Name: Anushka Dasgupta
GitHub: https://github.com/anushkadasgupta
Location: India
Timezone: UTC+5:30 (IST)
Availability: 30–40 hours/week

I am an Electrical Engineering undergraduate and a frontend-focused developer, experienced in building interactive web applications using JavaScript. I enjoy working on user interfaces, handling dynamic data, and creating smooth user experiences.

During the Outreachy contribution period, I completed microtasks (T418285, T418286), where I worked on handling real-world data and request processing. This helped me understand the Wikimedia ecosystem and how its development workflow operates.


Synopsis

The Lusophone Technological Wishlist aims to improve the experience of contributors across Wikimedia projects by addressing real community needs.

This proposal focuses on:

  • Wishlist #3: Detecting duplicate references in the Visual Editor
  • Wishlist #8: Supporting Wikidata-based scoring for edit-a-thons and contests

Both features aim to improve contributor experience by reducing manual effort and improving efficiency.


Problem Statement

In the Visual Editor, contributors often add references without knowing whether the same source already exists. This leads to duplicate references, which reduce readability and make articles harder to maintain.

A key challenge is that the same reference can appear in different formats:

  • URLs may differ in protocol (http/https), “www”, or trailing slashes
  • DOIs may appear as a bare identifier, with a “doi:” prefix, or as a doi.org URL
  • ISBNs may include or exclude hyphens

Because of these variations, a simple string comparison is not enough.

For Wishlist #8, contributions to Wikidata-based contests are often scored manually, which is time-consuming. Automating this process can help both organizers and participants track contributions more effectively.


Technical Approach

The solution will focus on normalization, efficient matching, and smooth integration.

1. Normalization

References will be converted into a consistent format:

  • URLs → remove protocol, “www”, trailing slashes
  • DOIs → extract and standardize identifier
  • ISBNs → remove hyphens and other formatting differences

This ensures similar references can be compared reliably.
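
The rules above can be sketched as follows. This is a minimal illustration in Python, not the final implementation; an editor-side version would likely be written in JavaScript. The function names and the DOI pattern are assumptions made for the example. Because stripping “www” or a trailing slash can occasionally merge two genuinely different pages, a match on the normalized URL is probably best treated as a candidate duplicate for the contributor to confirm.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Matches the common "10.<registrant>/<suffix>" DOI shape.
# This pattern is an assumption for the sketch, not a full validator.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def normalize_url(url: str) -> str:
    """Drop the scheme, a leading "www.", and trailing slashes."""
    url = url.strip()
    if "://" not in url:
        url = "//" + url  # lets urlsplit recognise the host when no scheme is given
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lower-cases the hostname
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    return host + path + query  # port and fragment are ignored in this sketch


def normalize_doi(text: str) -> Optional[str]:
    """Extract a DOI from a bare identifier, a "doi:" prefix, or a doi.org URL."""
    match = DOI_PATTERN.search(text)
    # DOI names are case-insensitive, so lower-casing is safe for comparison.
    return match.group(0).lower() if match else None


def normalize_isbn(isbn: str) -> str:
    """Remove hyphens and whitespace; upper-case a trailing ISBN-10 check digit "x"."""
    return re.sub(r"[\s-]", "", isbn).upper()
```

With these helpers, `http://www.example.org/page/` and `https://example.org/page` reduce to the same string, as do `978-0-306-40615-7` and `9780306406157`.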

2. Duplicate Detection

  • Store normalized references in a structured format
  • Compare new references with existing ones
  • Detect duplicates efficiently and accurately
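
One possible shape for this step, sketched in Python under the assumption that each reference has already been reduced to a kind ("url", "doi", "isbn") and a normalized value. The class and method names are hypothetical. A dictionary keyed on the (kind, value) pair keeps each lookup at roughly constant time, so checking a new reference does not require scanning every existing one.

```python
from typing import Dict, Optional, Tuple


class ReferenceIndex:
    """Maps (kind, normalized value) to the name of the first reference that used it."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str], str] = {}

    def add(self, kind: str, normalized: str, ref_name: str) -> Optional[str]:
        """Register a reference.

        Returns the name of an existing reference with the same key, or None
        if this is the first time the key has been seen.
        """
        key = (kind, normalized)
        existing = self._by_key.get(key)
        if existing is not None:
            return existing  # duplicate candidate: suggest reusing this reference
        self._by_key[key] = ref_name
        return None
```

Including the kind in the key keeps a URL and an ISBN that happen to normalize to the same string from being reported as duplicates of each other.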

3. Wikidata Scoring (Wishlist #8)

  • Process contribution data from Wikidata
  • Build logic to calculate scores automatically
  • Ensure results are accurate and scalable for contests
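
As a rough sketch of the scoring logic, assume contribution data has already been fetched and reduced to (user, edit type) pairs. The edit type names and point values below are hypothetical placeholders; the real categories and weights would be set by contest organizers, and nothing here reflects an actual Wikidata API response.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

# Hypothetical scoring table; organizers would define the real one.
DEFAULT_POINTS: Mapping[str, int] = {
    "label_added": 1,
    "statement_added": 2,
    "reference_added": 3,
}


def score_contributions(
    edits: Iterable[Tuple[str, str]],
    points: Mapping[str, int] = DEFAULT_POINTS,
) -> Dict[str, int]:
    """Sum points per user; edit types missing from the table score zero."""
    totals: Dict[str, int] = defaultdict(int)
    for user, edit_type in edits:
        totals[user] += points.get(edit_type, 0)
    return dict(totals)
```

Keeping the point table as a parameter means each contest can supply its own weights without changing the scoring function.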

4. User Experience

  • Provide simple and non-intrusive feedback
  • Suggest reuse of existing references
  • Keep interactions smooth within the editor

Prototype / Prior Work

I have worked on JavaScript-based projects involving dynamic data handling, form processing, and DOM manipulation. These experiences helped me understand how to manage user input, process data efficiently, and build responsive interfaces.

I have also explored Python through small scripts for file processing and validation, which helped me understand structured logic and error handling.


Timeline

The internship runs for 12 weeks, and I will follow a structured plan:

Week 1

I will study the project requirements, explore the codebase, and familiarize myself with the relevant technologies, such as the MediaWiki and Wikidata APIs and the JavaScript/Python workflows used in the Wikimedia ecosystem.

Weeks 2–4

I will begin implementing the core logic for Wishlist #3 by developing functions to identify and compare references using identifiers such as ISBNs, DOIs, and URLs, ensuring a reliable method for detecting duplicates.

Weeks 5–7

I will enhance this functionality by integrating API-based data validation, improving accuracy, and refining the user interaction for reusing existing references. In parallel, I will start working on Wishlist #8 by understanding how contribution data is structured and building initial logic to calculate and track scores for Wikidata edits.

Weeks 8–10

I will expand the scoring system to handle multiple contribution types, improve performance, and ensure real-time or near real-time updates, while also optimizing the duplicate reference feature to handle edge cases effectively.

Weeks 11–12

In the final phase, I will focus on thorough testing, debugging, performance improvements, and writing clear documentation for both features.

Throughout the internship, I will maintain consistent communication with mentors, actively incorporate feedback, and adapt my approach to ensure steady progress and meaningful contributions aligned with the project goals.


Why I am a Strong Fit

I focus on building practical and user-friendly solutions. My experience with JavaScript and frontend development helps me understand how features should behave from a user’s perspective, especially in interactive environments like the Visual Editor.

I am comfortable working with dynamic data, handling edge cases, and improving user experience. I am also eager to learn and adapt, especially when working with APIs and new technologies.

I am confident in my ability to contribute effectively, learn quickly, and deliver a well-integrated solution.

Event Timeline

URLs → remove protocol, “www”, trailing slashes

This seems like a bad idea, because a different URL on a different protocol may well be a different site. (The http/https versions rarely differ, but sites where the www/plain versions are different are quite common. And the trailing slash is genuinely a different path.)

Gopavasanth subscribed.

Thank you for your proposal and the effort you put into it. This year we received over 20 strong applications, and after a highly competitive review, we were unfortunately unable to offer you a slot.

Please don't see this as a failure. Many contributors who weren't selected for Outreachy have gone on to make a meaningful, lasting impact in the Wikimedia community, and we genuinely hope you'll stay engaged. You're very welcome to continue contributing outside of Outreachy. Our mentors and org admins are happy to help you get started or keep going:

We hope to see you around in the community.