Name: Supreet Kaur
GitHub: https://github.com/Supreetkaur1
Location: Punjab, India
Timezone: IST (UTC+5:30)
I am a backend-focused software engineer with professional experience building scalable and production-grade systems. I have worked on backend services involving API design, structured data processing, and system reliability at scale. My work emphasizes correctness, maintainability, and performance under real-world constraints.
During my software engineering experience at Amazon (Prime Video), I worked on backend systems dealing with large volumes of structured data and cross-service workflows. This included handling data consistency issues, debugging production-level pipelines, and ensuring reliable user-facing outputs.
Synopsis
The Lusophone Technological Wishlist is a community-driven initiative aimed at improving Wikimedia editing workflows for Portuguese-language contributors by prioritizing impactful technical enhancements.
For this internship, I propose to work on two wishlist items:
Wishlist #3: Automatic Duplicate Reference Detection in VisualEditor
Wishlist #8: Wikidata Integration for WikiScore
These two features together improve both the editing experience (VisualEditor) and contribution inclusivity (WikiScore) by reducing redundancy in citations and expanding scoring support to structured data contributions.
My goal is to design robust, performance-aware solutions that integrate cleanly into existing Wikimedia systems while improving usability for contributors.
Selected Wishlist
Wishlist #3: Automatic Duplicate Reference Detection in VisualEditor
Editors often unintentionally add duplicate references when citing sources using URLs, DOIs, or ISBNs. Since VisualEditor does not actively compare new references against existing ones in real time, duplicate citations are created, leading to cluttered and less maintainable articles.
The goal is to detect duplicate references during citation insertion and guide users toward reusing existing references instead of creating new ones.
Wishlist #8: Wikidata Integration for WikiScore
WikiScore currently focuses primarily on Wikipedia edits and does not fully account for Wikidata contributions. This limits its usefulness in edit-a-thons where structured data contributions are important.
The goal is to extend WikiScore to:
Fetch Wikidata contributions via APIs
Normalize and process structured data
Integrate Wikidata edits into scoring logic alongside Wikipedia edits
This will make WikiScore more inclusive and accurate for modern Wikimedia contribution workflows.
Technical Approach
1. Wishlist #3 – VisualEditor Duplicate Reference Detection
Understanding the System
While exploring VisualEditor’s architecture, I studied how citation workflows are handled, particularly how tools like Citoid generate references and how existing references are stored in the document model. This helped identify where duplicate detection logic can be introduced without disrupting the existing flow.
Implementation Approach
1. Reference Extraction
Extract existing references from the VisualEditor document model
Identify structured identifiers:
URL
DOI
ISBN
2. Normalization Layer
To ensure accurate comparison across formats:
URLs → ignore protocol differences (http vs. https), remove trailing slashes, lowercase the host and drop a leading "www."
DOIs → strip resolver prefixes (https://doi.org/), normalize casing
ISBNs → remove hyphens and whitespace
This ensures consistent identifier comparison.
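A minimal sketch of the normalization rules above, written in Python for brevity (the actual VisualEditor code would be JavaScript). The function names are hypothetical and the exact rules are assumptions to be refined with mentors:

```python
import re
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Drop scheme, port, fragment, a leading 'www.' and trailing slashes; lowercase the host."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()  # .hostname excludes the port
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{host}{path}{query}"


def normalize_doi(doi: str) -> str:
    """Strip a resolver URL or 'doi:' prefix and lowercase (DOIs are case-insensitive)."""
    value = doi.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", value)


def normalize_isbn(isbn: str) -> str:
    """Remove hyphens and whitespace; uppercase a possible ISBN-10 'x' check digit."""
    return re.sub(r"[\s-]", "", isbn).upper()
```

With these rules, "https://www.Example.org/page/" and "http://example.org/page" map to the same key, as do "https://doi.org/10.1000/ABC123" and "doi:10.1000/abc123". Converting ISBN-10 to ISBN-13 is deliberately left out of this sketch.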
3. Duplicate Detection Engine
Maintain an in-memory indexed structure of normalized identifiers
Perform average-case O(1) lookups using a hash-based index
Compare new citation input against existing references in real time
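The index could look roughly like the following Python sketch. The class and method names are hypothetical, and the sketch assumes identifiers have already been normalized before they reach the index:

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (identifier type, normalized value), e.g. ("doi", "10.1000/abc")


class ReferenceIndex:
    """Hash-based lookup from a normalized identifier to an existing reference name."""

    def __init__(self) -> None:
        self._by_key: Dict[Key, str] = {}

    def add(self, ref_name: str, kind: str, normalized: str) -> None:
        # setdefault keeps the first reference seen for a given identifier.
        self._by_key.setdefault((kind, normalized), ref_name)

    def find(self, kind: str, normalized: str) -> Optional[str]:
        # Average-case O(1) dictionary lookup; None means no duplicate was found.
        return self._by_key.get((kind, normalized))
```

Keying on the identifier type as well as the value avoids accidental matches between, say, a DOI and a URL that happen to normalize to the same string.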
4. Integration into Citation Workflow
Hook into VisualEditor citation insertion flow
Run duplicate detection before final reference creation
Avoid blocking user actions; instead trigger suggestions
5. User Experience Handling
Display non-intrusive notification when duplicate is detected
Highlight existing reference in the reuse panel
Provide a “reuse existing reference” action
Key Challenges
Handling inconsistent or partial identifiers
Avoiding performance overhead in large articles
Ensuring non-disruptive UI behavior inside VisualEditor
2. Wishlist #8 – Wikidata Integration for WikiScore
System Understanding
I analyzed WikiScore’s existing architecture to understand how Wikipedia contributions are fetched and processed. The goal is to extend this pipeline to support Wikidata as an additional structured data source.
Implementation Approach
1. Data Fetching Layer
Use the MediaWiki Action API on www.wikidata.org (for example, the list=usercontribs query module)
Fetch user contributions from Wikidata
Handle pagination, rate limiting, and API reliability
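Pagination with the Action API's continuation mechanism can be sketched as below. This is an illustration in Python (WikiScore's own code may use a different language); the function name and the injected fetch callable are hypothetical, while the request parameters are those of the list=usercontribs module:

```python
from typing import Any, Callable, Dict, Iterator

Params = Dict[str, Any]


def iter_user_contribs(
    fetch: Callable[[Params], Dict[str, Any]],
    user: str,
    limit: int = 500,
) -> Iterator[Dict[str, Any]]:
    """Yield every contribution for `user`, following 'continue' blocks.

    `fetch` is a hypothetical callable that sends the parameters to the
    Wikidata Action API and returns the decoded JSON response.
    """
    params: Params = {
        "action": "query",
        "format": "json",
        "list": "usercontribs",
        "ucuser": user,
        "uclimit": limit,
        "ucprop": "ids|title|timestamp|comment|sizediff",
    }
    while True:
        data = fetch(params)
        yield from data.get("query", {}).get("usercontribs", [])
        if "continue" not in data:
            break
        # Merge the continuation values into the next request.
        params = {**params, **data["continue"]}
```

Injecting the fetch callable keeps the pagination logic testable without network access. A real fetch function would be the place to add retries, backoff, and the maxlag parameter.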
2. Data Processing Pipeline
Normalize contribution data into a unified schema
Filter relevant actions:
Item creation
Statement additions
Label/description edits
Reference updates
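One possible way to filter these actions is to classify each edit by the auto-generated summary prefix that Wikibase writes into the edit comment. The sketch below is in Python for illustration; the category names are hypothetical, and the exact summary keys it matches are an assumption that would need to be verified against real Wikidata revisions:

```python
import re
from typing import Optional

# Matches prefixes such as "/* wbsetlabel-add:1|pt */" at the start of a comment.
_SUMMARY = re.compile(r"^/\*\s*(wb[a-z]+)-([a-z-]+):")


def classify_edit(comment: str) -> Optional[str]:
    """Map a Wikidata edit comment to a contribution category, or None if unrecognized."""
    match = _SUMMARY.match(comment)
    if match is None:
        return None
    module, sub_action = match.groups()
    if module == "wbeditentity" and sub_action.startswith("create"):
        return "item_creation"
    if module in ("wbcreateclaim", "wbsetclaim") and sub_action == "create":
        return "statement_addition"
    if module in ("wbsetlabel", "wbsetdescription"):
        return "label_description_edit"
    if module == "wbsetreference":
        return "reference_update"
    return None
```

Edits that return None (manual summaries, statement updates, and so on) would need a separate decision on whether and how they count.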
3. Scoring System Integration
Extend existing WikiScore scoring logic
Assign weighted scores for different Wikidata actions
Ensure consistency with Wikipedia contribution scoring
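As a rough illustration of weighted scoring, the Python sketch below sums per-action weights over a list of classified edits. The weight values and category names are placeholders chosen for the example, not WikiScore's actual configuration; in practice they would be set per contest by organizers:

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical weights, for illustration only.
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "item_creation": 5.0,
    "statement_addition": 2.0,
    "label_description_edit": 1.0,
    "reference_update": 3.0,
}


def score_wikidata_edits(
    edit_types: Iterable[str],
    weights: Dict[str, float] = EXAMPLE_WEIGHTS,
) -> float:
    """Sum the weight of each classified edit; unknown categories score zero."""
    counts = Counter(edit_types)
    return sum(weights.get(kind, 0.0) * n for kind, n in counts.items())
```

For example, one item creation plus two statement additions would score 5.0 + 2 × 2.0 = 9.0 under these placeholder weights.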
4. Performance Optimization
Batch API requests to reduce load
Introduce caching for repeated queries
Optimize processing for large edit-a-thons
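Batching and caching can be combined in a small helper, sketched here in Python. The class name and the injected fetch_batch callable are hypothetical; the default batch size of 50 reflects the usual Action API limit on multi-value parameters for regular accounts, which should be confirmed for the account WikiScore uses:

```python
from typing import Callable, Dict, Iterable, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Split a list into consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class CachedBatchFetcher:
    """Fetch entities in batches and remember results for repeated queries."""

    def __init__(
        self,
        fetch_batch: Callable[[List[str]], Dict[str, dict]],
        batch_size: int = 50,
    ) -> None:
        self._fetch_batch = fetch_batch  # hypothetical: one API request per batch
        self._batch_size = batch_size
        self._cache: Dict[str, dict] = {}

    def get_many(self, ids: Iterable[str]) -> Dict[str, dict]:
        wanted = list(dict.fromkeys(ids))  # de-duplicate while keeping order
        missing = [i for i in wanted if i not in self._cache]
        for batch in chunked(missing, self._batch_size):
            self._cache.update(self._fetch_batch(batch))
        return {i: self._cache[i] for i in wanted if i in self._cache}
```

Requesting 120 uncached IDs would result in three requests (50, 50, 20), and asking for the same IDs again would result in none. This in-memory cache has no expiry, so a production version would need an invalidation policy.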
Key Challenges
Differences in structure between Wikipedia and Wikidata edits
Handling API limitations and rate constraints
Ensuring consistent scoring across platforms
Timeline
The internship runs from May 18, 2026 to August 17, 2026 (13 weeks). I will work full-time (40 hours/week) and adjust the plan below based on mentor feedback.
Weeks 1–8: VisualEditor (Wishlist #3)
Weeks 1–2: Setup & Codebase Understanding
Set up MediaWiki + VisualEditor environment
Study citation workflow and reference model
Identify integration points for duplicate detection
Weeks 3–4: Core Implementation
Build normalization utilities (URL, DOI, ISBN)
Implement duplicate detection engine
Weeks 5–6: Integration
Integrate detection into citation insertion flow
Add reuse suggestion UI
Week 7: Performance & Edge Cases
Optimize for large articles
Handle inconsistent metadata cases
Week 8: Finalization
Testing, bug fixes, documentation, and patch submission
Weeks 9–13: WikiScore (Wishlist #8)
Weeks 9–10: API Integration
Implement Wikidata data fetching layer
Normalize contribution data
Week 11: Scoring Logic
Extend WikiScore scoring system
Integrate Wikidata contributions
Week 12: Testing & Optimization
Test with real edit-a-thon datasets
Optimize performance and reliability
Week 13: Finalization
Documentation, cleanup, and final submission
Why I Am a Good Fit
My background as a backend engineer at Amazon has given me experience in building and maintaining production systems that handle structured data, API integrations, and performance-critical workflows.
This project aligns directly with my experience in:
Designing scalable backend systems
Handling structured data pipelines
Debugging and improving system reliability
Working with API-driven architectures
Additionally, I am comfortable working across both backend and integration layers, which is important for contributing to both VisualEditor and WikiScore.
I am particularly motivated by systems that improve collaboration and data quality at scale, which aligns strongly with Wikimedia’s mission.
Post-Internship Contribution
After the internship, I plan to:
Continue maintaining and improving implemented features
Support new contributors in understanding the codebase
Contribute further to Wikimedia tools focused on structured data and editor experience
Stay active in the Wikimedia technical community