Project: Wikimedia Foundation – MediaWiki / Wikiscore
Mentors: @Ederporto, @Arcstur
Rating: Medium
Microtasks: T418285 (Wish #3) · T418286 (Wish #8)
1. About Me
| Field | Details |
|---|---|
| Name | Ejibode Ibraheem |
| GitHub | https://github.com/Linsmed |
| Location / Timezone | Lagos, Nigeria — UTC +01:00 |
| Availability | 40 hours/week, May 18 – August 17, 2026 |
| Contact | iejibode@gmail.com |
I am a software engineer with a strong interest in DevOps, backend systems, and open-source contribution. I have experience working with Python, JavaScript, and cloud technologies, and I have built and deployed full-stack applications as well as automated systems.
During this Outreachy contribution period, I completed the required microtasks and improved my work based on mentor feedback. This process helped me better understand real-world challenges such as handling edge cases (e.g., date inconsistencies across time zones), improving error handling, and writing more robust and maintainable code.
I have also contributed to the Debian openQA project, where I worked on automating operating system testing workflows inside virtual machines. This experience strengthened my ability to work with large codebases, follow structured contribution workflows, and iterate based on feedback.
2. Abstract
This proposal focuses on Wishlist #3: implementing duplicate reference detection in the Visual Editor, with a secondary interest in Wishlist #8: adding Wikidata support to WikiScore.
Duplicate references are a common issue in Wikipedia editing workflows. Editors, especially newcomers, may unknowingly add the same reference multiple times due to the lack of visibility into existing citations. This leads to redundancy, inconsistency, and reduced readability.
This project proposes a solution that:
- Detects duplicate references using identifiers such as URL, DOI, and ISBN
- Notifies users in real time within the Visual Editor
- Provides an option to reuse existing references instead of duplicating them
Additionally, if time permits, I will explore initial contributions toward enabling Wikidata scoring in WikiScore to better recognize structured data contributions.
3. Background and Motivation
3.1 Wish #3 — Duplicate Reference Detection and Improving Reference Reuse
Managing references efficiently is a recurring challenge in Wikipedia editing. In long articles, it is difficult to tell whether a source has already been cited.
This often results in:
- Duplicate references
- Increased clutter in article source
- Inefficient editing workflows
From my work during the contribution phase, I gained experience handling URLs and processing structured input data. I observed how small variations in input (e.g., URL formats) can lead to inconsistencies, which directly relates to the problem of duplicate references.
Addressing this issue should improve the editing experience, particularly for new contributors.
3.2 Wish #8 — Recognizing Wikidata Contributions
Wikidata is an essential part of the Wikimedia ecosystem, but contributions to it are not fully represented in tools like WikiScore.
Enabling Wikidata scoring would:
- Encourage more participation in structured data contributions
- Provide a more complete view of contributor activity
- Support organizers of edit-a-thons and contests
4. Technical Approach
4.1 Wish #3 — Duplicate Reference Detection in Visual Editor
The implementation will focus on integrating duplicate detection into the Visual Editor workflow.
Core Steps
- Identifier Extraction
  - Extract URL, DOI, or ISBN from user input
- Normalization
  - URLs → remove trailing slashes, normalize scheme
  - DOI → convert to lowercase
  - ISBN → remove hyphens and standardize format
- Retrieve Existing References
  - Use MediaWiki API (action=parse) to fetch references already present in the article
- Comparison
  - Compare normalized identifiers against existing references
- User Feedback
  - Display a notification when a duplicate is detected
  - Provide a “reuse reference” option
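As a rough illustration of the normalization step, the sketch below shows one possible shape for these helpers. The function names are my own placeholders, not existing MediaWiki or VisualEditor APIs, and Python is used only for brevity; the editor-side code would be JavaScript. Treating http and https as the same source, dropping URL fragments, and converting ISBN-10 to ISBN-13 are assumptions I would confirm with mentors.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Normalize an absolute http(s) URL so equivalent forms compare equal."""
    parts = urlsplit(url.strip())
    # Assumption: http and https variants point at the same source.
    scheme = "https" if parts.scheme in ("http", "https") else parts.scheme
    host = parts.netloc.lower()        # host names are case-insensitive
    path = parts.path.rstrip("/")      # "/page/" and "/page" compare equal
    # Assumption: the fragment does not change which document is cited.
    return urlunsplit((scheme, host, path, parts.query, ""))


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip common resolver or 'doi:' prefixes."""
    doi = doi.strip().lower()
    prefixes = (
        "https://doi.org/", "http://doi.org/",
        "https://dx.doi.org/", "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip()


def normalize_isbn(isbn: str) -> str:
    """Remove hyphens and spaces; express ISBN-10 as ISBN-13 when possible."""
    compact = re.sub(r"[\s-]", "", isbn).upper()
    if len(compact) == 10 and compact[:9].isdigit():
        # ISBN-10 to ISBN-13: prefix 978, then recompute the check digit
        # using alternating weights 1 and 3.
        core = "978" + compact[:9]
        total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(core))
        return core + str((10 - total % 10) % 10)
    return compact
```

With these rules, `normalize_isbn("0-306-40615-2")` and `normalize_isbn("978-0-306-40615-7")` both yield `9780306406157`, so the two editions of the same identifier compare equal.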
User-facing surfaces
- Non-intrusive inline notification
- Clear reuse option
- No disruption to editing flow
Key Considerations
- Handling variations in identifier formats
- Avoiding false positives
- Ensuring performance for large articles
To address these, I will:
- Implement and unit-test normalization rules for each identifier type (URL, DOI, ISBN)
- Use efficient lookup structures
- Test across different real-world scenarios
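One way to picture the "efficient lookup structure" is a hash map from normalized identifiers to the reference that first used them. The sketch below is illustrative only: `ReferenceIndex` and its methods are hypothetical names, and it assumes identifiers have already been normalized. Keying on the pair (kind, value) is one way to reduce false positives, since a DOI can never collide with a URL or ISBN string.

```python
from typing import Dict, Iterable, Optional, Tuple

# (kind, normalized value), for example ("doi", "10.1000/xyz123")
Identifier = Tuple[str, str]


class ReferenceIndex:
    """Hypothetical in-memory index of identifiers already cited in an article."""

    def __init__(self) -> None:
        self._by_identifier: Dict[Identifier, str] = {}

    def add(self, ref_name: str, identifiers: Iterable[Identifier]) -> None:
        """Register a reference; the first reference to use an identifier wins."""
        for ident in identifiers:
            self._by_identifier.setdefault(ident, ref_name)

    def find_duplicate(self, identifiers: Iterable[Identifier]) -> Optional[str]:
        """Return the name of an existing reference sharing any identifier."""
        for ident in identifiers:
            if ident in self._by_identifier:
                return self._by_identifier[ident]
        return None


# Usage with made-up data:
index = ReferenceIndex()
index.add("smith2020", [("doi", "10.1000/xyz123"),
                        ("url", "https://example.org/paper")])
match = index.find_duplicate([("url", "https://example.org/paper")])
```

Each check is a dictionary lookup per identifier, so the cost of testing a new reference does not grow with the number of references already in the article.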
4.2 Wish #8 — Wikidata Support (Exploratory)
If time permits, I will:
- Explore the WikiScore codebase
- Fetch Wikidata contributions via the MediaWiki API
- Investigate integration into the scoring logic
This will be approached incrementally after progress on Wishlist #3.
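As a starting point for that exploration, a user's Wikidata edits can be listed through the Action API's `list=usercontribs` module. The sketch below only builds the request URL and counts item edits in a response-shaped dictionary; the helper names, the username, and the sample response are hypothetical, and how WikiScore would actually consume this data is something I would need to learn from its codebase.

```python
from typing import Any, Dict
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_usercontribs_url(username: str, start: str, end: str, limit: int = 50) -> str:
    """Build an Action API URL listing a user's edits between two timestamps."""
    params = {
        "action": "query",
        "list": "usercontribs",
        "ucuser": username,
        "ucstart": start,      # with ucdir=newer, this is the earlier bound
        "ucend": end,
        "ucdir": "newer",
        "uclimit": limit,
        "ucprop": "ids|title|timestamp|sizediff",
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def count_item_edits(response: Dict[str, Any]) -> int:
    """Count edits in the main namespace (ns 0), where Wikidata items live."""
    contribs = response.get("query", {}).get("usercontribs", [])
    return sum(1 for edit in contribs if edit.get("ns") == 0)


# Usage with a hypothetical user and a made-up response fragment:
url = build_usercontribs_url("Example User", "2026-06-01T00:00:00Z", "2026-06-30T23:59:59Z")
sample = {"query": {"usercontribs": [
    {"ns": 0, "title": "Q42", "sizediff": 120},
    {"ns": 4, "title": "Wikidata:Project chat", "sizediff": 300},
]}}
item_edits = count_item_edits(sample)
```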
5. Timeline
Primary Focus: Wishlist #3 — Duplicate Reference Detection
Internship Period: May 18, 2026 – August 17, 2026 (13 weeks)
Strategy: The timeline is structured to prioritize a complete and high-quality implementation of Wishlist #3, while allowing flexibility for iterative feedback and potential exploration of Wishlist #8.
Week 1-2: Onboarding, Environment Setup & Codebase Understanding
Goal: Build a strong understanding of how references are currently handled in MediaWiki and the Visual Editor.
- Set up local development environment for MediaWiki and the Cite extension
- Study how the Visual Editor manages references (especially reference dialogs and insertion flow)
- Explore how references are stored and rendered in articles (<ref> tags and named references)
- Review existing code structure to identify where duplicate detection logic can be integrated
- Discuss and validate approach with mentors @Ederporto and @Arcstur
Outcome: A clear understanding of the system architecture and a well-defined implementation plan.
Week 3-4
Wish #3: Identifier Extraction & Normalization
Goal: Build a reliable way to extract and standardize reference identifiers.
- Implement logic to extract identifiers (URL, DOI, ISBN) from user input
- Develop normalization functions:
  - URLs → remove trailing slashes, normalize scheme
  - DOI → lowercase and clean formatting
  - ISBN → remove hyphens and standardize format
- Handle edge cases such as incomplete or malformed inputs
- Write unit tests to validate normalization across different formats
Outcome: A reliable identifier-processing system that ensures consistent comparison.
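To make the extraction step concrete, here is a minimal sketch that pulls candidate URLs, DOIs, and ISBNs out of free-form citation text with regular expressions. The patterns and the `extract_identifiers` name are my own assumptions and are deliberately conservative: ISBNs must be preceded by the word "ISBN" and may only use hyphens as separators, and anything that does not reduce to 10 or 13 characters is discarded as malformed.

```python
import re
from typing import Dict, List

URL_RE = re.compile(r'https?://[^\s<>|\]}"]+', re.IGNORECASE)
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s<>|\]}"]+')
ISBN_RE = re.compile(r'\bISBN(?:-1[03])?:?\s*([0-9][0-9-]{8,15}[0-9Xx])', re.IGNORECASE)

TRAILING_PUNCTUATION = ".,;"


def extract_identifiers(text: str) -> Dict[str, List[str]]:
    """Return candidate identifiers found in the text, grouped by kind.

    Note: a DOI written as a resolver URL is reported under both
    "url" and "doi"; later normalization can reconcile the two.
    """
    urls = [m.rstrip(TRAILING_PUNCTUATION) for m in URL_RE.findall(text)]
    dois = [m.rstrip(TRAILING_PUNCTUATION) for m in DOI_RE.findall(text)]
    isbns = []
    for raw in ISBN_RE.findall(text):
        compact = raw.replace("-", "")
        # Malformed input: only 10- or 13-character ISBNs are kept.
        if len(compact) in (10, 13):
            isbns.append(compact)
    return {"url": urls, "doi": dois, "isbn": isbns}


# Usage with a made-up citation string:
found = extract_identifiers(
    "See https://example.org/paper, doi:10.1000/XYZ123; ISBN 978-0-306-40615-7."
)
```

Unit tests for this step would pair inputs like the one above with incomplete ones (for example, `ISBN 12345`) that should produce no identifiers at all.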
Week 5-6
Wish #3: Reference Retrieval & Duplicate Detection Logic
Goal: Compare new references with existing ones in the article.
- Use MediaWiki API (action=parse) to retrieve existing references in an article
- Build a lookup structure (e.g., dictionary/map) of normalized identifiers
- Implement duplicate detection by comparing input identifiers with stored references
- Ensure efficient lookup to avoid performance issues in large articles
- Test with different article sizes and reference types
Outcome: Working backend logic that accurately detects duplicate references.
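The retrieval step could look roughly like the sketch below: build an `action=parse` request that returns the page wikitext, then collect the content of each `<ref>` tag. The helper names are hypothetical and the regular expressions are a simplification that I assume is adequate for a prototype; they skip self-closing reuse tags such as `<ref name="x" />` and do not handle reference names containing `/`. Inside the editor, the same information may be available from the document model without an API call, which I would verify during onboarding.

```python
import re
from typing import List, Optional, Tuple
from urllib.parse import urlencode


def build_parse_url(api_base: str, title: str) -> str:
    """Build an action=parse request that returns the page wikitext as JSON."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",
    }
    return f"{api_base}?{urlencode(params)}"


# Opening <ref ...> tag (not <references>, not self-closing) up to </ref>.
REF_RE = re.compile(r"<ref(?P<attrs>\s[^>/]*)?>(?P<body>.*?)</ref>",
                    re.IGNORECASE | re.DOTALL)
NAME_RE = re.compile(r'''\bname\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>/]+))''',
                     re.IGNORECASE)


def extract_refs(wikitext: str) -> List[Tuple[Optional[str], str]]:
    """Return (name or None, body) for each reference defined in the wikitext."""
    refs = []
    for match in REF_RE.finditer(wikitext):
        name_match = NAME_RE.search(match.group("attrs") or "")
        name = None
        if name_match:
            name = next((g for g in name_match.groups() if g is not None), None)
        refs.append((name, match.group("body").strip()))
    return refs


# Usage with made-up wikitext:
sample = (
    'Fact one.<ref name="smith2020">{{cite journal |doi=10.1000/xyz123}}</ref> '
    'Fact two.<ref>{{cite web |url=https://example.org/page}}</ref> '
    'Fact three.<ref name="smith2020" />\n<references />'
)
refs = extract_refs(sample)
url = build_parse_url("https://en.wikipedia.org/w/api.php", "Lagos")
```

Each extracted body can then be passed through identifier extraction and normalization before being added to the lookup structure.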
Week 7-8
Wish #3: Integration with Visual Editor & User Feedback
Goal: Make the feature usable within the editing interface.
- Integrate duplicate detection into the Visual Editor workflow
- Display real-time notifications when a duplicate is detected
- Implement a “reuse existing reference” option for users
- Ensure UI is non-intrusive and does not disrupt editing flow
- Test interaction flow with different user scenarios
Outcome: A fully functional feature integrated into the editor with user-friendly feedback.
Week 9-10
Wish #3: Edge Case Handling, Optimization & Testing
Goal: Improve reliability and performance.
- Handle edge cases:
  - Slight variations in URLs
  - Missing or partial identifiers
  - Articles with large numbers of references
- Optimize performance:
  - Efficient data structures
  - Avoid repeated API calls where possible
- Conduct extensive testing across multiple scenarios
- Fix bugs and refine logic based on results
Outcome: A stable and efficient implementation ready for review.
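One possible way to avoid repeated API calls is to cache fetched reference data per page revision, since the references of a given revision never change. The sketch below is an assumption about how that could be structured, not a description of existing code: `ReferenceCache` is a hypothetical name, and the `fetch` callable stands in for whatever function actually performs the request.

```python
from typing import Callable, Dict, Tuple


class ReferenceCache:
    """Hypothetical cache of fetched wikitext, keyed by (title, revision id)."""

    def __init__(self, fetch: Callable[[str, int], str]) -> None:
        self._fetch = fetch
        self._store: Dict[Tuple[str, int], str] = {}

    def get(self, title: str, revid: int) -> str:
        """Return cached content, calling fetch only on the first request."""
        key = (title, revid)
        if key not in self._store:
            self._store[key] = self._fetch(title, revid)
        return self._store[key]


# Usage with a stand-in fetch function that records how often it is called:
calls = []


def fake_fetch(title: str, revid: int) -> str:
    calls.append((title, revid))
    return f"wikitext of {title} at revision {revid}"


cache = ReferenceCache(fake_fetch)
first = cache.get("Lagos", 1001)
second = cache.get("Lagos", 1001)   # served from the cache, no second fetch
```

A new revision id produces a new key, so stale data is never reused after the article changes.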
Week 11-12: Documentation, Feedback Integration & Final Submission
Goal: Prepare a production-quality contribution.
- Write clear developer documentation explaining the implementation
- Add inline comments to improve code readability
- Incorporate mentor and reviewer feedback
- Perform final testing and refinements
- Submit final patch for review
Outcome: A complete, well-documented feature ready for integration.
Secondary Plan: Wishlist #8 (If Time Permits)
If progress on Wishlist #3 is completed ahead of schedule, I will begin exploratory work on Wishlist #8:
- Study the WikiScore codebase and architecture
- Explore Wikimedia APIs for fetching Wikidata contributions
- Prototype a simple contribution-fetching mechanism
- Investigate how scoring logic can be extended
Outcome: Initial groundwork for Wikidata integration in WikiScore.
6. Timeline Summary
| Week | Focus |
|---|---|
| 1-2 | Onboarding, Environment Setup & Codebase Understanding |
| 3-4 | Identifier Extraction & Normalization |
| 5-6 | Reference Retrieval & Duplicate Detection Logic |
| 7-8 | Integration with Visual Editor & User Feedback |
| 9-10 | Edge Case Handling, Optimization & Testing |
| 11-12 | Documentation, Feedback Integration & Final Submission |
7. Expected Deliverables
- Duplicate reference detection feature
- Identifier normalization logic
- Visual Editor integration with user feedback
- Test cases and documentation
- Initial exploration or prototype for Wikidata integration (if feasible)
8. Skills and Qualifications
- Python and JavaScript
- API integration and data handling
- Experience with automation and system reliability (DevOps background)
- Open-source collaboration (Debian openQA contribution)
9. Community Engagement
- Completed both microtasks T418285 and T418286
- Incorporated mentor feedback on the microtasks
- Wrote a technical article on the Lusophone Wishlist
- Will maintain consistent communication with mentors
- Open to feedback and iterative improvement
- Will participate in WMF community tech forums and respond to discussions there
10. Post-Internship Commitment
I plan to continue contributing to Wikimedia projects after the internship by maintaining and improving implemented features and supporting future contributors where possible.
11. Why This Project
I am particularly drawn to this project because it focuses on improving contributor experience in a practical and meaningful way. During the contribution period, I worked with URL processing and saw how small inconsistencies in data can lead to duplication. This made Wishlist #3 especially relatable, as duplicate references, though seemingly minor, can affect the quality and maintainability of Wikipedia articles.
I am motivated by the opportunity to build a solution that reduces friction for contributors, especially newcomers, by making it easier to reuse existing references instead of creating duplicates. I find this kind of improvement impactful because it enhances both usability and content quality without disrupting existing workflows.
I am also interested in the broader Wikimedia ecosystem, particularly how tools like WikiScore can better recognize different forms of contribution. If time permits, I would be excited to explore this further through Wishlist #8.
Overall, this project stands out to me because it combines technical problem-solving with meaningful impact on a global knowledge platform.