1. Synopsis and Personal Profile
1.1. Profile
- Name: Wah Vanessa
- UserName: essa237
- Github: url
- Location: Douala, Cameroon
- Timezone: UTC+1 (WAT)
- Ideal Working Hours: 9:00 AM – 5:00 PM UTC+1 (40 hours per week)
1.2. Synopsis
The Lusophone Technological Wishlist is a community initiative designed to identify and prioritize the technical needs of Portuguese-language Wikimedia editors. This project aligns with the Product & Engineering Support (PES) and Wiki Experiences (WE) areas of Wikimedia's annual plan, and with its aim of working faster, smarter, and better. The plan also focuses on reducing friction for contributors and improving the reliability of structured data.
As a junior full-stack developer and a member of the Wiki Mentor Africa and Africa Wiki Women communities, I am applying to address Wish #3 (verify duplicate references automatically): Automatic Duplicate Reference Detection in VisualEditor.
Having authored articles such as Denis Worlanyo Aheto and Anikeade Funke-Treasure and made a number of edits, I have experienced citation fatigue first-hand, caused by the current lack of integration between the Automatic and Re-use citation workflows. A solution that uses identifier normalisation can bridge this gap: the system would recognise the source and prompt me to re-use the existing citation rather than create a duplicate.
My background as a contributing developer to the WdTmCollab tool has prepared me to tackle the complexities of MediaWiki and Wikibase. My goal for this internship is to implement logic within VisualEditor that identifies existing identifiers (ISBN, DOI, URL) in real time, preventing redundant data entry and keeping article sources clean and easily maintainable for the Lusophone community and Wikimedia as a whole.
1.3. Community engagement and Experience
I am not just a coder; I am a new practitioner within the ecosystem. My relevant experience includes:
- Editorial insight: Experience managing infoboxes, articles, and citation structures in English Wikipedia.
- Mentorship: Currently serving as a mentor in Wiki Mentor Africa, helping new editors navigate the very tools I now seek to improve.
2. Problem Statement
2.1. Current citation workflow
The visual editor citation tool, as seen in the image below, separates the automatic, manual, and re-use features into isolated tabs.
- How the Automatic tab works: when an editor pastes a URL or DOI into the Automatic tab, the system generates a new reference. It does not check whether the same source already exists in the Re-use list.
- This leads to data redundancy, or "reference clutter": the same source appears several times in the citation list under different numbers, which bloats the wikitext.
2.2. Complexity of Manual Entries (Wish #3)
As seen in the 2nd, 3rd, and 4th images, the manual citation process carries a high cognitive load. It offers 5 different categories (Website, Book, News, Journal, and Basic) and many fields (ARCHIVE, URL, DOI, ISBN, etc.), so the risk of duplication is higher.
What is wrong? An editor might manually fill out a web citation template for a source that was already cited via the Automatic tab.
What should be done? The system should normalise both manual inputs and automatic entries so that they can be compared against the references already in the article.
2.3. The Wikidata scoring gap (Wish #8)
Currently, WikiScore's use in Lusophone edit-a-thons is limited to Wikipedia edits; structured data contributions such as statements/claims on Wikidata go uncounted. This discourages contributors from taking on Wikibase-related tasks.
2.4. Technical Challenges to Overcome
For this wish to be realized, the implementation has to address a few technical hurdles:
- Normalization: the same source can be written in several ways.
  - A DOI can be written as 10.1100/abc or https://doi.org/10.1100/abc.
  - A URL may or may not have a www. prefix.
My approach involves a normalization engine that ensures like is compared with like. The check has to run in real time: it is triggered the moment the user clicks "create" in the automatic tab or "insert" in the manual tab, without lagging the UI.
If a duplicate is found, the user should not be blocked; instead, clear UI/UX feedback should guide the user to the re-use flow.
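The normalization hurdle can be illustrated with a minimal sketch. All function names here are hypothetical and not part of Citoid or VisualEditor. The sketch uses the standard `URL` API instead of mw.Uri so that it runs outside MediaWiki, and it assumes that an ISBN-10 may end in the check character X.

```javascript
// Hypothetical helpers for illustration only.

// Reduce a DOI to its bare lowercase form, e.g. "10.1100/abc".
function normalizeDoi(input) {
  return input
    .trim()
    .toLowerCase()
    .replace(/^https?:\/\/(dx\.)?doi\.org\//, '')
    .replace(/^doi:\s*/, '');
}

// Keep only digits and the ISBN-10 check character X.
function normalizeIsbn(input) {
  return input.toUpperCase().replace(/[^0-9X]/g, '');
}

// Drop protocol, "www." prefix, fragment, and trailing slashes from a URL.
// Returns null when the input is not an absolute http(s) URL.
function normalizeUrl(input) {
  let url;
  try {
    url = new URL(input.trim());
  } catch (e) {
    return null;
  }
  if (!/^https?:$/.test(url.protocol)) {
    return null;
  }
  const host = url.host.replace(/^www\./, '');
  const path = url.pathname.replace(/\/+$/, '');
  return host + path + url.search;
}

// Map any supported input to a comparable key such as "doi:10.1100/abc".
function toKey(input) {
  const trimmed = input.trim();
  const doi = normalizeDoi(trimmed);
  if (/^10\.\d{4,9}\/\S+$/.test(doi)) {
    return 'doi:' + doi;
  }
  // Only treat the input as an ISBN when it consists of digits, spaces and hyphens.
  if (/^(isbn[:\s-]*)?[\d\s-]+x?$/i.test(trimmed)) {
    const isbn = normalizeIsbn(trimmed);
    if (/^(\d{9}[\dX]|\d{13})$/.test(isbn)) {
      return 'isbn:' + isbn;
    }
  }
  const url = normalizeUrl(trimmed);
  return url === null ? null : 'url:' + url;
}
```

With these assumptions, `10.1100/abc` and `https://doi.org/10.1100/abc` map to the same key, as do a URL with and without the www. prefix, so two spellings of one source compare as equal.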
3. Technical Approach
3.1. Wish #3: VisualEditor/Citoid Integration
While going through the VisualEditor and Citoid repositories, I found them to be my source of truth for understanding the architecture of the citation feature. I realized that, while the user interacts with VisualEditor (VE), which has a large codebase, the automation logic, as far as I understand, lives in the Citoid extension. The implementation therefore has to target the modules/ve directory within the codebase, which should preserve the existing communication between Citoid and VE.
- From my findings and evaluation, the primary entry point is `ve.ui.CitoidInspector.prototype.performLookup` in ve.ui.CitoidInspector.js. This method follows the path: decoding the input, then triggering an API call.
I propose to interrupt this flow with a local-first check:
Decoding: Capturing the search string.
Local check: Before the API request is initialized, I will check the input against the document's InternalList by iterating through the existing mwReference nodes. I will parse the templateData of these nodes to see whether the URL, DOI, or ISBN matches the new input, so that matches are identified before any server request is made.
- To prevent false negatives, such as failing to match a URL because of a missing slash, I will build a normalization utility:
  - URLs: use mw.Uri to strip protocols and fragments for a "canonical" comparison.
  - DOIs: convert to a uniform lowercase format, stripping the https://doi.org/ prefix.
  - ISBNs: use a regex to strip hyphens and spaces, comparing only the raw numeric string.
- Make use of the OOUI (Object-Oriented User Interface) framework.
- Following its patterns, I will add a new OO.ui.MessageWidget. If a duplicate is found, the UI will provide an action button. Clicking it will trigger a handler that automatically switches the user from the "Auto" tab to the "Reuse" tab, with the matching reference highlighted.
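The local-first check described above can be sketched without any VisualEditor APIs. This is a minimal sketch under stated assumptions: the references are plain objects of the hypothetical shape `{ name, params }`, standing in for whatever is extracted from the mwReference nodes, and `simpleNormalize` is a deliberately simplified stand-in for the real normalization utility. Building an index once keeps each lookup cheap, even on articles with 100+ references.

```javascript
// Hypothetical names throughout; none of this is VisualEditor or Citoid API.
const IDENTIFIER_FIELDS = ['url', 'doi', 'isbn'];

// Simplified stand-in normalizer: lowercases, then strips the protocol,
// a "www." prefix, a doi.org/ prefix, and trailing slashes.
function simpleNormalize(value) {
  return value
    .trim()
    .toLowerCase()
    .replace(/^https?:\/\/(www\.)?/, '')
    .replace(/^doi\.org\//, '')
    .replace(/\/+$/, '');
}

// Build a Map from normalized identifier to the first reference that uses it.
function buildIdentifierIndex(refs, normalize = simpleNormalize) {
  const index = new Map();
  for (const ref of refs) {
    const params = ref.params || {};
    for (const field of IDENTIFIER_FIELDS) {
      const raw = params[field];
      const key = raw ? normalize(raw) : '';
      if (key && !index.has(key)) {
        index.set(key, ref);
      }
    }
  }
  return index;
}

// Return the existing reference matching the new input, or null if none.
function findDuplicate(index, input, normalize = simpleNormalize) {
  const key = normalize(input);
  return key && index.has(key) ? index.get(key) : null;
}

// Usage with made-up sample data.
const existingRefs = [
  { name: 'ref-a', params: { url: 'https://www.example.org/article/' } },
  { name: 'ref-b', params: { doi: '10.1100/ABC' } }
];
const index = buildIdentifierIndex(existingRefs);
const match = findDuplicate(index, 'https://doi.org/10.1100/abc');
console.log(match ? 'Duplicate of ' + match.name : 'No duplicate found');
```

A non-null result is the point at which the UI would show the duplicate message and offer the switch to the re-use flow; a null result lets the normal lookup continue.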
3.2. Wish #8: Wikiscore/Wikidata Support
It would be good to extend the Wikiscore Python code to support the Wikibase API by implementing a module that fetches user contributions for Wikidata-specific actions such as labels, descriptions, and statements.
Wikidata contribution points should also be integrated into the existing Wikiscore dashboard to provide a unified view of contest progress. Solving this involves moving from simply counting edits to identifying meaningful contributions, such as adding a specific property or reference on Wikidata.
4. Internship Timeline
Outreachy requires 40 hours per week, bi-weekly blog posts, and clear milestones. I am committing my full attention to this project: 40 hours per week (9:00 AM – 5:00 PM WAT/UTC+1), though I can also adapt to my mentors' time zones. This timeline is a living document and will be refined with mentors during the bonding period, with a weekly report every Friday.
| Period | Technical Tasks & Milestones | Outreachy & Community Tasks |
| --- | --- | --- |
| Week 1 (May 18) | Setup & Trace: Onboard and build a solid understanding of the project. With mentors' insight and further research, document findings, ask questions about best practices, and complete the setup so that implementation can begin without interruption. For example, if the project builds on the existing VE or Citoid repos: configure MediaWiki-Docker, trace performLookup in ve.ui.CitoidInspector.js, and identify the exact insertion point for the local check. | Kick-off meeting with mentors to align on communication and project expectations. Blog Post 1: Introduction and Project Goals. |
| Week 2 (May 25) | Normalization Engine: Develop the JS utility to "clean" URLs, DOIs, and ISBNs (Regex for numeric strings, mw.Uri for canonical URLs). | Code Review Standards: Align with mentors on specific Gerrit/GitHub review workflows for this repository. Submission of Weekly Report 1. |
| Week 3 (Jun 1) | InternalList Interceptor: Implement the logic to scan InternalList for existing identifiers before initialisation. | Blog Post 2: Navigating the VisualEditor Codebase. Weekly Report 2. |
| Week 4 (Jun 8) | OOUI Alert System: Build the OO.ui.MessageWidget. Implement the "Duplicate Found" state. | UI/UX review session with mentors to ensure "Native" feel. Weekly Report 3. |
| Week 5 (Jun 15) | The "Switch-to-Reuse" Action: Link the alert button, ensuring the matching reference is highlighted in the Reuse tab. | Blog Post 3: Building for Real Users. Weekly Report 4. |
| Week 6 (Jun 22) | Performance & Stress Testing: Test the interceptor on long articles (100+ refs). Optimize the loop through mwReference nodes. | Bug hunting and edge-case handling (e.g., cross-type matching). Weekly Report 5. |
| Week 7 (Jun 29) | Mid-point Documentation: Write technical documentation on MediaWiki.org regarding the new citation interception logic. | Mid-point Progress Blog Post. Mid-internship feedback session. |
| Week 8 (Jul 6) | Wish #3 Finalization: Address all feedback from mentors. Submit the finalized patch to Gerrit for the VisualEditor/Citoid duplicate checker. | Blog Post 4: What Open Source Taught Me. Weekly Report 7. |
| Week 9 (Jul 13) | WikiScore Deep Dive (Wish #8): Shift focus to Wikiscore (Python/Django). Trace current scoring logic for Wikipedia edits. | Technical Plan: Draft the Wikidata API integration plan for scoring. Weekly Report 8. |
| Week 10 (Jul 20) | Wikidata API Integration: Build the Python module to query wbgetentities and track user contributions (Claims/Statements). | Blog Post 5: Moving from JavaScript to Python/Django. Weekly Report 9. |
| Week 11 (Jul 27) | Scoring Logic & UI: Implement the point-assignment logic for Wikidata actions. Integrate these scores into the Wikiscore dashboard. | Full-system integration testing with sample edit-a-thon data. Weekly Report 10. |
| Week 12 (Aug 3) | Final Optimization: Fix any bugs in the WikiScore Wikidata module. Perform final performance checks on the Citoid checker. | Final code review and cleanup. Blog Post 6: Building for the Lusophone Community. |
| Week 13 (Aug 10) | Handover & Deployment: Prepare final handover notes. Ensure all documentation is accurate and public. | Final Progress Blog Post. Submit final internship report. |
After this timeline, what next?
My journey in the Wikimedia ecosystem will not end in August. As a woman, a member of Africa Wiki Women, and a practitioner with Wiki Mentor Africa, I intend to:
- Monitor the Citoid duplicate checker and Wikiscore module for any breakages caused by upstream MediaWiki updates.
- Provide support to Lusophone organizers using Wikiscore and help translate the documentation into French to support broader African communities.
- Mentor future Outreachy applicants, having navigated the challenges of a junior developer myself, and continue contributing by writing articles and working on the WdTmCollab tool and other Wikibase projects.



