1. Profile
- Name: Shehrbano Ali
- Email: shehrbanoali2230@gmail.com
- GitHub: Shehrbaano-Ali
- Location: Pakistan
- Timezone: PKT (UTC+5)
- Typical working hours: 1:00 PM – 1:00 AM (UTC+5)
- Contribution Report: Shehrbano Ali - Contribution Report
2. Project Background
The Lusophone Technological Wishlist is a community survey designed to identify the most basic needs of editors, readers, and researchers in the Portuguese-speaking community. The goal is to make their experience more productive and pleasant.
This project directly aligns with the Wiki Experiences category of the WMF Annual Plan. I will implement two major proposals from the 2025 wishlist:
>Wish 3: Automatic Duplicate Reference Detection in the Visual Editor:
1. Currently, it is hard to find existing references in long articles, so editors often add the same source twice, which clutters the page.
2. This wish asks for a way to automatically catch duplicate references by matching URLs, DOIs, or ISBNs, so that Wikipedia articles stay clean, organized, and easy for everyone to maintain.
3. I will do this by building a real-time Duplicate Detector in the Visual Editor that suggests existing sources as the editor types and provides a one-click way to reuse or merge citations.
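As a rough illustration of the matching idea behind Wish 3, the sketch below groups references that share a URL, DOI, or ISBN. It is written in Python only to show the logic; the real feature would live in the Visual Editor's JavaScript. The reference dictionaries, the `name` field, and both function names are hypothetical, and the normalization is deliberately minimal (trim and lowercase).

```python
from collections import defaultdict


def identifier_keys(ref):
    """Return comparable (kind, value) keys for one reference dict."""
    keys = []
    for kind in ("url", "doi", "isbn"):
        value = ref.get(kind)
        if value:
            # Deliberately simple normalization: trim and lowercase only.
            keys.append((kind, value.strip().lower()))
    return keys


def find_duplicates(refs):
    """Map each shared identifier to the names of the references using it."""
    seen = defaultdict(list)
    for ref in refs:
        for key in identifier_keys(ref):
            seen[key].append(ref["name"])
    # Keep only identifiers that appear in more than one reference.
    return {key: names for key, names in seen.items() if len(names) > 1}


# Hypothetical sample data: ref2 and ref3 cite the same DOI.
refs = [
    {"name": "ref1", "url": "https://example.org/a"},
    {"name": "ref2", "doi": "10.1000/XYZ"},
    {"name": "ref3", "doi": "10.1000/xyz", "url": "https://example.org/b"},
]
duplicates = find_duplicates(refs)
```

Indexing by identifier keeps the check to a single pass over the reference list, which matters for long articles.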
>Wish 8: Wikidata Scoring Support for WikiScore:
1. Many communities hold contests for Wikidata, but because there is no automated way to score them, organizers must count edits by hand, which is slow and leads to mistakes.
2. This wish asks for an automated system to track and score Wikidata contributions so that contests are accurate, fair, and inclusive for everyone.
3. I will do this by building Wikidata support into WikiScore to automatically fetch edit data and display it in a simple dashboard for organizers and participants.
3. Mentors
4. Project Contribution Details (Microtasks)
~ Microtask 1: T418285
- GitHub Repository: View Code
- Technical README: Read Documentation
- Analysis Blog: Beyond the Formatter: Data Integrity
- Live Prototype: Interactive JavaScript Logic
~ Microtask 2: T418286
- GitHub Repository: View Code
- Technical README: Read Documentation
- Analysis Blog: Verification to Quality Engineering
- Live Prototype: Python Audit Terminal
5. My Introduction & Contribution Report
I have compiled my full introduction, technical background, and contribution report into a professional GitHub Pages site for easier review.
- Interactive Report Site: Shehrbano Ali - Contribution Report
- Report Source Repository: GitHub Link
6. Past Projects & Experience
> Medical Report Analyzer (Solo)
1. My tech stack for this project is: Python, NLP, Data Engineering.
2. In this project I used NLP to transform unstructured medical reports into organized, structured data for doctors.
3. From this project I learned how to process messy text and write reliable code that handles data errors without crashing.
4. I applied the same approach to messy text in my microtasks, where it directly supported Microtask 1 (JSON structuring) and Microtask 2 (cleaning inconsistent CSV data).
5. Link: Medical Report Analyzer
> Plant Disease Detector (Solo)
1. My tech stack for this project is: Python, Deep Learning (CNN, MobileNetV2).
2. In this project I developed an image classification tool using Transfer Learning to help farmers identify plant diseases instantly via mobile photos.
3. My main struggle was that my initial base CNN model was not learning meaningful patterns. Instead of getting stuck, I researched alternatives and switched to MobileNetV2 (a pre-trained model), which significantly increased accuracy to 91%.
4. From this project I learned how to find better technical solutions through research when things are not working and catch the small details in complex code.
5. From this experience:
i. In Microtask 1, my attention to detail helped me catch a timezone error, which the mentor noted as a strong technical observation.
ii. In Microtask 2, I quickly integrated my mentor's feedback on header skipping and exception naming to make the script better.
6. Link: Plant Disease Detector
> Why I’m a Fit
I am a self-taught ML developer focused on building resilient Python and JavaScript systems. My background prepares me for the Lusophone Wishlist Project in three specific areas:
1. My experience with complex data structures enables me to integrate the Wikidata API and automate Contest Scoring (Wishlist #8), specifically extending WikiScore to track Wikidata edits.
2. I am prepared to develop the Duplicate Reference checker (Wishlist #3) for the Visual Editor by applying my experience with messy data, so that the logic stays robust even with malformed or unexpected inputs.
3. I am proficient in Python for backend logic, as shown in Microtask 2, and in JavaScript, HTML, and CSS, as demonstrated in Microtask 1.
> Goal
My goal is to deliver clean and usable code that simplifies technical workflows for the Lusophone community.
7. Community Impact & Benefits
Fulfilling these community wishes will make a real difference by:
1. Stopping the same links from being added twice, which keeps the article’s code tidy and makes it much easier for everyone to read and fix later.
2. Acting like a built-in assistant that catches small mistakes before they happen, giving new editors the confidence to contribute without fear of messing up.
3. Automating Wikidata scores so that every editor's hard work and Portuguese-specific edits are valued fairly during contests and no one is left behind.
4. Saving editors from wasting time hunting for the same reference over and over, so they can focus on writing great content.
5. Providing tools that are built right into the editor and localized for the Portuguese community, meaning no one has to learn complicated new software to see the benefits.
8. Post-Contribution Phase Tasks
After submitting my final application on April 15, 2026 (4:00 PM UTC), I chose to keep my momentum by building live prototypes for the wishlist proposals.
>Task 1: WikiScore-Lusofonia (Wish #8)
- Live Prototype: Live Wikiscore
- GitHub Repository: View Source Code
- Technical README: Logic & Math Documentation
- Analysis Blog: View Blog
>Contribution Overview:
1. I designed a full system from scratch to solve Lusophone Wishlist #8.
2. This tool helps the community count and score Wikidata edits.
3. I created a Dual-Path system that shows the difference between Global edits and Portuguese edits.
4. I used a regular expression (regex) to make sure it counts both European and Brazilian Portuguese edits.
5. To keep people excited, I added a 6-level badge system for healthy competition.
6. These badges are only earned from Portuguese scores to keep everyone focused on the main goal.
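To make the regex idea in item 4 concrete, here is a minimal sketch of how Portuguese term edits might be recognized. It assumes that Wikidata's automatic edit summaries look like `/* wbsetlabel-add:1|pt-br */ text`; the pattern and the `portuguese_variant` helper are illustrative and are not taken from the prototype.

```python
import re

# Assumed summary shape: "/* wbsetlabel-add:1|pt-br */ some text".
# Matches label, description, and alias edits in "pt" or "pt-br" only.
PT_TERM_EDIT = re.compile(
    r"/\*\s*wbset(?:label|description|aliases)-\w+:\d+\|(pt(?:-br)?)\s*\*/"
)


def portuguese_variant(summary):
    """Return 'pt' or 'pt-br' for a Portuguese term edit, else None."""
    match = PT_TERM_EDIT.search(summary)
    return match.group(1) if match else None


european = portuguese_variant("/* wbsetlabel-add:1|pt */ Lisboa")
brazilian = portuguese_variant("/* wbsetdescription-set:1|pt-br */ cidade")
english = portuguese_variant("/* wbsetlabel-add:1|en */ Lisbon")
```

Anchoring the language code between `|` and the closing `*/` avoids false matches on other codes that merely start with `pt`.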
>Technical Stack:
- Python 3.13+, Django (Backend), HTML/CSS, JavaScript, Tailwind CSS, and Wikidata API.
>Key Technical Innovations:
1. I used a concept similar to the retrieval step of AI systems. My tool is not a full RAG system, but it uses the same idea: finding and fetching raw data from a huge database (Wikidata) quickly for many users at once.
2. I used parallel processing to scan 20 users at the same time. This keeps the system fast and responsive, so organizers do not have to wait long for the results.
3. I used BigIntegerField in Django models so that the system can handle values in the billions, beyond the 32-bit limit of a regular IntegerField, without crashing.
4. I also built a Strict-Mode (Anti-Cheat) indicator and a Live QID Tracker so organizers can ensure all points are earned fairly.
5. My engine paginates repeatedly, requesting the next page of results until it has found every single edit a user has made, even if they have thousands.
6. I have optimized the scoring logic so it can be called directly by the existing CounterHandler.get_points() method. This ensures that the new Wikidata points are injected into the leaderboard calculation without requiring any structural changes to the existing frontend.
7. Since many people use phones, I made a Swipeable Table with a neon-green pulsing hint to show users how to find their scores easily.
8. I developed a mathematical model to ensure that high-value contributions (like adding References or Images) are rewarded more than simple metadata updates.
This formula prioritizes the quality of edits over just the quantity:
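The exact formula is not reproduced in this text, so the sketch below only illustrates the general shape of such a quality-weighted score. The action names and weights are placeholders chosen for illustration, not the values used in the prototype.

```python
# Placeholder weights (assumption): richer contributions earn more points.
WEIGHTS = {"reference": 5, "image": 4, "claim": 2, "label": 1, "description": 1}


def score_edits(action_counts, weights=WEIGHTS):
    """Weighted sum of edit counts; unknown action types score zero."""
    return sum(
        weights.get(action, 0) * count
        for action, count in action_counts.items()
    )


# Four high-value edits can outscore fifteen simple metadata edits.
few_rich_edits = score_edits({"reference": 3, "image": 1})
many_small_edits = score_edits({"label": 10, "description": 5})
```

With these placeholder weights, a weighted sum lets a handful of well-sourced edits outrank a larger number of small metadata changes, which is the quality-over-quantity behaviour described above.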
9. Timeline
I. Pre-Internship (April 30 – May 17)
1. I've completed the MediaWiki, Gerrit, and Zulip setup and subscribed to the mailing lists.
2. I’ll introduce myself to the Lusophone and technical teams to start collaboration.
3. I’ll use Phabricator to break Wishes #3 and #8 into tasks with weekly tickets for mentor tracking.
4. I’ll study Visual Editor (JS) and WikiScore (Python) codebases so I can start contributing from Day 1.
II. The Internship Period (May 18 – August 17)
> Month 1: [May 18 – June 16] (Wish #3)
| Time Period | Technical Tasks | Community Tasks |
|---|---|---|
| Week 1 (May 18 – May 24) [7 days] | 1. I'll finish MediaWiki-Docker setup and map citation insertion in the editor. 2. I'll trace ve.ui.CitoidInspector.js to understand the citation workflow. 3. I'll identify the JS hooks and entry points needed for duplicate checking. | (I’ll summarize achievements and hurdles, review skills with mentors, set goals, and publish my 1st blog.) |
| Week 2 (May 25 – May 31) [7 days] | 1. I'll build a JS utility to normalize ISBNs, DOIs, and URLs for accurate checking. 2. I'll add logic to handle extra slashes, protocol differences, and case sensitivity. 3. I'll write unit tests to ensure these cleaning functions are reliable. | (I’ll share progress and solutions, review skills with mentors, set Week 3 goals, and publish my 2nd blog.) |
| Week 3 (June 1 – June 7) [7 days] | 1. I'll build an engine to scan the InternalList for matching IDs without slowing the editor. 2. I'll optimize the search process for long articles with 100+ references. 3. I'll develop a type-ahead engine to suggest existing sources while the user types. 4. I'll connect this comparison logic directly into the citation entry process. | (I’ll share progress and hurdles, review skills with mentors, set Week 4 goals, and publish my 3rd blog.) |
| Week 4 (June 8 – June 16) [9 days] | 1. I'll use OOUI to build a Duplicate Found notification and source consolidation tool. 2. I'll integrate jquery.i18n with mw.msg() for easy Portuguese translation. 3. I'll test logic against Lusophone templates like Citar livro for format accuracy. 4. I'll stage code for Gerrit and organize JSON files for my first Patch Set. 5. I'll design a Reuse option to stop redundant entries and fix Month 1 bugs. | (I'll review Month 1 milestones and UI hurdles, set Month 2 goals, and publish my 4th blog.) |
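
The normalization step planned for Week 2 could look roughly like the sketch below. It is written in Python purely to illustrate the rules (the planned utility itself is JavaScript), and the three function names are hypothetical. It treats URL hosts and DOIs as case-insensitive, ignores the protocol and a leading `www.`, collapses extra slashes, and converts ISBN-10 to ISBN-13 so both forms compare equal.

```python
import re
from urllib.parse import urlsplit


def normalize_url(url):
    """Comparison key for a URL: no scheme, lowercase host, tidy path."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Collapse repeated slashes and drop the trailing one; the path keeps
    # its case because paths can be case-sensitive.
    path = re.sub(r"/{2,}", "/", parts.path).rstrip("/")
    key = host + path
    if parts.query:
        key += "?" + parts.query
    return key


def normalize_doi(doi):
    """Lowercase a DOI and strip a resolver URL or 'doi:' prefix."""
    doi = doi.strip().lower()
    return re.sub(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", "", doi)


def normalize_isbn(isbn):
    """Strip separators and convert ISBN-10 to its ISBN-13 form."""
    digits = re.sub(r"[^0-9Xx]", "", isbn).upper()
    if len(digits) == 10 and digits[:9].isdigit():
        core = "978" + digits[:9]
        # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ...
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
        digits = core + str((10 - total % 10) % 10)
    return digits


same_page = normalize_url("HTTP://WWW.Example.org//a/b/") == normalize_url("https://example.org/a/b")
same_book = normalize_isbn("0-306-40615-2") == normalize_isbn("978-0-306-40615-7")
```

Comparing normalized keys rather than raw strings is what lets `http://` and `https://` links, or ISBN-10 and ISBN-13 forms, be recognized as the same source.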
> Month 2: [June 17 – July 16] (Wish #8)
| Time Period | Technical Tasks | Community Tasks |
|---|---|---|
| Week 1 (June 17 – June 23) [7 days] | 1. I'll analyze the WikiScore Python/Django codebase and the Wikidata API to draft a technical design doc. 2. I'll study RecentChanges and wbgetentities to map data flow. 3. I'll draft the architectural plan for unified Wikidata scoring. | (I’ll review architectural achievements, discuss Python/API skills with mentors, and publish my 5th blog.) |
| Week 2 (June 24 – June 30) [7 days] | 1. I'll write Python modules to fetch Wikidata contributions while respecting rate limits. 2. I'll add logic to prevent API blocking and parse JSON into the WikiScore model. | (I’ll share API achievements and rate-limit challenges, review skills with mentors, and publish my 6th blog.) |
| Week 3 (July 1 – July 7) [7 days] | 1. I'll develop logic to award points for Wikidata actions like Claims and Labels for the scoring dashboard. 2. I'll prioritize Portuguese labels and descriptions to reward Lusophone-specific work. 3. I'll integrate these scores into the existing WikiScore interface. | (I’ll review logic milestones, document weighting issues, discuss scoring fairness and Django skills with mentors, and publish my 7th blog.) |
| Week 4 (July 8 – July 16) [9 days] | 1. I'll test WikiScore using sample data from the Lusophone community to make sure the points are calculated correctly. 2. I'll hunt for bugs and fix any data errors found in complex Wikidata edits. 3. I'll validate tool stability using real data from past contests. | (I'll summarize Month 2 achievements, discuss testing with mentors, and publish my 8th blog.) |
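
The fetching work planned for Week 2 could follow the continuation pattern of the MediaWiki Action API (`list=usercontribs` returns a `uccontinue` token while more pages remain). The sketch below is a simplified illustration, not the WikiScore implementation: `fetch_page` is a hypothetical callable that performs one API request, and `fake_fetch_page` stands in for it so the example runs without network access. It uses a plain iterative loop plus a thread pool for scanning several users at once.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all_contributions(fetch_page, user, pause=0.0):
    """Follow continuation tokens until the API stops returning one."""
    edits, token = [], None
    while True:
        data = fetch_page(user, token)
        edits.extend(data.get("query", {}).get("usercontribs", []))
        token = data.get("continue", {}).get("uccontinue")
        if token is None:
            return edits
        time.sleep(pause)  # simple politeness delay between requests


def fetch_many(fetch_page, users, max_workers=20):
    """Fetch several users in parallel; returns {user: [edits]}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(
            lambda user: fetch_all_contributions(fetch_page, user), users
        )
        return dict(zip(users, results))


def fake_fetch_page(user, token):
    """Stand-in for a real API call: two pages of fake revisions."""
    pages = {None: ([1, 2], "p2"), "p2": ([3], None)}
    revids, next_token = pages[token]
    data = {"query": {"usercontribs": [{"user": user, "revid": r} for r in revids]}}
    if next_token:
        data["continue"] = {"uccontinue": next_token}
    return data


all_edits = fetch_many(fake_fetch_page, ["Alice", "Bob"])
```

Passing the request function in as a parameter keeps the pagination logic testable offline; in a real module, a non-zero `pause` and a smaller `max_workers` would be the first knobs to adjust for rate limits.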
> Month 3: [July 17 – August 17] Final Polish
| Time Period | Technical Tasks | Community Tasks |
|---|---|---|
| Week 1 (July 17 – July 23) [7 days] | 1. I'll stress-test the Duplicate Checker (500+ refs) and optimize Python queries. 2. I'll perform a full code audit for security and logic errors. | (I’ll share performance stats, discuss tuning with mentors, and publish my 9th blog.) |
| Week 2 (July 24 – July 30) [7 days] | 1. I'll finalize MediaWiki.org technical docs and clean code to community standards. 2. I'll submit Gerrit patches and write guides for future maintainers. | (I’ll review milestones and writing challenges, set review goals, and publish my 10th blog.) |
| Week 3 (July 31 – Aug 6) [7 days] | 1. I'll run Lusophone community beta testing and prioritize bug reports. 2. I'll fix minor bugs identified during review to ensure user satisfaction. | (I’ll review engagement and feedback, get final stability feedback, and publish my 11th blog.) |
| Week 4 (Aug 7 – Aug 17) [11 days] | 1. I'll finish Outreachy evaluations and map work to Wishes #17 and #192. 2. I'll ensure project integration into the Wikimedia ecosystem. | (I’ll reflect on project skills, review growth with mentors, and publish my final blog.) |
III. Post-Internship (August 17 onwards)
After successfully implementing Wishes #3 and #8 for the Lusophone community, my primary goal is to remain an active contributor. I plan to:
1. Bridge my logic to Wish #17 to improve automatic reference naming.
2. Support Wish #192 using WikiScore insights to reduce Wikidata redundancy.
3. Apply for mentorship in future rounds to help new developers.
4. Act as a Wikimedia Ambassador in Pakistan, focusing on bringing more Pakistani developers into the ecosystem and promoting open source.