Profile
Name: Payal Sumbhe
Username: PayalS3
GitHub: https://github.com/payalrvs3
Location: Thane, Maharashtra, India
Timezone: UTC+5:30 (IST)
Ideal working hours: 9:00 AM – 5:00 PM IST (40 hours per week, flexible)
Synopsis
The Lusophone Technological Wishlist is a community-driven survey among editors, readers, and researchers of Wikimedia projects in Portuguese, aimed at identifying and prioritizing improvements that would meaningfully benefit this community. It connects to community wishes 17 and 192, the Contributor Experiences (WE1) subbucket of the WMF Annual Plan, and specifically to Lusophone wishlist wishes 3 and 8.
For this internship I propose to implement Wish #8: Wikidata support for WikiScore, allowing the Lusophone community to run edit-a-thons and contests that include Wikidata contributions alongside Wikipedia edits.
Problem Statement
WikiScore is a Django application built by WikiMovimento Brasil to manage edit-a-thons and wikicontests on Portuguese Wikipedia. It allows evaluators to validate article edits and score participants based on bytes contributed and pictures added.
Currently, WikiScore only tracks edits made to Portuguese Wikipedia articles. However, Wikidata has become an increasingly important part of Wikimedia edit-a-thons: participants contribute by adding statements, references, labels, and sitelinks to Wikidata items, and none of these contributions are counted. Contest organizers must track Wikidata contributions manually, which is slow and error-prone, and which discourages structured data contribution.
Mentors
Contributions
- GitHub repository: https://github.com/payalrvs3/outreachy-2026
- Task 1 (JavaScript - T418285): Task 1.html
- Task 2 (Python - T418286): Task 2
- Prototype - Wikidata support for WikiScore (live demo): https://payalrvs3.github.io/outreachy-2026/Prototype/wishlist8_prototype.html
Technical Analysis
After studying the WikiScore codebase, I have identified the changes needed.
Current update pipeline (update.py):
steps = ["load_edits", "load_users", "load_reverts"]
Each step is a standalone Django management command called via call_command(). The entire Wikidata integration fits this pattern by adding one new command, load_wikidata.
Current scoring formula (from CounterHandler.get_points()):
bytes_points = FLOOR(SUM(real_bytes per article, capped) / bytes_per_points)
pictures_points = FLOOR(SUM(pictures) / pictures_per_points)
total_points = bytes_points + pictures_points
Proposed updated formula:
wikidata_points = SUM(WikidataEdit.points per participant)
total_points = bytes_points + pictures_points + wikidata_points
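As a rough illustration of the proposed formula, the sketch below restates it in plain Python. The function name `total_points`, its parameters, the optional per-article cap argument, and the treatment of a zero `pictures_per_points` as "pictures not scored" are all assumptions made for this example; the real calculation lives in the scoring query inside CounterHandler.get_points().

```python
def total_points(real_bytes_per_article, pictures, wikidata_edit_points,
                 bytes_per_points, pictures_per_points,
                 max_bytes_per_article=None):
    """Hypothetical plain-Python restatement of the proposed scoring formula."""
    # Cap each article's byte contribution when a cap is configured.
    if max_bytes_per_article is not None:
        capped = [min(b, max_bytes_per_article) for b in real_bytes_per_article]
    else:
        capped = list(real_bytes_per_article)

    # FLOOR(SUM(real_bytes per article, capped) / bytes_per_points)
    bytes_points = sum(capped) // bytes_per_points

    # FLOOR(SUM(pictures) / pictures_per_points); assumption: 0 disables it.
    pictures_points = pictures // pictures_per_points if pictures_per_points else 0

    # SUM(WikidataEdit.points per participant)
    wikidata_points = sum(wikidata_edit_points)

    return bytes_points + pictures_points + wikidata_points
```

When the list of Wikidata edit points is empty, the result equals the current Wikipedia-only total, which is the behaviour expected when wikidata_enabled is False.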
How Wikidata edit types are identified:
Wikibase auto-generates edit comments with machine-readable prefixes:
/* wbsetclaim-create:2||1 */ added a statement
/* wbsetreference-add:1| */ added a reference
/* wbcreate-new */
A regex parser extracts the action type and maps it to points configured per contest through a new WikidataPointRule model.
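A minimal sketch of such a parser follows. The names `parse_action_type` and `points_for` are hypothetical, and the plain dictionary stands in for the per-contest WikidataPointRule rows. The pattern assumes the action type sits between `/*` and an optional colon-separated argument list, as in the examples above.

```python
import re
from typing import Dict, Optional

# Auto-generated prefix at the start of a Wikibase edit summary,
# e.g. "/* wbsetclaim-create:2||1 */". The arguments after ":" are optional.
SUMMARY_PREFIX = re.compile(
    r"^/\*\s*(?P<action>[a-z]+(?:-[a-z0-9]+)*)(?::[^*]*)?\s*\*/"
)


def parse_action_type(comment: str) -> Optional[str]:
    """Return the action type (e.g. 'wbsetclaim-create'), or None if absent."""
    match = SUMMARY_PREFIX.match(comment)
    return match.group("action") if match else None


def points_for(comment: str, rules: Dict[str, int]) -> int:
    """Look up the points for a comment; unknown or manual summaries score 0."""
    action = parse_action_type(comment)
    return rules.get(action, 0) if action else 0


# Example rule set; the point values are illustrative only.
example_rules = {"wbsetclaim-create": 3, "wbsetreference-add": 2}
```

Scoring unrecognized action types as zero keeps the parser safe against summary formats it has not seen, at the cost of silently ignoring them; logging those cases would help when refining the rule set.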
Proposed Implementation
New fields on the existing Contest model:
wikidata_enabled = models.BooleanField(default=False)
wikidata_exclude_bots = models.BooleanField(default=True)
wikidata_linked_only = models.BooleanField(default=False)
Two new models:
from django.db import models

class WikidataPointRule(models.Model):
    # Points awarded for one Wikibase action type within one contest.
    contest = models.ForeignKey('Contest', on_delete=models.CASCADE)
    action_type = models.CharField(max_length=50)
    points = models.SmallIntegerField(default=0)

    class Meta:
        unique_together = ['contest', 'action_type']

class WikidataEdit(models.Model):
    # One scored Wikidata revision made by a participant during a contest.
    contest = models.ForeignKey('Contest', on_delete=models.CASCADE)
    participant = models.ForeignKey('Participant', on_delete=models.SET_NULL, null=True)
    # Wikidata revision IDs have passed the 32-bit integer range.
    revid = models.BigIntegerField()
    item = models.CharField(max_length=20)
    action_type = models.CharField(max_length=50)
    comment = models.TextField(blank=True, default='')
    points = models.SmallIntegerField(default=0)
    timestamp = models.DateTimeField()

    class Meta:
        # Prevents the same revision from being stored twice for a contest.
        unique_together = ['contest', 'revid']
New load_wikidata management command following the exact pattern of load_edits.py and load_reverts.py, fetching from wikidata.org/w/api.php using action=query&list=usercontribs per participant within the contest time window.
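The fetching step could look roughly like the sketch below, written without Django so it can be read on its own. The function names, the User-Agent string, the injectable `fetch` argument, and the restriction to namespace 0 (where Wikidata items live) are assumptions for illustration; the real command would follow whatever HTTP helpers load_edits.py already uses.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"


def http_get_json(params):
    """Default fetcher: one GET request to the API (needs network access)."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"User-Agent": "WikiScore-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_user_contribs(username, start, end, fetch=http_get_json):
    """Yield a user's contributions between start and end, following pagination."""
    params = {
        "action": "query",
        "list": "usercontribs",
        "format": "json",
        "ucuser": username,
        "ucstart": start,      # contest start, e.g. "2026-06-01T00:00:00Z"
        "ucend": end,          # contest end
        "ucdir": "newer",      # oldest first, so ucstart precedes ucend
        "ucnamespace": 0,      # assumption: only item pages are scored
        "ucprop": "ids|title|timestamp|comment",
        "uclimit": "max",
    }
    while True:
        data = fetch(params)
        yield from data.get("query", {}).get("usercontribs", [])
        if "continue" not in data:
            break
        # Merge the continuation tokens into the next request.
        params = {**params, **data["continue"]}
```

Passing `fetch` as an argument keeps the pagination logic testable with canned responses, so the tests do not need to call the live API.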
One-line change to update.py:
steps = ["load_edits", "load_users", "load_reverts", "load_wikidata"]
Updated CounterHandler to include wikidata_points in the scoring query and leaderboard display.
Updated contest management form so organizers can enable Wikidata scoring and configure point rules per edit type.
Timeline
The internship runs from May 18, 2026 to August 17, 2026 (13 weeks, 40 hours/week). This timeline is flexible and will be refined with mentors during the bonding period. I will send weekly progress reports every Friday.
| Period | Technical Tasks | Community Tasks |
|---|---|---|
| Week 1 May 18–24 | Kick-off meeting with mentors. Set up local WikiScore development environment. Review full codebase with mentor guidance. Clarify scope, PR workflow, and contribution guidelines. | Blog Post 1: Introduction and project goals. First weekly report. |
| Week 2 May 25–31 | Study Wikidata usercontribs API, experiment with responses, pagination, edit comment formats across all Wikibase action types. Document findings before writing code. | Align with mentors on code review workflow. Weekly report. |
| Week 3 Jun 1–7 | Design and implement WikidataEdit and WikidataPointRule models. Add three new fields to Contest model. Write and run database migrations. Open first PR for mentor review. | Blog Post 2: Understanding the WikiScore architecture. Weekly report. |
| Week 4 Jun 8–14 | Write the load_wikidata management command following the pattern of load_edits.py. Implement Wikibase edit comment parser, pagination, bot filtering, and duplicate prevention via unique_together. | Code review session with mentors. Weekly report. |
| Week 5 Jun 15–21 | Add load_wikidata to the update.py pipeline. Test end-to-end with real participant usernames and a real contest time window against the live Wikidata API. Fix issues found. | Blog Post 3: Building for the Lusophone community. Weekly report. |
| Week 6 Jun 22–28 | Integrate wikidata_points into CounterHandler scoring query. Verify total_points = bytes_points + pictures_points + wikidata_points. Ensure Wikipedia-only contests are unaffected when wikidata_enabled=False. | Integration testing session with mentors. Weekly report. |
| Week 7 Jun 29–Jul 5 | Add Wikidata configuration fields to contest management form in manage.py and Django admin panel. Allow organizers to enable Wikidata scoring and set points per edit type. | Midpoint progress blog post. Weekly report. |
| Week 8 Jul 6–12 | Write comprehensive tests for new models, management command, API calls, comment parser, and updated scoring logic. Ensure all existing tests still pass. | Mid-internship feedback session with mentors. Weekly report. |
| Week 9 Jul 13–19 | Address all mentor feedback from weeks 3–8. Handle edge cases: participants with no Wikidata edits, wikidata_linked_only mode, API timeouts, and reverted edits. | Blog Post 4: What open source contribution taught me. Weekly report. |
| Week 10 Jul 20–26 | Write developer documentation for new models, management command, and scoring logic. Write user-facing documentation for contest organizers explaining how to enable and configure Wikidata scoring. | Community review with mentors and Lusophone community members. Weekly report. |
| Week 11 Jul 27–Aug 2 | Test the feature against a real or past contest dataset with mentors. Make adjustments based on real usage feedback. Ensure leaderboard display is clear and useful for organizers. | Blog Post 5: Testing with real community data. Weekly report. |
| Week 12 Aug 3–9 | Final polish and optimization. Fix any remaining issues. Ensure all PRs are reviewed and merged. Clean up code to meet WikiMovimento Brasil's contribution standards. | Final code review session with mentors. Weekly report. |
| Week 13 Aug 10–17 | Final submission and handover. Prepare handover notes covering what was built, known limitations, and ideas for future improvements. Ensure all documentation is public and accurate. | Final blog post summarizing the internship. Final weekly report. |
Why Me
WikiScore is a Django application that fetches data from an external API, parses it, and feeds it into a scoring pipeline. That is exactly the kind of system I have built before.
My most relevant project is Lectara, an open-source AI-powered transcription tool built with FastAPI and Python. It fetches audio, calls an external AI API (Gemini), processes the response through a multi-stage pipeline, and outputs structured notes. Working on Lectara taught me how to design clean pipelines, handle API pagination and timeouts, and use multi-threading for performance. All of these skills apply directly to building the load_wikidata command.
I also built a railway booking system (IRCTC) in Django with user authentication and database management, so I am familiar with Django models, migrations, and management commands. That is the exact layer where most of this project's changes live.
My open source experience comes from Hacktoberfest, where I learned to work inside an existing codebase, respond to review feedback, and contribute incrementally.
What I care about most is that this project has a real community depending on it. WikiScore helps Portuguese-language editors run fair, accurate contests. Adding Wikidata support means their structured data contributions finally count, and that is worth building carefully.
Post-Internship Contributions
My contribution to the Wikimedia ecosystem will not end in August. I plan to monitor the Wikidata integration for issues caused by upstream Wikidata API changes, provide support to Lusophone organizers using WikiScore, and continue contributing to WikiMovimento Brasil tools.
If time permits during the internship, I would also like to begin exploring Wish #3, automatic duplicate reference detection in the Visual Editor, as a stretch goal in the final weeks. This would involve studying the Visual Editor codebase and the Citoid extension to understand the correct integration point, with the goal of either starting an implementation or producing a well-documented technical plan that a future contributor could build on.
I also plan to mentor future Outreachy applicants and share my experience navigating the Wikimedia codebase as a new contributor.