
Proposal: Addressing the Lusophone Technological Wishlist Proposal - Wikidata
Closed, Declined (Public)

Description

Profile

Name: Payal Sumbhe
Username: PayalS3
GitHub: https://github.com/payalrvs3
Location: Thane, Maharashtra, India
Timezone: UTC+5:30 (IST)
Ideal working hours: 9:00 AM – 5:00 PM IST (40 hours per week) (Flexible)

Synopsis

The Lusophone Technological Wishlist is a community-driven survey among editors, readers, and researchers of Wikimedia projects in Portuguese, aimed at identifying and prioritizing improvements that would meaningfully benefit this community. It connects to community wishes 17 and 192, the Contributor Experiences (WE1) subbucket of the WMF Annual Plan, and specifically to Lusophone wishlist wishes 3 and 8.

For this internship I propose to implement Wish #8: Wikidata support for WikiScore, allowing the Lusophone community to run edit-a-thons and contests that include Wikidata contributions alongside Wikipedia edits.

Problem Statement

WikiScore is a Django application built by WikiMovimento Brasil to manage edit-a-thons and wikicontests on Portuguese Wikipedia. It allows evaluators to validate article edits and score participants based on bytes contributed and pictures added.

Currently, WikiScore only tracks edits made to Portuguese Wikipedia articles. However, Wikidata has become an increasingly important part of Wikimedia edit-a-thons: participants contribute by adding statements, references, labels, and sitelinks to Wikidata items, and none of these contributions are counted. Contest organizers must track Wikidata contributions manually, which is slow and error-prone and discourages structured data contribution.

Mentors

Contributions

Technical Analysis

After studying the WikiScore codebase I have a clear understanding of the exact changes needed.

Current update pipeline (update.py):

steps = ["load_edits", "load_users", "load_reverts"]

Each step is a standalone Django management command called via call_command(). The entire Wikidata integration fits into this pattern by adding one new command, load_wikidata.

Current scoring formula (from CounterHandler.get_points()):
bytes_points = FLOOR(SUM(real_bytes per article, capped) / bytes_per_points)
pictures_points = FLOOR(SUM(pictures) / pictures_per_points)
total_points = bytes_points + pictures_points

Proposed updated formula:
wikidata_points = SUM(WikidataEdit.points per participant)
total_points = bytes_points + pictures_points + wikidata_points
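The updated formula can be illustrated in plain Python. This is a minimal sketch, not WikiScore's actual CounterHandler code: the function name, its parameters, and the sample numbers are assumptions made for illustration, and the per-article byte cap is assumed to have been applied to the inputs already.

```python
def total_points(capped_bytes_per_article, pictures, wikidata_edit_points,
                 bytes_per_points, pictures_per_points):
    """Combine the three score components for one participant.

    All names here are hypothetical. Both divisors are assumed to be
    positive integers.
    """
    # FLOOR(SUM(real_bytes per article, capped) / bytes_per_points)
    bytes_points = sum(capped_bytes_per_article) // bytes_per_points
    # FLOOR(SUM(pictures) / pictures_per_points)
    pictures_points = pictures // pictures_per_points
    # SUM(WikidataEdit.points) for this participant
    wikidata_points = sum(wikidata_edit_points)
    return bytes_points + pictures_points + wikidata_points


# Made-up participant: 4700 capped bytes, 5 pictures, three Wikidata edits.
print(total_points([3500, 1200], 5, [2, 1, 1],
                   bytes_per_points=1000, pictures_per_points=2))  # prints 10
```

When a participant has no Wikidata edits, the third term is zero and the total matches the current formula, so Wikipedia-only contests would be unaffected.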

How Wikidata edit types are identified:

Wikibase auto-generates edit comments with machine-readable prefixes:
/* wbsetclaim-create:2||1 */ added a statement
/* wbsetreference-add:1| */ added a reference
/* wbcreate-new */

A regex parser extracts the action type and maps it to points configured per contest through a new WikidataPointRule model.
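One possible shape for such a parser is sketched below. It assumes the auto-generated summary always starts with a prefix of the form `/* action:args */`. The regular expression, the function names, and the point values are illustrative assumptions; in the proposed design the points would come from WikidataPointRule rows rather than a dictionary.

```python
import re

# Matches a leading auto-generated prefix such as "/* wbsetclaim-create:2||1 */"
# and captures only the action name before the first colon.
SUMMARY_RE = re.compile(r"^/\*\s*([a-z]+(?:-[a-z0-9]+)*)(?::[^*]*)?\s*\*/")


def parse_action_type(comment):
    """Return the Wikibase action name, or None for free-text summaries."""
    match = SUMMARY_RE.match(comment or "")
    return match.group(1) if match else None


def score_comment(comment, point_rules):
    """Look up the points for a comment; unknown actions score zero."""
    return point_rules.get(parse_action_type(comment), 0)


# Hypothetical point rules for one contest.
rules = {"wbsetclaim-create": 2, "wbsetreference-add": 1}

print(parse_action_type("/* wbsetclaim-create:2||1 */ added a statement"))
print(score_comment("/* wbsetreference-add:1| */ added a reference", rules))
print(score_comment("fixed a typo", rules))
```

Scoring unknown or missing action types as zero keeps manually written summaries from raising errors during an update run.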

Proposed Implementation

New fields on the existing Contest model:

wikidata_enabled      = models.BooleanField(default=False)
wikidata_exclude_bots = models.BooleanField(default=True)
wikidata_linked_only  = models.BooleanField(default=False)

Two new models:

from django.db import models

class WikidataPointRule(models.Model):
    # Points awarded per Wikibase action type, configured per contest.
    contest     = models.ForeignKey('Contest', on_delete=models.CASCADE)
    action_type = models.CharField(max_length=50)
    points      = models.SmallIntegerField(default=0)

    class Meta:
        unique_together = ['contest', 'action_type']

class WikidataEdit(models.Model):
    # One scored Wikidata revision made by a participant during a contest.
    contest     = models.ForeignKey('Contest', on_delete=models.CASCADE)
    participant = models.ForeignKey('Participant', on_delete=models.SET_NULL, null=True)
    # BigIntegerField, since Wikidata revision IDs can exceed the 32-bit range of IntegerField.
    revid       = models.BigIntegerField()
    item        = models.CharField(max_length=20)
    action_type = models.CharField(max_length=50)
    comment     = models.TextField(blank=True, default='')
    points      = models.SmallIntegerField(default=0)
    timestamp   = models.DateTimeField()

    class Meta:
        # Prevents storing the same revision twice for one contest.
        unique_together = ['contest', 'revid']

New load_wikidata management command following the exact pattern of load_edits.py and load_reverts.py, fetching from wikidata.org/w/api.php using action=query&list=usercontribs per participant within the contest time window.
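The fetching step could look roughly like the sketch below. It is not taken from load_edits.py: the function names, the injectable `fetch` argument, and the User-Agent string are assumptions for illustration. The query parameters (`ucuser`, `ucstart`, `ucend`, `ucdir`, `ucprop`, `uclimit`) and the `continue` object are part of the standard MediaWiki Action API.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://www.wikidata.org/w/api.php"


def http_get_json(params):
    """Default fetcher: send one GET request and decode the JSON body."""
    request = Request(
        API_URL + "?" + urlencode(params),
        # Placeholder User-Agent; a real deployment should identify itself.
        headers={"User-Agent": "wikiscore-sketch/0.1 (example)"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_usercontribs(username, start, end, fetch=http_get_json):
    """Yield one user's contributions between start and end (ISO 8601, UTC)."""
    params = {
        "action": "query",
        "format": "json",
        "list": "usercontribs",
        "ucuser": username,
        "ucstart": start,
        "ucend": end,
        "ucdir": "newer",  # oldest first, so ucstart is the earlier bound
        "ucprop": "ids|title|timestamp|comment",
        "uclimit": "max",
    }
    continuation = {}
    while True:
        data = fetch({**params, **continuation})
        yield from data.get("query", {}).get("usercontribs", [])
        if "continue" not in data:
            break  # no further pages
        # Send the returned continuation values back with the next request.
        continuation = data["continue"]
```

Passing `fetch` as an argument lets tests substitute canned responses for network calls, for example `list(iter_usercontribs("Example", "2026-05-18T00:00:00Z", "2026-05-24T23:59:59Z", fetch=fake_fetch))` with a hypothetical `fake_fetch`.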

One-line change to update.py:

steps = ["load_edits", "load_users", "load_reverts", "load_wikidata"]

Updated CounterHandler to include wikidata_points in the scoring query and leaderboard display.

Updated contest management form so organizers can enable Wikidata scoring and configure point rules per edit type.

Timeline

The internship runs from May 18, 2026 to August 17, 2026 (13 weeks, 40 hours/week). This timeline is flexible and will be refined with mentors during the bonding period. I will send weekly progress reports every Friday.

| Period | Technical Tasks | Community Tasks |
| --- | --- | --- |
| Week 1 (May 18–24) | Kick-off meeting with mentors. Set up local WikiScore development environment. Review full codebase with mentor guidance. Clarify scope, PR workflow, and contribution guidelines. | Blog Post 1: Introduction and project goals. First weekly report. |
| Week 2 (May 25–31) | Study Wikidata usercontribs API, experiment with responses, pagination, edit comment formats across all Wikibase action types. Document findings before writing code. | Align with mentors on code review workflow. Weekly report. |
| Week 3 (Jun 1–7) | Design and implement WikidataEdit and WikidataPointRule models. Add three new fields to Contest model. Write and run database migrations. Open first PR for mentor review. | Blog Post 2: Understanding the WikiScore architecture. Weekly report. |
| Week 4 (Jun 8–14) | Write the load_wikidata management command following the pattern of load_edits.py. Implement Wikibase edit comment parser, pagination, bot filtering, and duplicate prevention via unique_together. | Code review session with mentors. Weekly report. |
| Week 5 (Jun 15–21) | Add load_wikidata to the update.py pipeline. Test end-to-end with real participant usernames and a real contest time window against the live Wikidata API. Fix issues found. | Blog Post 3: Building for the Lusophone community. Weekly report. |
| Week 6 (Jun 22–28) | Integrate wikidata_points into CounterHandler scoring query. Verify total_points = bytes_points + pictures_points + wikidata_points. Ensure Wikipedia-only contests are unaffected when wikidata_enabled=False. | Integration testing session with mentors. Weekly report. |
| Week 7 (Jun 29–Jul 5) | Add Wikidata configuration fields to contest management form in manage.py and Django admin panel. Allow organizers to enable Wikidata scoring and set points per edit type. | Midpoint progress blog post. Weekly report. |
| Week 8 (Jul 6–12) | Write comprehensive tests for new models, management command, API calls, comment parser, and updated scoring logic. Ensure all existing tests still pass. | Mid-internship feedback session with mentors. Weekly report. |
| Week 9 (Jul 13–19) | Address all mentor feedback from weeks 3–8. Handle edge cases: participants with no Wikidata edits, wikidata_linked_only mode, API timeouts, and reverted edits. | Blog Post 4: What open source contribution taught me. Weekly report. |
| Week 10 (Jul 20–26) | Write developer documentation for new models, management command, and scoring logic. Write user-facing documentation for contest organizers explaining how to enable and configure Wikidata scoring. | Community review with mentors and Lusophone community members. Weekly report. |
| Week 11 (Jul 27–Aug 2) | Test the feature against a real or past contest dataset with mentors. Make adjustments based on real usage feedback. Ensure leaderboard display is clear and useful for organizers. | Blog Post 5: Testing with real community data. Weekly report. |
| Week 12 (Aug 3–9) | Final polish and optimization. Fix any remaining issues. Ensure all PRs are reviewed and merged. Clean up code to meet WikiMovimento Brasil's contribution standards. | Final code review session with mentors. Weekly report. |
| Week 13 (Aug 10–17) | Final submission and handover. Prepare handover notes covering what was built, known limitations, and ideas for future improvements. Ensure all documentation is public and accurate. | Final blog post summarizing the internship. Final weekly report. |

Why Me

WikiScore is a Django application that fetches data from an external API, parses it, and feeds it into a scoring pipeline. That is exactly the kind of system I have built before.

My most relevant project is Lectara, an open-source AI-powered transcription tool built with FastAPI and Python. It fetches audio, calls an external AI API (Gemini), processes the response through a multi-stage pipeline, and outputs structured notes. Working on Lectara taught me how to design clean pipelines, handle API pagination and timeouts, and use multi-threading for performance. All of these skills apply directly to building the load_wikidata command.

I also built a railway booking system (IRCTC) in Django with user authentication and database management, so I am familiar with Django models, migrations, and management commands. That is the exact layer where most of this project's changes live.

My open source experience comes from Hacktoberfest, where I learned to work inside an existing codebase, respond to review feedback, and contribute incrementally.

What I care about most is that this project has a real community depending on it. WikiScore helps Portuguese-language editors run fair, accurate contests. Adding Wikidata support means their structured data contributions finally count, and that is worth building carefully.

Post-Internship Contributions

My contribution to the Wikimedia ecosystem will not end in August. I plan to monitor the Wikidata integration for issues caused by upstream Wikidata API changes, provide support to Lusophone organizers using WikiScore, and continue contributing to WikiMovimento Brasil tools.

If time permits during the internship, I would also like to begin exploring Wish #3 (automatic duplicate reference detection in the Visual Editor) as a stretch goal in the final weeks. This would involve studying the Visual Editor codebase and the Citoid extension to understand the correct integration point, with the goal of either starting an implementation or producing a well-documented technical plan that a future contributor could build on.

I also plan to mentor future Outreachy applicants and share my experience navigating the Wikimedia codebase as a new contributor.

Event Timeline

PayalS3 renamed this task from Proposal: Addressing the Lusophone Technological Wishlist Proposal - Payals3 to Proposal: Addressing the Lusophone Technological Wishlist Proposal - Wikidata. (Wed, Apr 15, 11:09 AM)

Hi mentors (@Arcstur @Ederporto),
I wanted to share a quick update on my contributions.

I have updated my Wish #8 prototype to include a live API lookup tab that calls the real Wikidata usercontribs API, parses Wikibase action types from edit comments using regex, and scores edits using the point rules from the settings panel. You can try it with any real Wikidata username:
Wish#8-Prototype

Best,
Payal (payalrvs3)

Gopavasanth subscribed.

Thank you for your proposal and the effort you put into it. This year we received over 20 strong applications, and after a highly competitive review, we were unfortunately unable to offer you a slot.

Please don't see this as a failure. Many contributors who weren't selected for Outreachy have gone on to make a meaningful, lasting impact in the Wikimedia community, and we genuinely hope you'll stay engaged. You're very welcome to continue contributing outside of Outreachy. Our mentors and org admins are happy to help you get started or keep going:

We hope to see you around in the community.