
Proposal: Addressing the Lusophone Technological Wishlist Proposals Project (Wikidata / Wikiscore)
Closed, Declined
Public

Description

Profile Information

Name: Ayush Khati
GitHub: https://github.com/AyushkhatiDev
Portfolio: https://portfoliokhati.netlify.app/
Location: India (West Bengal, IST – UTC+5:30)

I am a backend-focused software developer with hands-on experience in designing and building scalable systems, APIs, and data processing pipelines. I have worked on production-grade applications involving performance optimization, caching, and distributed architectures, with a focus on efficiency, consistency, and maintainability. I am particularly interested in building data-driven tools that support real-world use cases and large-scale collaboration.

As part of this application, I completed both Outreachy microtasks (T418285 and T418286) and improved them based on mentor feedback.


Synopsis

The Lusophone Technological Wishlist is a community-driven initiative aimed at identifying improvements that enhance the experience of contributors across Wikimedia projects.

For this internship, I propose to work on Wishlist #8: adding Wikidata support to the Wikiscore tool.

This feature will allow Wikiscore to track and evaluate Wikidata contributions, enabling edit-a-thons and contests to include Wikidata edits. It expands the scope of the tool and supports a broader contributor base.


Selected Wishlist

Wishlist #8: Wikidata support for Wikiscore

Currently, Wikiscore primarily focuses on Wikipedia edits. However, Wikidata contributions are increasingly important and should be included in scoring systems to support structured data contributions.

The goal is to:

  • Retrieve Wikidata user contributions using relevant APIs
  • Parse, normalize, and validate contribution data
  • Integrate Wikidata edits into the existing Wikiscore scoring pipeline
  • Ensure accurate, consistent, and efficient computation of scores across different contribution types
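As an illustration of the "parse, normalize, and validate" goal, here is a minimal sketch that turns one raw contribution entry into a typed record. The `Contribution` class and `normalize` function are hypothetical names, not part of Wikiscore. The field names assume the shape returned by the Action API's `list=usercontribs` module with `formatversion=2` (where `minor` is a boolean).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class Contribution:
    """Hypothetical normalized record for one Wikidata edit."""
    revid: int
    user: str
    title: str            # page title, e.g. "Q42" for an item
    timestamp: datetime   # always timezone-aware (UTC)
    sizediff: int
    minor: bool
    tags: Tuple[str, ...]


def normalize(raw: Dict[str, Any]) -> Optional[Contribution]:
    """Return a Contribution, or None if required fields are missing or malformed."""
    try:
        # API timestamps look like "2026-06-01T10:00:00Z".
        ts = datetime.strptime(raw["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        return Contribution(
            revid=int(raw["revid"]),
            user=str(raw["user"]),
            title=str(raw["title"]),
            timestamp=ts.replace(tzinfo=timezone.utc),
            sizediff=int(raw.get("sizediff", 0)),
            minor=bool(raw.get("minor", False)),
            tags=tuple(raw.get("tags", ())),
        )
    except (KeyError, TypeError, ValueError):
        # Incomplete or malformed entry: reject it instead of guessing values.
        return None
```

Returning `None` for invalid entries keeps the decision about logging or skipping them with the caller.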

Technical Approach

The implementation will follow MediaWiki development practices and integrate cleanly with the existing Wikiscore architecture, ensuring maintainability and compatibility with Wikimedia tooling.

  1. Data Retrieval
     • Use Wikidata APIs (e.g., recent changes or user contribution endpoints) to fetch user edits
     • Handle pagination, rate limits, and API constraints
     • Ensure reliability with proper error handling
  2. Data Processing
     • Parse and normalize contribution data
     • Filter relevant edit types
     • Handle edge cases such as duplicate entries, reverted edits, minor edits, and incomplete metadata
  3. Scoring Logic
     • Define scoring rules for Wikidata edits
     • Ensure compatibility with existing Wikiscore logic
     • Maintain consistency across different contribution types
  4. Integration
     • Integrate Wikidata data into the existing Wikiscore pipeline
     • Ensure modular, testable, and maintainable code
     • Use caching (e.g., Redis) to optimize repeated queries
  5. Performance Considerations
     • Optimize API calls using batching strategies
     • Implement caching to reduce redundant computations
     • Consider API rate limiting and throttling constraints
     • Ensure scalability for large datasets and high user activity
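The data retrieval step could be sketched as follows, assuming the MediaWiki Action API's `list=usercontribs` module and its standard `continue` mechanism for pagination. The function names, the injected `fetch` callable, and the User-Agent string are illustrative assumptions and are not taken from the Wikiscore codebase.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Iterator, Optional

API_URL = "https://www.wikidata.org/w/api.php"
Fetch = Callable[[Dict[str, str]], Dict[str, Any]]


def http_fetch(params: Dict[str, str]) -> Dict[str, Any]:
    """Send one GET request to the Action API and decode the JSON body."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent; a real tool should identify itself properly.
    req = urllib.request.Request(url, headers={"User-Agent": "wikiscore-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def iter_user_contribs(
    user: str,
    fetch: Fetch = http_fetch,
    start: Optional[str] = None,
    end: Optional[str] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield every contribution of `user`, following API continuation."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "list": "usercontribs",
        "ucuser": user,
        "uclimit": "500",
        "ucdir": "newer",  # oldest first, so `start` is the earlier timestamp
        "ucprop": "ids|title|timestamp|comment|sizediff|flags|tags",
    }
    if start:
        params["ucstart"] = start
    if end:
        params["ucend"] = end

    cont: Dict[str, str] = {}
    while True:
        data = fetch({**params, **cont})
        if "error" in data:
            raise RuntimeError(f"API error: {data['error'].get('code')}")
        yield from data.get("query", {}).get("usercontribs", [])
        # The API returns a "continue" object until the last page is reached.
        cont = data.get("continue") or {}
        if not cont:
            break
```

Passing `fetch` as a parameter keeps the pagination logic testable without network access.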

Timeline

Week 1 (May 18 – May 24):
Set up development environment, run Wikiscore locally, and review project architecture. Explore Wikidata APIs and identify relevant endpoints for user contributions.

Week 2 (May 25 – May 31):
Analyze data flow in Wikiscore, define integration points for Wikidata, and finalize implementation plan with mentors.

Week 3 (June 1 – June 7):
Implement initial data fetching layer using Wikidata APIs, including basic request handling and response parsing.

Week 4 (June 8 – June 14):
Extend data retrieval to support pagination, error handling, and rate limiting. Validate data consistency.
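One possible shape for the rate-limit handling is a retry wrapper with exponential backoff. This is a sketch under assumptions: it treats the Action API error codes `maxlag` and `ratelimited` as retryable, and the function name, attempt count, and delays are hypothetical defaults.

```python
import time
from typing import Any, Callable, Dict

# Error codes treated as temporary (assumption for this sketch).
RETRYABLE = {"maxlag", "ratelimited"}


def fetch_with_retry(
    fetch: Callable[[Dict[str, str]], Dict[str, Any]],
    params: Dict[str, str],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call `fetch`, retrying temporary API errors with exponential backoff."""
    code = None
    for attempt in range(max_attempts):
        data = fetch(params)
        code = data.get("error", {}).get("code")
        if code is None:
            return data
        if code not in RETRYABLE:
            # Permanent errors (bad parameters, permissions) should fail fast.
            raise RuntimeError(f"API error: {code}")
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError(f"gave up after {max_attempts} attempts: {code}")
```

The `sleep` parameter is injectable so that tests can record the delays instead of waiting.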

Week 5 (June 15 – June 21):
Design scoring logic for Wikidata edits and implement core scoring functions.
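To make the idea of scoring rules concrete, here is a purely illustrative sketch. It assumes that Wikibase's auto-generated edit summaries begin with an action key such as `wbsetclaim-create` inside `/* ... */`. The weights are placeholders, not proposed values, and the actual rules would be defined with mentors and aligned with Wikiscore's existing logic.

```python
import re
from typing import Dict, Iterable

# Placeholder weights keyed by the action at the start of the edit summary.
WEIGHTS: Dict[str, int] = {
    "wbsetclaim-create": 3,
    "wbsetreference-add": 2,
    "wbsetlabel-add": 1,
    "wbsetdescription-add": 1,
    "wbsetsitelink-add": 1,
}

# Captures e.g. "wbsetclaim-create" from "/* wbsetclaim-create:2||1 */ ...".
_ACTION = re.compile(r"^/\*\s*([a-z]+(?:-[a-z]+)*)")


def edit_action(comment: str) -> str:
    """Return the action key of an auto-generated summary, or "" if absent."""
    match = _ACTION.match(comment or "")
    return match.group(1) if match else ""


def score_edits(comments: Iterable[str], weights: Dict[str, int] = WEIGHTS) -> int:
    """Sum the weights of all edits; unrecognized actions score zero."""
    return sum(weights.get(edit_action(c), 0) for c in comments)
```

Keeping the weights in a plain mapping would let contest organizers adjust them without touching the scoring code.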

Week 6 (June 22 – June 28):
Integrate scoring logic with fetched data and ensure compatibility with existing Wikiscore system.

Week 7 (June 29 – July 5):
Integrate Wikidata contribution pipeline into Wikiscore backend and ensure correct data flow.

Week 8 (July 6 – July 12):
Optimize performance using caching (e.g., Redis) and batching API requests.
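A minimal sketch of the caching and batching ideas follows. It uses an in-memory cache with a time-to-live as a stand-in for Redis. The class and function names are hypothetical, and the batch size of 50 is an assumption based on the usual Action API cap on multi-value parameters for regular users.

```python
import time
from typing import Any, Callable, Dict, Hashable, Iterator, List, Sequence, Tuple


class TTLCache:
    """Tiny in-memory cache with expiry; a stand-in for Redis in this sketch."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value for `key`, recomputing it once expired."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = compute()
        self._store[key] = (now + self._ttl, value)
        return value


def batches(items: Sequence[Any], size: int = 50) -> Iterator[List[Any]]:
    """Split `items` into consecutive chunks of at most `size` elements."""
    for i in range(0, len(items), size):
        yield list(items[i:i + size])
```

For example, `cache.get_or_compute(("contribs", user), lambda: load(user))` would call a hypothetical `load` function at most once per TTL window for each user.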

Week 9 (July 13 – July 19):
Test system with real-world datasets and validate scoring accuracy.

Week 10 (July 20 – July 26):
Handle edge cases such as reverted edits, duplicate entries, and incomplete metadata.
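The edge cases listed here could be handled by a single filtering pass, sketched below. It assumes contribution entries shaped like `list=usercontribs` results and relies on MediaWiki's `mw-reverted` change tag to detect reverted edits. The function name and the `count_minor` switch are hypothetical.

```python
from typing import Any, Dict, Iterable, List

# Fields without which an entry cannot be scored (assumption for this sketch).
REQUIRED = ("revid", "title", "timestamp")


def clean_contribs(
    contribs: Iterable[Dict[str, Any]], count_minor: bool = True
) -> List[Dict[str, Any]]:
    """Drop incomplete, duplicate, and reverted edits, preserving input order."""
    seen = set()
    kept: List[Dict[str, Any]] = []
    for c in contribs:
        # Incomplete metadata: skip entries missing a required field.
        if any(c.get(k) in (None, "") for k in REQUIRED):
            continue
        # Duplicates: the same revision may appear on overlapping pages.
        if c["revid"] in seen:
            continue
        seen.add(c["revid"])
        # Reverted edits carry the "mw-reverted" change tag.
        if "mw-reverted" in c.get("tags", []):
            continue
        if c.get("minor") and not count_minor:
            continue
        kept.append(c)
    return kept
```

Whether minor edits should count is a contest policy question, so the sketch exposes it as a parameter instead of hard-coding it.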

Week 11 (July 27 – August 2):
Write technical documentation and document design decisions.

Week 12 (August 3 – August 9):
Refactor code, improve test coverage, and incorporate mentor feedback.

Week 13 (August 10 – August 17):
Finalize implementation, perform end-to-end testing, and prepare for submission and handover.

Impact

This feature will enable organizers to include Wikidata contributions in edit-a-thons and contests, improving participation and recognition.

It will also make Wikiscore more versatile and inclusive, supporting a broader range of Wikimedia contributions and encouraging engagement with structured data.


Why Me

My experience in backend development, API design, and data processing aligns well with this project.

I have worked on systems involving:

  • REST APIs and large-scale data processing
  • Performance optimization using caching and indexing
  • Asynchronous workflows and distributed systems

I am comfortable working with complex data pipelines and ensuring reliability and scalability.

I have also worked on research-driven projects, including Physics-Informed Neural Networks (PINNs) and an AI-powered data extraction system. These projects involved building scalable pipelines, working with structured data, and designing efficient processing systems.

Research Work:

I am comfortable iterating based on feedback and delivering incremental improvements, which is essential for contributing effectively to open-source projects like Wikimedia.


Post-Internship Contribution

I plan to continue contributing to Wikimedia by:

  • Maintaining and improving the Wikidata integration within Wikiscore, including fixing issues and optimizing performance
  • Extending support for additional Wikidata-related features based on community needs
  • Contributing to related Wikimedia tools and improving existing workflows where applicable
  • Assisting new contributors by sharing knowledge, improving documentation, and supporting onboarding efforts

I aim to remain an active contributor by continuously improving the reliability and scalability of tools that support Wikimedia communities.

Thank you for your consideration.

Event Timeline

Hello @Arcstur and @Ederporto,

I have created a detailed proposal outlining my approach for implementing Wikidata support in Wikiscore:

https://phabricator.wikimedia.org/T423003

The proposal covers the technical design, integration strategy, and timeline. I would appreciate any feedback or suggestions.

Thank you for your guidance.

Aklapper updated Other Assignee, removed: Ederporto.
Aklapper moved this task from Backlog to Pending Intern Proposals on the Outreachy (Round 32) board.
Aklapper added a subscriber: Arcstur.
Gopavasanth subscribed.

Thank you for your proposal and the effort you put into it. This year we received over 20 strong applications, and after a highly competitive review, we were unfortunately unable to offer you a slot.

Please don't see this as a failure. Many contributors who weren't selected for Outreachy have gone on to make a meaningful, lasting impact in the Wikimedia community, and we genuinely hope you'll stay engaged. You're very welcome to continue contributing outside of Outreachy. Our mentors and org admins are happy to help you get started or keep going:

We hope to see you around in the community.