Profile Information
Name: Ayush Khati
GitHub: https://github.com/AyushkhatiDev
Portfolio: https://portfoliokhati.netlify.app/
Location: India (West Bengal, IST – UTC+5:30)
I am a backend-focused software developer with hands-on experience in designing and building scalable systems, APIs, and data processing pipelines. I have worked on production-grade applications involving performance optimization, caching, and distributed architectures. My work includes developing reliable systems with a focus on efficiency, consistency, and maintainability. I am particularly interested in building data-driven tools that support real-world use cases and large-scale collaboration.
As part of this application, I completed both Outreachy microtasks (T418285 and T418286) and improved them based on mentor feedback.
Synopsis
The Lusophone Technological Wishlist is a community-driven initiative aimed at identifying improvements that enhance the experience of contributors across Wikimedia projects.
For this internship, I propose to work on Wishlist #8: adding Wikidata support to the Wikiscore tool.
This feature will allow Wikiscore to track and evaluate Wikidata contributions, enabling edit-a-thons and contests to include Wikidata edits. It expands the scope of the tool and supports a broader contributor base.
Selected Wishlist
Wishlist #8: Wikidata support for Wikiscore
Currently, Wikiscore primarily focuses on Wikipedia edits. However, Wikidata contributions are increasingly important, and including them in scoring would let contests recognize structured-data work alongside article edits.
The goal is to:
- Retrieve Wikidata user contributions using relevant APIs
- Parse, normalize, and validate contribution data
- Integrate Wikidata edits into the existing Wikiscore scoring pipeline
- Ensure accurate, consistent, and efficient computation of scores across different contribution types
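The parse, normalize, validate, and score steps listed above can be sketched roughly as follows. This is a minimal illustration in Python, chosen only for readability; it is not a claim about Wikiscore's own codebase or language. The field names (revid, user, title, timestamp, comment, tags) follow the output of the MediaWiki usercontribs query module. The names Contribution, normalize, score, and ACTION_WEIGHTS are hypothetical, as are the weights themselves. Reading the edit type from the Wikibase automatic summary prefix and detecting reverts through the mw-reverted tag are assumptions that would need to be checked against real API output and agreed with mentors.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional

# Hypothetical weights for illustration only; real values would be
# decided together with mentors and contest organizers.
ACTION_WEIGHTS = {"wbsetclaim": 3, "wbsetlabel": 1, "wbsetdescription": 1}
DEFAULT_WEIGHT = 1

# Assumption: Wikibase edits carry an automatic summary such as
# "/* wbsetclaim-create:2||1 */ ..."; we keep only the leading word.
_SUMMARY_ACTION = re.compile(r"^/\*\s*([a-z]+)")


@dataclass(frozen=True)
class Contribution:
    revid: int
    user: str
    title: str
    timestamp: datetime
    action: str
    reverted: bool


def normalize(raw: dict) -> Optional[Contribution]:
    """Convert one raw API entry into a Contribution.

    Returns None when required metadata is missing or malformed,
    so incomplete entries never reach the scoring step.
    """
    try:
        revid = int(raw["revid"])
        user = raw["user"]
        title = raw["title"]
        timestamp = datetime.strptime(
            raw["timestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
    except (KeyError, TypeError, ValueError):
        return None
    match = _SUMMARY_ACTION.match(raw.get("comment") or "")
    action = match.group(1) if match else "unknown"
    # Assumption: reverted edits are marked with the "mw-reverted" tag.
    reverted = "mw-reverted" in (raw.get("tags") or [])
    return Contribution(revid, user, title, timestamp, action, reverted)


def score(contributions: Iterable[Contribution]) -> Dict[str, int]:
    """Sum weights per user, skipping reverted and duplicate revisions."""
    totals: Dict[str, int] = {}
    seen = set()
    for item in contributions:
        if item.reverted or item.revid in seen:
            continue
        seen.add(item.revid)
        weight = ACTION_WEIGHTS.get(item.action, DEFAULT_WEIGHT)
        totals[item.user] = totals.get(item.user, 0) + weight
    return totals
```

Keeping normalization separate from scoring means the scoring rules can change without touching the parsing code, and each step can be unit tested on its own.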
Technical Approach
The implementation will follow MediaWiki development practices and integrate cleanly with the existing Wikiscore architecture, ensuring maintainability and compatibility with Wikimedia tooling.
- Data Retrieval
  - Use the MediaWiki Action API on Wikidata (e.g., the list=usercontribs or list=recentchanges query modules) to fetch user edits
  - Handle pagination, rate limits, and API constraints
  - Ensure reliability with proper error handling
- Data Processing
  - Parse and normalize contribution data
  - Filter relevant edit types
  - Handle edge cases such as duplicate entries, reverted edits, minor edits, and incomplete metadata
- Scoring Logic
  - Define scoring rules for Wikidata edits
  - Ensure compatibility with existing Wikiscore logic
  - Maintain consistency across different contribution types
- Integration
  - Integrate Wikidata data into the existing Wikiscore pipeline
  - Ensure modular, testable, and maintainable code
  - Use caching (e.g., Redis) to optimize repeated queries
- Performance Considerations
  - Optimize API calls using batching strategies
  - Implement caching to reduce redundant computations
  - Throttle requests to respect API rate limits
  - Ensure scalability for large datasets and high user activity
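To make the data retrieval points above more concrete, here is a minimal sketch of paginated fetching with retries and a pause between requests. It is written in Python for illustration only and assumes the usercontribs query module of the MediaWiki Action API on www.wikidata.org, which returns a "continue" object whose values are sent back with the next request. The function names, the User-Agent string, the retry count, and the pause length are hypothetical placeholders. The fetch parameter is injectable so that the loop can be tested without network access, and a cache (e.g., Redis) could later wrap the same call.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator

API_URL = "https://www.wikidata.org/w/api.php"
# Placeholder value; a real tool should identify itself and give contact details.
USER_AGENT = "wikiscore-wikidata-sketch/0.1 (illustrative placeholder)"


def http_get_json(params: Dict[str, str]) -> dict:
    """Perform one GET request against the API and decode the JSON body."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def _fetch_with_retry(fetch, params, max_retries, pause_seconds):
    """Retry transient network failures with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fetch(params)
        except OSError:  # urllib.error.URLError is a subclass of OSError
            if attempt == max_retries - 1:
                raise
            time.sleep(pause_seconds * 2 ** attempt)
    raise ValueError("max_retries must be at least 1")


def iter_user_contribs(
    user: str,
    start: str,
    end: str,
    fetch: Callable[[Dict[str, str]], dict] = http_get_json,
    pause_seconds: float = 0.5,
    max_retries: int = 3,
) -> Iterator[dict]:
    """Yield every contribution by `user` between `start` and `end`.

    Timestamps are ISO 8601 strings; with ucdir=newer, `start` must be
    the earlier of the two.
    """
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "list": "usercontribs",
        "ucuser": user,
        "ucstart": start,
        "ucend": end,
        "ucdir": "newer",
        "ucprop": "ids|title|timestamp|comment|sizediff|flags|tags",
        "uclimit": "max",
    }
    continuation: Dict[str, str] = {}
    while True:
        data = _fetch_with_retry(
            fetch, {**params, **continuation}, max_retries, pause_seconds
        )
        if "error" in data:
            raise RuntimeError(data["error"].get("info", "API error"))
        yield from data.get("query", {}).get("usercontribs", [])
        continuation = data.get("continue") or {}
        if not continuation:
            return
        time.sleep(pause_seconds)  # simple throttle between pages
```

Because the function is a generator, callers can process contributions page by page instead of holding a contest's full edit history in memory.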
Timeline
Week 1 (May 18 – May 24):
Set up development environment, run Wikiscore locally, and review project architecture. Explore Wikidata APIs and identify relevant endpoints for user contributions.
Week 2 (May 25 – May 31):
Analyze data flow in Wikiscore, define integration points for Wikidata, and finalize implementation plan with mentors.
Week 3 (June 1 – June 7):
Implement initial data fetching layer using Wikidata APIs, including basic request handling and response parsing.
Week 4 (June 8 – June 14):
Extend data retrieval to support pagination, error handling, and rate limiting. Validate data consistency.
Week 5 (June 15 – June 21):
Design scoring logic for Wikidata edits and implement core scoring functions.
Week 6 (June 22 – June 28):
Integrate scoring logic with fetched data and ensure compatibility with existing Wikiscore system.
Week 7 (June 29 – July 5):
Integrate Wikidata contribution pipeline into Wikiscore backend and ensure correct data flow.
Week 8 (July 6 – July 12):
Optimize performance using caching (e.g., Redis) and batched API requests.
Week 9 (July 13 – July 19):
Test system with real-world datasets and validate scoring accuracy.
Week 10 (July 20 – July 26):
Handle edge cases such as reverted edits, duplicate entries, and incomplete metadata.
Week 11 (July 27 – August 2):
Write technical documentation and document design decisions.
Week 12 (August 3 – August 9):
Refactor code, improve test coverage, and incorporate mentor feedback.
Week 13 (August 10 – August 17):
Finalize implementation, perform end-to-end testing, and prepare for submission and handover.
Impact
This feature will enable organizers to include Wikidata contributions in edit-a-thons and contests, improving participation and recognition.
It will also make Wikiscore more versatile and inclusive, supporting a broader range of Wikimedia contributions and encouraging engagement with structured data.
Why Me
My experience in backend development, API design, and data processing aligns well with this project.
I have worked on systems involving:
- REST APIs and large-scale data processing
- Performance optimization using caching and indexing
- Asynchronous workflows and distributed systems
I am comfortable working with complex data pipelines and ensuring reliability and scalability.
I have also worked on research-driven projects, including Physics-Informed Neural Networks (PINNs) and an AI-powered data extraction system. These projects involved building scalable pipelines, working with structured data, and designing efficient processing systems.
Research Work:
- PINNs Research Paper: https://github.com/AyushkhatiDev/Physics-Informed-Neural-Networks-for-Solving-Partial-Differential-Equations/blob/main/PINN_PDE_Solver_Research_Paper.pdf
- Data Extraction Research Paper: https://github.com/AyushkhatiDev/data-extractor/blob/main/paper/DataExtractor%20copy.pdf
I am comfortable iterating based on feedback and delivering incremental improvements, which is essential for contributing effectively to open-source communities such as Wikimedia.
Post-Internship Contribution
I plan to continue contributing to Wikimedia by:
- Maintaining and improving the Wikidata integration within Wikiscore, including fixing issues and optimizing performance
- Extending support for additional Wikidata-related features based on community needs
- Contributing to related Wikimedia tools and improving existing workflows where applicable
- Assisting new contributors by sharing knowledge, improving documentation, and supporting onboarding efforts
I aim to remain an active contributor by continuously improving the reliability and scalability of tools that support Wikimedia communities.
Thank you for your consideration.