
GSoC 2024 Proposal - Improve searchability and filtering of PageTriage feed
Closed, Declined (Public)

Description

Profile Information

Name: Maunika Shekar
Email: maunika.shekar@gmail.com
GitHub: https://github.com/maunikashekar
Location: Tamil Nadu, India (IN).
Mentors: @Soda and @TheresNoTime
Typical working hours: 18:00 IST - 01:00 IST (UTC +5:30)

Synopsis

Improve searchability and filtering of PageTriage feed
PageTriage is a MediaWiki extension that allows patrollers on the English Wikipedia to track, categorize and deal with problematic new pages. One of its features is the Vue.js-based New Pages Feed, which lets patrollers filter for pages they might want to patrol based on certain criteria. However, the existing filters are limited, and the community has shown interest in new filters and, more generally, in better ways to search for specific content in the NewPagesFeed. The goal of this project is to enhance the filtering and searching capabilities of the NewPagesFeed, in particular by adding AI-based topic prediction (leveraging the ORES API), keyword search within articles, filtering by the number of pageviews an article gets, and searching by how similar a page is to previously deleted pages.

Deliverables

  • Add ORES topic prediction to the NewPagesFeed and allow filtering by the same (T218132)

Enhancements to the PageTriage frontend will include a tagging interface that lets users easily see the topics an article falls under. Backend integration will involve making API calls to ORES for topic predictions based on the context and body. A filtering mechanism will then be developed to display only the articles relevant to the selected tags in the feed or toolbar.
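The filtering step described above can be sketched roughly as follows. This is a minimal Python illustration, not PageTriage code: it assumes each revision's score is a dictionary with a "probability" map from topic label to probability, which is my understanding of what the ORES articletopic model returns. The function names, the feed layout, the 0.5 threshold, and the sample values are all hypothetical.

```python
from typing import Dict, List


def topics_above_threshold(score: dict, threshold: float = 0.5) -> List[str]:
    """Return topic labels with probability >= threshold, most likely first."""
    probabilities = score.get("probability", {})
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [topic for topic, p in ranked if p >= threshold]


def filter_feed_by_topic(
    feed: List[dict],
    scores_by_rev: Dict[int, dict],
    wanted_topic: str,
    threshold: float = 0.5,
) -> List[dict]:
    """Keep only feed entries whose latest revision is predicted to cover wanted_topic."""
    kept = []
    for entry in feed:
        score = scores_by_rev.get(entry["rev_id"])
        if score is None:
            continue  # no prediction available for this revision yet
        if wanted_topic in topics_above_threshold(score, threshold):
            kept.append(entry)
    return kept


# Made-up scores, shaped like the assumed per-revision "score" object.
sample_scores = {
    101: {"probability": {"Culture.Biography.Biography*": 0.91,
                          "Geography.Regions.Asia.South Asia": 0.62,
                          "STEM.Technology": 0.03}},
    102: {"probability": {"STEM.Technology": 0.88,
                          "Culture.Biography.Biography*": 0.05}},
}

# Hypothetical feed entries; revision 103 has no score and is skipped.
sample_feed = [
    {"title": "Example person", "rev_id": 101},
    {"title": "Example gadget", "rev_id": 102},
    {"title": "Unscored page", "rev_id": 103},
]

if __name__ == "__main__":
    for entry in filter_feed_by_topic(sample_feed, sample_scores, "STEM.Technology"):
        print(entry["title"])
```

Keeping the threshold as a parameter would let the interface trade precision against recall without changing the filtering logic.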

  • Special:NewPagesFeed - add option to filter by pageviews (T207238)

I plan to use the pagetriage_page_tags table together with the page table to arrive at the pageviews count. The backend will be changed to filter Special:NewPagesFeed by a range of pageviews and to expose this as an API action. The frontend will call this API to fetch the list filtered by the pageviews range.
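A range query of this kind could look roughly like the sketch below, written in Python against an in-memory SQLite database so that it runs on its own. The schema is a cut-down assumption modelled on the PageTriage tables, not the real one. The "pageviews" tag is hypothetical, and the sketch assumes tag values are stored as text, hence the CAST. The helper names are made up for illustration.

```python
import sqlite3
from typing import List, Tuple


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with a simplified, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
        CREATE TABLE pagetriage_tags (ptrt_tag_id INTEGER PRIMARY KEY, ptrt_tag_name TEXT);
        CREATE TABLE pagetriage_page_tags (
            ptrpt_page_id INTEGER, ptrpt_tag_id INTEGER, ptrpt_value TEXT
        );
        INSERT INTO pagetriage_tags VALUES (1, 'pageviews');  -- hypothetical tag
        INSERT INTO page VALUES (10, 'Alpha'), (11, 'Beta'), (12, 'Gamma');
        INSERT INTO pagetriage_page_tags VALUES (10, 1, '5'), (11, 1, '250'), (12, 1, '1200');
        """
    )
    return conn


def pages_in_pageview_range(
    conn: sqlite3.Connection, min_views: int, max_views: int
) -> List[Tuple[str, int]]:
    """Return (title, views) for pages whose pageviews fall in [min_views, max_views]."""
    rows = conn.execute(
        """
        SELECT p.page_title, CAST(pt.ptrpt_value AS INTEGER) AS views
        FROM pagetriage_page_tags AS pt
        JOIN pagetriage_tags AS t ON t.ptrt_tag_id = pt.ptrpt_tag_id
        JOIN page AS p ON p.page_id = pt.ptrpt_page_id
        WHERE t.ptrt_tag_name = 'pageviews'
          AND CAST(pt.ptrpt_value AS INTEGER) BETWEEN ? AND ?
        ORDER BY views DESC
        """,
        (min_views, max_views),
    )
    return rows.fetchall()


if __name__ == "__main__":
    demo = make_demo_db()
    print(pages_in_pageview_range(demo, 100, 2000))
```

Passing the bounds as bound parameters rather than interpolating them into the SQL string keeps user-supplied filter values out of the query text.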

  • Keyword Search for New Pages Feed (T207761)

I will introduce a new field titled "Search by keyword" in the search filters of the New Pages Feed, probably in the "That" section. Backend integration will involve handling user-entered keywords and filtering the article list accordingly. This can be achieved with an efficient database query that returns the search results for a keyword, which might require changes to the database table schema for faster query processing.
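To illustrate why an index (and hence a schema change) can make keyword lookups cheaper than scanning every article, here is a small Python sketch of an inverted index. It is only an illustration of the idea under simplifying assumptions: pages are plain strings keyed by a made-up page ID, matching is on whole lowercase words, and a multi-word query requires every word to be present. It does not reflect how PageTriage or MediaWiki search is implemented.

```python
import re
from collections import defaultdict
from typing import Dict, Set

TOKEN = re.compile(r"\w+")


def tokenize(text: str) -> Set[str]:
    """Split text into a set of lowercase word tokens."""
    return {token.lower() for token in TOKEN.findall(text)}


def build_index(pages: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each token to the set of page IDs whose text contains it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for page_id, text in pages.items():
        for token in tokenize(text):
            index[token].add(page_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return IDs of pages containing every word in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = None
    for token in tokens:
        ids = index.get(token, set())
        # Copy on first use so the index's own sets are never handed out.
        result = set(ids) if result is None else result & ids
    return result


# Made-up sample pages keyed by hypothetical page IDs.
sample_pages = {
    1: "Chennai Metro Rail expansion",
    2: "History of rail transport in India",
    3: "Chennai cuisine",
}

if __name__ == "__main__":
    idx = build_index(sample_pages)
    print(sorted(search(idx, "chennai rail")))
```

The index is built once and each query then touches only the postings for its own words, which is the same trade-off a database index makes: extra storage and write cost in exchange for faster reads.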

  • See and filter with percent similarity to top deleted revision (T327955)

I will create an API within PageTriage to fetch the top deleted revision of an article. Additionally, I'll utilize functions like similar_text in PHP for determining the percentage of wikicode similarity between the current top revision and the retrieved deleted revision.

To facilitate manual triggering of the comparison, I'll incorporate a "Compare" button into the PageTriage interface. Moreover, I'll explore the possibility of automating the comparison process when users visit an article within PageTriage.
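The comparison step can be sketched as below. This Python example uses difflib.SequenceMatcher as a stand-in for PHP's similar_text. Both report a percentage based on matched characters relative to the combined length of the two strings, but they match text differently, so the numbers will not always agree. The function names and the sample wikitext are hypothetical.

```python
from difflib import SequenceMatcher


def percent_similarity(current_wikitext: str, deleted_wikitext: str) -> float:
    """Percentage similarity between two revisions, rounded to one decimal place."""
    matcher = SequenceMatcher(None, current_wikitext, deleted_wikitext, autojunk=False)
    return round(matcher.ratio() * 100, 1)


def is_likely_recreation(current_wikitext: str, deleted_wikitext: str,
                         min_percent: float = 80.0) -> bool:
    """Flag a page whose text is at least min_percent similar to the deleted revision."""
    return percent_similarity(current_wikitext, deleted_wikitext) >= min_percent


if __name__ == "__main__":
    deleted = "'''Example Co''' is a company based in Chennai."
    current = "'''Example Co''' is a software company based in Chennai."
    print(percent_similarity(current, deleted))
    print(is_likely_recreation(current, deleted))
```

Character-level comparison of long revisions can be slow, which is one reason to keep the manual "Compare" trigger available even if automatic comparison is explored.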

Project size - 350 hours

About Me

I am Maunika Shekar, working as an R&D Apprentice at Informatica Business Solutions Pvt. Ltd. I'm inclined towards full-stack development and am looking forward to starting my open source journey. I have collaborated closely with the connector development team and implemented minor features of the Oracle Cloud Infrastructure (OCI) connector in core Java, based on business requirements. I have experience setting up and maintaining Docker environments for large-scale applications in a business setting. I also have experience building and maintaining web applications with frameworks like React, VueJS, SpringBoot and Django.

Relevant skills

  • Vue JS
  • PHP
  • JavaScript
  • SQL
  • REST API
  • SPARQL
  • HTML
  • CSS
  • Java

Availability

1. Are you eligible for Google Summer of Code?
Yes. I’m eligible according to the terms described.

2. Do you plan to submit any other proposal apart from this one?
No. This is my only go.

3. Do you have any other plans during the period of GSoC?
No. I will be available for the entire term of GSoC.

4. How many hours per week can you dedicate to this?
I can dedicate 30 hours per week and even more if necessary.

5. Have you been accepted to GSoC before?
No. This is my first attempt.

Wikimedia Contribution

Title | Status | Link
Fix info chips problem in NewPagesFeed Filters | Under review | https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageTriage/+/1010683
Fix "Add details" button functionality in curation toolbar | Under review | https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageTriage/+/1010966
Make "Show all" selected by default in filters | Under review | https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageTriage/+/1009397
Fix multiple CSD selection problem based on code | Under review | https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageTriage/+/1010972
Add close functionality for filters dialog on outside-click | Under review | https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageTriage/+/1009405
Fix 'were created by' filter problem in NewPagesFeed | Abandoned (Redundant) | https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageTriage/+/1010685

Microtask

WikiStats - A Vue.js based single page application to show some interesting statistics from PageTriage and Wikipedia Article Views along with a sample image grid from Wikimedia Commons.
Source code: https://github.com/maunikashekar/wikistats
Libraries: Wikimedia Codex, Axios

Timeline

Community bonding period (May 02, 2024 - May 27, 2024)
  • Get in touch with the mentors
  • Interact with the mentors and discuss the project
  • Understand the codebase and architecture of the extension

Week 1 (May 27, 2024 - June 3, 2024)
  • Define the project requirements and features to be developed
  • Develop the code strategy and decide on potential deadlines for the development and testing of each feature

Week 2 & 3 (June 04, 2024 - June 18, 2024)
  • Set up the development environment
  • Add an option to filter by pageviews in Special:NewPagesFeed
  • Test the pageviews filter functionality
  • Write a biweekly report on the pageviews filter implementation and update documentation (if necessary)

Week 4 & 5 (June 19, 2024 - July 3, 2024)
  • Discuss the most efficient database query strategy for keyword-based search and request a database schema modification (if necessary)
  • Look into a caching approach to minimize hits to the database
  • Implement keyword-based search in the backend as an API action

Week 6 & 7 (July 4, 2024 - July 18, 2024)
  • Write unit tests for the keyword-based search action
  • Write the frontend functionality to interact with the keyword-based search through the MediaWiki API action
  • Write Jest tests for checking the keyword-based search from the UI

Week 8 & 9 (July 19, 2024 - August 2, 2024)
  • Optimize the keyword search functionality
  • Test the keyword search functionality for multiple edge cases in New Pages Feed
  • Refactor the code for check-in
  • Write a report on the implemented keyword search functionality and update the documentation

Week 10 (August 3, 2024 - August 10, 2024)
  • Mid-term evaluation
  • Testing and bug fixing

Week 11 & 12 (August 11, 2024 - August 25, 2024)
  • Implement backend functionality to filter with percent similarity to top deleted revision
  • Write unit tests for the top deleted revision similarity filter
  • Test the filter functionality against multiple deleted-revision edge cases

Week 13 & 14 (August 26, 2024 - September 10, 2024)
  • Write frontend functionality to interact with the top deleted revision similarity filter
  • Write Jest tests for the implemented filter

Week 15 (September 11, 2024 - September 18, 2024)
  • Meet with mentors for suggestions on the proposed implementation of ORES topic prediction functionality
  • Refactor the code and test according to the suggestions

Week 16 & 17 (September 19, 2024 - October 3, 2024)
  • Implement ORES-based topic prediction in the backend as an API action
  • Test the topic prediction functionality based on different contexts
  • Write the frontend code for interacting with the topic prediction API action

Week 18 (October 4, 2024 - October 11, 2024)
  • Develop additional features if any desired deliverables are agreed upon

Week 19 (October 12, 2024 - October 19, 2024)
  • Final testing and documentation preparation for the implemented features
  • Final blog post publication

Event Timeline

@Maunikashekar I would suggest:

  • Expanding on how you plan to implement some of the deliverables mentioned in the original project task
  • Clearly specifying the project size.
  • Potentially spacing out the tasks if you are going for a large sized project, since you do get time until November to complete all the deliverables (for example you mention that you plan on implementing "keyword search for New Pages Feed" in a week, imo, that might be a bit optimistic, since such a task will require filing a database modification request which takes time to be actioned).
Soda closed this task as Declined. (Edited May 16 2024, 5:19 AM)

Hi, this proposal unfortunately was not selected for this year's GSoC program. We had multiple amazing applications this year and we had to make the tough decision of choosing a single candidate. Not being selected for this particular year does not reflect on your abilities or your qualifications. Hope you don't feel discouraged, and we hope you will stay on and continue contributing to the Wikimedia movement in some capacity :)