
GSoC 2024 Proposal : Improve searchability and filtering of PageTriage feed
Closed, Declined (Public)

Authored By: Sushrith_Bogi, Mar 24 2024, 6:30 PM

Description

PROFILE INFORMATION

Name : Sushrith Bogi

IRC nickname : sushrith

Email : bogisushrith@gmail.com

Web Profile : Phabricator, GitHub

Timezone : IST (Indian Standard Time, UTC+05:30)

Location (country or state) : Telangana, India

Typical working hours (include your timezone) : 18:00 to 01:00 IST on weekdays (about 5.5 working hours), 10:00 to 22:00 IST on Sundays (about 10 to 10.5 working hours)

SYNOPSIS

This project aims to significantly improve the filtering and search functionality of the New Pages Feed (Special:NewPagesFeed) in PageTriage, a MediaWiki extension. The feed is a critical tool for English Wikipedia patrollers, who use it to review newly created content. However, the community considers the current filtering options insufficient. The project will give patrollers the following improvements:

  • Add AI-based topic prediction from the ORES ArticleTopic model, so that patrollers can quickly find pages in topics relevant to them.
  • Add keyword search, so that patrollers can efficiently find articles containing specific terms or phrases.
  • Add filtering based on the number of pageviews an article receives, giving insight into how popular newly created pages are.
  • For pages that were deleted before, show how similar the new page is to the top deleted revision, helping patrollers identify and handle recreations of problematic material.

Possible Mentor(s) : @Soda, @TheresNoTime

Have you contacted your mentors already?
Yes
Project Size : 350 hours

DELIVERABLES

T218132 Add ORES topic prediction to the NewPagesFeed and allow filtering by the same

ORES supports topic prediction through the ArticleTopic model. Unlike the article-class and potential-issue predictions, topic prediction determines the subject of an article, for example chemistry, politics and government, sports, or Central Asia.
Add ArticleTopic to PageTriage, exposing it both in the filters menu on Special:NewPagesFeed and in the "Page info" flyout of the Page Curation toolbar.
This helps new page patrollers who want to narrow their feed to one or two topics matching their interests or expertise. Today they rely on other tools for this.
Design:

[Image: s1.png]

Frontend: a select box that allows multiple topics to be selected.
Backend: retrieves ArticleTopic predictions from the ORES API.
References:
https://www.mediawiki.org/wiki/ORES/Articletopic
https://www.mediawiki.org/wiki/ORES#Topic_routing
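To make the frontend/backend split above concrete, here is a minimal Python sketch of the filtering logic only (PageTriage itself would implement it in PHP and Vue.js). It assumes the ORES v3 response shape, where each revision ID maps to `{"articletopic": {"score": {"probability": {label: p}}}}`; that shape should be checked against the live service. The `threshold` parameter, the helper names, and the sample data are hypothetical. When several topics are selected, a page is kept if it matches any of them.

```python
from typing import Dict, List, Set


def predicted_topics(response: dict, wiki: str = "enwiki",
                     threshold: float = 0.5) -> Dict[int, Set[str]]:
    """Map each revision ID to the ArticleTopic labels at or above `threshold`."""
    topics: Dict[int, Set[str]] = {}
    for revid, models in response[wiki]["scores"].items():
        probabilities = models["articletopic"]["score"]["probability"]
        topics[int(revid)] = {label for label, p in probabilities.items()
                              if p >= threshold}
    return topics


def filter_by_topics(feed: List[dict], topics: Dict[int, Set[str]],
                     selected: Set[str]) -> List[dict]:
    """Keep feed entries predicted to match ANY of the selected topics.

    An empty selection means "no topic filter" and returns the whole feed.
    """
    if not selected:
        return list(feed)
    return [item for item in feed
            if topics.get(item["revid"], set()) & selected]


# Hand-written sample in the assumed ORES v3 response shape (not real data).
sample = {"enwiki": {"scores": {
    "1001": {"articletopic": {"score": {
        "prediction": ["STEM.Chemistry"],
        "probability": {"STEM.Chemistry": 0.91, "Culture.Sports": 0.02},
    }}},
    "1002": {"articletopic": {"score": {
        "prediction": ["Culture.Sports"],
        "probability": {"STEM.Chemistry": 0.05, "Culture.Sports": 0.88},
    }}},
}}}
feed = [{"title": "Ethyl acetate", "revid": 1001},
        {"title": "Example FC", "revid": 1002}]

topics = predicted_topics(sample)
print([item["title"] for item in filter_by_topics(feed, topics, {"Culture.Sports"})])
# prints ['Example FC']
```

Treating an empty selection as "show everything" keeps the default feed unchanged for patrollers who never touch the topic filter.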

T207238 Special:NewPageFeed - add option to filter by pageviews
Add the ability to sort Special:NewPagesFeed by pageview count, so that reviewers can prioritise high-impact articles.
Design:

[Image: s2.png]

Approach : Show the pageview count in each article's row, without providing sorting or filtering on it. Candidate metrics to display are the mean and median views per day and the total views over the past 30 days. The displayed figures will lag the current time by about 24 hours, and queries will be limited to the last 30 days to keep them efficient and manageable. To indicate each article's general popularity, we propose storing the ceiling of the base-10 logarithm of its pageviews as a page tag. The tag then takes only a small number of distinct values (for example, any count from 101 to 1,000 maps to 3), which still gives reviewers a coarse indication of popularity.
Alternative approaches will be discussed with the mentors, and the agreed approach will be implemented.
Example of unreviewed pages listed by popularity:
https://nppbrowser.toolforge.org/popular-unreviewed.php
Reference:
https://phabricator.wikimedia.org/T207238
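The sketch below illustrates the pageview approach in Python under stated assumptions: it builds a URL for the public Wikimedia Pageviews REST API (per-article endpoint, daily granularity, `user` agent), computes the candidate metrics, and derives the ceil(log10) bucket. Applying the bucket to the 30-day total, and all helper names, are assumptions; no network request is made.

```python
import statistics
from datetime import date, timedelta
from typing import List
from urllib.parse import quote

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title: str, today: date, days: int = 30,
                  project: str = "en.wikipedia") -> str:
    """Daily per-article Pageviews URL covering the `days` days up to yesterday.

    The window ends yesterday because the displayed data lags by about 24 hours.
    """
    end = today - timedelta(days=1)
    start = end - timedelta(days=days - 1)
    article = quote(title.replace(" ", "_"), safe="")  # e.g. "/" -> "%2F"
    return (f"{API}/{project}/all-access/user/{article}/daily/"
            f"{start:%Y%m%d}/{end:%Y%m%d}")


def popularity_bucket(views: int) -> int:
    """ceil(log10(views)) computed with integers to avoid float rounding.

    For views >= 2 this equals the number of digits in views - 1,
    e.g. 10 -> 1, 11 -> 2, 100 -> 2, 101 -> 3. Zero or one view maps to 0.
    """
    if views <= 1:
        return 0
    return len(str(views - 1))


def summarize(daily_views: List[int]) -> dict:
    """Candidate metrics for one article over the queried window."""
    if not daily_views:
        raise ValueError("no pageview data for this window")
    total = sum(daily_views)
    return {
        "total": total,
        "mean_per_day": total / len(daily_views),
        "median_per_day": statistics.median(daily_views),
        "bucket": popularity_bucket(total),
    }


print(pageviews_url("Ethyl acetate", today=date(2024, 6, 1)))
print(summarize([10, 20, 30, 40]))
```

Because the bucket is a small integer, it fits naturally in a page tag, while the exact counts can be fetched only when a row is displayed.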

T207761 Keyword Search for New Pages Feed
Add a new field to the search filters of the New Pages Feed.
The field label should read "Has the following keyword(s)".
If a user enters one or more keywords in the field and clicks "Set Filter", the New Pages Feed should show only articles whose text contains matching keywords.
The functionality should resemble the following:

[Image: s3.png]

Reference:
https://nppbrowser.toolforge.org/index.php?mode=NPP
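A minimal Python sketch of the matching semantics for "Has the following keyword(s)" follows. Comma-separated input, case-insensitive whole-word matching, and requiring all keywords by default are assumptions for illustration, as are the function names. Scanning each page's text like this would likely be too slow for the real feed, where the work would more plausibly be delegated to a search index.

```python
import re
from typing import List


def parse_keywords(raw: str) -> List[str]:
    """Split the filter input on commas: "river, dam" -> ["river", "dam"]."""
    return [k.strip() for k in raw.split(",") if k.strip()]


def matches_keywords(text: str, keywords: List[str],
                     match_all: bool = True) -> bool:
    """Case-insensitive whole-word (or whole-phrase) keyword match.

    The lookarounds stop "dam" from matching inside "Damascus" while still
    working for keywords that start or end with punctuation, such as "C++".
    """
    if not keywords:
        return True
    hits = [re.search(r"(?<!\w)" + re.escape(k) + r"(?!\w)", text,
                      re.IGNORECASE) is not None
            for k in keywords]
    return all(hits) if match_all else any(hits)


pages = [
    {"title": "Hoover Dam", "text": "The dam spans the Colorado River."},
    {"title": "Damascus", "text": "Damascus is a city in Syria."},
]
keywords = parse_keywords("river, dam")
print([p["title"] for p in pages if matches_keywords(p["text"], keywords)])
# prints ['Hoover Dam']
```

Whether multiple keywords should be combined with AND or OR is a product decision to settle with the mentors; the `match_all` flag keeps both options open.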

T327955 See and filter with percent similarity to top deleted revision
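CSD G4 (recreation of a page deleted following a deletion discussion) requires the new article to be substantially similar to the deleted one. However, patrollers who are not admins cannot see deleted revisions, so they cannot check this themselves.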
PageTriage already detects whether an article has been "previously deleted". This task explores expanding that detection to include:
  • Detection of a previous AfD, by checking whether an AfD page exists.
  • If a previous AfD is detected and the page has been deleted before, a new PageTriage API that retrieves the top deleted revision, compares it with the current top revision, and reports a percentage wikicode match.
  • This could run on a button click or automatically.
  • The result may or may not be stored as a pagetriage_page_tag (article metadata).
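One way to produce the "% wikicode match" is sketched below with Python's `difflib.SequenceMatcher` over word tokens. This is a swapped-in technique chosen for illustration, not PageTriage's method; the function name and the example wikitext are hypothetical, and the score does not imply any particular G4 threshold.

```python
import difflib


def wikitext_similarity(deleted: str, current: str) -> float:
    """Percentage (0-100) of matching wikitext between two revisions.

    Compares word tokens with difflib.SequenceMatcher; ratio() is
    2 * matched_tokens / total_tokens. autojunk=False keeps frequent tokens
    such as "the" from being ignored in long texts.
    """
    a, b = deleted.split(), current.split()
    ratio = difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()
    return round(ratio * 100, 1)


old = "'''Foo''' is a band from Ohio."
new = "'''Foo''' is a band from Texas."
print(wikitext_similarity(old, new))  # prints 83.3
```

Word tokens are a middle ground: character-level comparison is slower on long articles, while line-level comparison treats a whole paragraph as different after a one-word edit.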
Possible implementation options:
  • API: extend the pagetriagelist API (but the value would then be computed on every request, which could be expensive), or create a dedicated API that computes it only when needed.
  • Frontend: add a button to calculate it, calculate it automatically when the article is visited, or calculate it for every page and show it as a red article tag in Special:NewPagesFeed (the last option involves pagetriage_page_tag metadata).
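The "dedicated API" option computes the score only when it is requested. A minimal Python sketch of that idea follows; it assumes a hypothetical `fetch_wikitext(revid)` lookup, uses an in-memory dict in place of whatever cache the real API would use, and memoizes results per revision pair so that repeated requests (for example, a button click followed by a page visit) do not recompute the comparison.

```python
import difflib
from typing import Callable, Dict, Tuple


class SimilarityService:
    """Computes deleted-vs-current similarity only when asked, then memoizes it.

    Stored revision text does not change for a given revision ID, so a
    (deleted_revid, current_revid) pair is a safe cache key.
    """

    def __init__(self, fetch_wikitext: Callable[[int], str]):
        self._fetch = fetch_wikitext  # hypothetical revision-text lookup
        self._cache: Dict[Tuple[int, int], float] = {}
        self.computations = 0  # counts real comparisons, for illustration

    def similarity(self, deleted_revid: int, current_revid: int) -> float:
        key = (deleted_revid, current_revid)
        if key not in self._cache:
            self.computations += 1
            a = self._fetch(deleted_revid).split()
            b = self._fetch(current_revid).split()
            ratio = difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()
            self._cache[key] = round(ratio * 100, 1)
        return self._cache[key]


revisions = {7: "Foo is a band from Ohio.", 9: "Foo is a band from Texas."}
service = SimilarityService(revisions.__getitem__)
print(service.similarity(7, 9), service.similarity(7, 9), service.computations)
# prints 83.3 83.3 1
```

When the article is edited, its top revision ID changes, which produces a new cache key, so stale scores are never returned for the new text.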
Add similar support for "previous AfD". There is an afd_status page tag, but it only tracks current deletion tagging; there is also a recreated tag, but it tracks every kind of previous deletion, not just AfD.
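Detecting a previous AfD could start from the English Wikipedia naming convention for deletion discussions: `Wikipedia:Articles for deletion/Title` for the first nomination and `Wikipedia:Articles for deletion/Title (2nd nomination)` and so on for later ones. The Python sketch below assumes that convention (older discussions may be named differently), uses a set of existing titles in place of a real page-existence query, and combines the result with a `was_deleted` flag as the task describes. All names are hypothetical.

```python
from typing import List, Set

AFD_PREFIX = "Wikipedia:Articles for deletion/"


def ordinal(n: int) -> str:
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 23 -> '23rd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def afd_candidates(title: str, max_nominations: int = 5) -> List[str]:
    """Likely AfD titles: the first nomination, then '(2nd nomination)' onward."""
    base = AFD_PREFIX + title
    return [base] + [f"{base} ({ordinal(n)} nomination)"
                     for n in range(2, max_nominations + 1)]


def previous_afds(title: str, existing_pages: Set[str],
                  max_nominations: int = 5) -> List[str]:
    """AfD pages for `title` that exist; `existing_pages` stands in for a lookup."""
    return [t for t in afd_candidates(title, max_nominations)
            if t in existing_pages]


def has_previous_afd(title: str, existing_pages: Set[str],
                     was_deleted: bool) -> bool:
    """Hypothetical tag rule: an AfD page exists AND the page was deleted before."""
    return was_deleted and bool(previous_afds(title, existing_pages))


existing = {
    "Wikipedia:Articles for deletion/Foo Band",
    "Wikipedia:Articles for deletion/Foo Band (2nd nomination)",
}
print(previous_afds("Foo Band", existing))
print(has_previous_afd("Foo Band", existing, was_deleted=True))
```

Generating all candidate titles up front lets a real implementation check their existence in a single batched query rather than one request per nomination.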

TIMELINE

Period | Task
May 1 - May 26 | Community bonding period: get familiar with the ORES web service and its API for topic prediction; study the PageTriage codebase, its architecture and implementation details, to understand likely challenges and opportunities for improvement; explore sorting by pageview count; analyse keyword-search techniques for finding articles by theme or content; understand how to add an API that reports an article's pageviews. College exams will take up one week of this period.
May 27 - June 2 | Agree with the mentors on which topics to include in topic prediction. Study the ORES documentation to understand how its topic-prediction API works and how to integrate it using the PHP backend and Vue.js frontend.
June 3 - June 16 | Implement the backend ORES integration for topic prediction. Write a bi-weekly report.
June 17 - June 23 | Finalise the UI design with the mentors and implement the multi-select topic box. Write a bi-weekly report.
June 24 - June 30 | Initial testing, bug fixes, and documentation updates.
July 1 - July 7 | Finalise the pageview-count approach and study the relevant database tables. Write a bi-weekly report.
July 8 - July 14 | Work out efficient SQL queries against the pagetriage_page_tags table. Prepare for the midterm evaluation.
July 15 - July 28 | Implement the script that calls the API responsible for storing pageviews. Write a bi-weekly report.
July 29 - August 4 | Connect the backend logic for retrieving pageview counts to the frontend UI. Write a bi-weekly report.
August 5 - August 11 | Initial testing, bug fixes, and documentation updates.
August 12 - August 18 | Research existing keyword-search implementations such as NPP Browser. Write a bi-weekly report.
August 19 - September 1 | Implement the backend for keyword search. Write a bi-weekly report.
September 2 - September 8 | Integrate the backend and frontend of the keyword search feature.
September 9 - September 15 | Initial testing, bug fixes, and documentation updates. Write a bi-weekly report.
September 16 - September 22 | Study how PageTriage currently detects that an article has been "previously deleted".
September 23 - October 6 | Develop a PageTriage API that retrieves the top deleted revision, compares it with the current top revision, and returns a percentage wikicode match.
October 7 - October 20 | Integrate frontend and backend, test the API's functionality and performance, and document the changes.
October 21 - November 3 | Buffer for unforeseen issues or delays. Write the final documentation and final project report.

PARTICIPATION

I am active on several communication platforms, including Zulip, Slack and email. I will use Phabricator and Gerrit for discussing issues and for code review. Apart from the exam week noted below, I have no other commitments and can dedicate more than 40 hours per week to this project.

ABOUT ME

  • Your education (completed or in progress)

I am a third-year engineering student in the Department of Information Technology.

  • How did you hear about this program?

I heard about Google Summer of Code in my freshman year from my seniors, who talked about contributing to open-source organizations and its benefits. The idea of working collaboratively on a real-world project with people from diverse backgrounds and skill sets, and the opportunity to learn and grow as a developer, fascinated me.

  • Will you have any other time commitments, such as school work, another job, planned vacation, etc, during the duration of the program?

I will have exams during the week of May 14 - May 21, which may keep me busy during that period. I will stay in touch with my mentors and keep them informed of my progress.

  • Are you planning to apply to both programs and, if so, with what organization(s)?

I am applying only for Google Summer of Code.

  • What does making this project happen mean to you?

Making this project happen matters to me because it advances the accessibility and integrity of information on Wikipedia. Better filtering and search in the New Pages Feed help volunteers manage and curate content effectively, improving the user experience and furthering the mission of free knowledge sharing.
I am very enthusiastic about this project. My involvement in Wikimedia since December 2022 has been an exceptionally enriching learning journey, and I aim to keep contributing to Wikimedia after the project ends.

PAST EXPERIENCE

Microtask

  • Create a small independent tool/web app that interacts with any Wikimedia API and displays some information about an article. The tool must have a frontend built using VueJS and the Wikimedia Codex UI library. Include a link to the source code in your proposal

Source code for the microtask: Code
Technologies used

  • Vue.js
  • Wikimedia Codex UI library
  • Wikimedia API

PageTriage local setup completed successfully:

[Image: main.png]

I have been part of the Wikimedia community since November 2022. Below are some of my contributions to Wikimedia projects:

Patch | Description | Status
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageTriage/+/1012741 | Fix weird count when switching between feeds | Merged
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ProofreadPage/+/879899 | Animation should take left pane when "preview" is clicked | Merged
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/879892 | Fix to open link in same tab | Merged
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/887954 | Testers on Implementations should be links | Merged
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PushAll/+/888355 | Exchange the word for a different one | Merged
https://gerrit.wikimedia.org/r/c/mediawiki/skins/Vector/+/890360 | Visual change in the article toolbar | Merged
https://gerrit.wikimedia.org/r/c/wikimedia/developer-portal/+/927717 | Replace the link | Merged
https://gerrit.wikimedia.org/r/c/mediawiki/skins/Vector/+/912401 | Eliminate unnecessary spacing in the username/create account link | Merged
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageTriage/+/1011156 | Fix copyvio prop type validation error | In review
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageTriage/+/1010248 | Maintain consistency across NPP and AFC | In review
Image Search (GitHub link)
  • Built with Vue.js.
  • Displays an image matching the user's search input.
  • Uses the Unsplash API to fetch images.

Event Timeline

Sushrith_Bogi renamed this task from GSoC 2024 Proposal : Improve searchability and filtering of PageTriage feed (Work in Progress) to GSoC 2024 Proposal : Improve searchability and filtering of PageTriage feed. Mar 31 2024, 1:46 PM
Soda closed this task as Declined. May 16 2024, 5:18 AM (edited)

Hi, this proposal unfortunately was not selected for this year's GSoC program. We had multiple amazing applications this year and we had to make the tough decision of choosing a single candidate. Not being selected for this particular year does not reflect on your abilities or your qualifications. Hope you don't feel discouraged, and we hope you will stay on and continue contributing to the Wikimedia movement in some capacity :)