PROFILE INFORMATION
Name : Sushrith Bogi
IRC nickname : sushrith
Email : bogisushrith@gmail.com
Web Profile : Phabricator, GitHub
Timezone : IST (Indian Standard Time) (UTC+05:30)
Location (country or state) : Telangana, India
Typical working hours (include your timezone) : 18:00 IST to 01:00 IST (5.5 hours on weekdays), 10:00 IST to 22:00 IST (10 - 10.5 hours on Sundays)
SYNOPSIS
This project aims to significantly improve the filtering and searching features of the New Pages Feed in PageTriage, a MediaWiki extension. The feed is a critical tool for English Wikipedia patrollers, who use it to manage newly created content. However, the community considers the current filtering options insufficient. This project will give patrollers better tools for content management through the following enhancements:
- Implement AI-based topic prediction leveraging the ORES API to enable patrollers to swiftly identify pages of thematic relevance.
- Introduce keyword-based search functionality, allowing patrollers to locate articles containing specific terms or phrases efficiently.
- Enhance filtering options based on the number of pageviews an article receives, providing insights into the popularity and relevance of newly created pages.
- Introduce a feature enabling patrollers to search for pages similar to previously deleted content, aiding in the identification and management of potentially problematic material.
Possible Mentor(s) : @Soda, @TheresNoTime
Have you contacted your mentors already?
Yes
Project Size : 350 hours
DELIVERABLES
T218132 Add ORES topic prediction to the NewPagesFeed and allow filtering by the same
ORES now supports topic prediction, known as ArticleTopic. Unlike the class and potential-issue predictions, topic prediction determines the subject matter of an article, such as chemistry, politics and government, sports, or Central Asia.
This deliverable adds ArticleTopic to PageTriage, in both the filters menu on Special:NewPagesFeed and the "Page info" flyout of the Page Curation toolbar.
This feature helps new page patrollers who want to narrow their feed to one or two topics that match their interests or expertise. At present, they rely on other tools for this.
Design:
Frontend: A select box that allows multiple topics to be selected
Backend: Retrieves ArticleTopic data from the ORES API
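The backend step could be sketched roughly as below. This is a minimal illustration, not the planned implementation: the nested response layout (wiki, `scores`, revision ID, `articletopic`, `score`, `probability`) is my assumption about what an ORES scores response looks like, the helper names and the 0.5 threshold are hypothetical, and the real integration would live in PageTriage's PHP code rather than Python.

```python
from typing import Any

# Hypothetical cut-off; the real value would be agreed with the mentors.
DEFAULT_THRESHOLD = 0.5


def topics_above_threshold(
    response: dict[str, Any],
    wiki: str,
    rev_id: int,
    threshold: float = DEFAULT_THRESHOLD,
) -> list[str]:
    """Return topic labels whose probability meets the threshold, highest first.

    Assumes the response is nested as
    response[wiki]["scores"][rev_id]["articletopic"]["score"]["probability"].
    """
    score = response[wiki]["scores"][str(rev_id)]["articletopic"]["score"]
    probabilities: dict[str, float] = score["probability"]
    selected = [topic for topic, p in probabilities.items() if p >= threshold]
    return sorted(selected, key=lambda topic: probabilities[topic], reverse=True)


def matches_topic_filter(page_topics: list[str], wanted: set[str]) -> bool:
    """True if no topics are selected, or the page has at least one selected topic."""
    return not wanted or any(topic in wanted for topic in page_topics)
```

The second helper mirrors the multi-select filter: an empty selection shows every page, and otherwise a page is shown when any of its predicted topics was selected.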
References used:
https://www.mediawiki.org/wiki/ORES/Articletopic
https://www.mediawiki.org/wiki/ORES#Topic_routing
T207238 Special:NewPageFeed - add option to filter by pageviews
Add functionality to sort the NewPageFeed by pageview count, so that Reviewers can prioritise high impact articles.
Design:
Approach: Show the pageview count in each article record, initially without sorting or filtering. Candidate metrics are the average and median views per day and the total views over the past 30 days. The displayed figures will reflect data as of 24 hours before the time of display, and queries will look back at most 30 days to keep them efficient and manageable. To indicate the general popularity of each article, we propose storing the ceiling of the base-10 logarithm of the pageview count as a page tag. For example, 950 views gives a tag value of 3. Because the tag takes only a small number of distinct values, it gives reviewers a coarse indication of each article's popularity.
Any alternative approaches will be discussed with the mentors, and the finalized approach will be implemented.
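The metrics and the logarithmic tag described above can be sketched as follows. This is only an illustration under my own assumptions: the function names are hypothetical, the daily counts are assumed to arrive as a plain list of integers, and treating 0 views as bucket 0 is a choice made here because the logarithm of 0 is undefined.

```python
from statistics import mean, median


def popularity_bucket(total_views: int) -> int:
    """Return ceil(log10(total_views)) using integer arithmetic.

    Integer arithmetic avoids floating-point rounding at exact powers of 10.
    Assumption: 0 views (where log10 is undefined) maps to bucket 0.
    """
    bucket, limit = 0, 1
    while limit < total_views:
        limit *= 10
        bucket += 1
    return bucket


def summarize_pageviews(daily_views: list[int]) -> dict[str, float]:
    """Summarize at most the last 30 daily counts, as proposed in the approach."""
    window = daily_views[-30:]
    if not window:
        return {"total": 0, "mean_per_day": 0, "median_per_day": 0, "bucket": 0}
    total = sum(window)
    return {
        "total": total,
        "mean_per_day": mean(window),
        "median_per_day": median(window),
        "bucket": popularity_bucket(total),
    }
```

With this scheme, pages with 11 to 100 views share bucket 2 and pages with 101 to 1,000 views share bucket 3, so the tag only ever holds a handful of distinct values.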
Example of pageviews:
https://nppbrowser.toolforge.org/popular-unreviewed.php
Reference used:
https://phabricator.wikimedia.org/T207238
T207761 Keyword Search for New Pages Feed
Create a new field section in the search filters for the New Pages Feed.
The label should read "Has the following keyword(s)".
If a user enters one or more keywords in the field and clicks "Set Filter", the New Pages Feed should display only pages whose article text contains matching keywords.
The functionality should resemble the keyword search in NPP Browser, linked below.
Reference used:
https://nppbrowser.toolforge.org/index.php?mode=NPP
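The matching behaviour of the keyword filter could look something like the sketch below. Several details are assumptions on my part, since the task does not specify them: keywords are separated by commas, matching is a case-insensitive substring test, and a page must contain every keyword to be shown. The function names are hypothetical, and this in-memory version only illustrates the logic, not how the feed would actually query article text.

```python
def parse_keywords(raw: str) -> list[str]:
    """Split the field input on commas, dropping blanks and surrounding whitespace."""
    return [part.strip().casefold() for part in raw.split(",") if part.strip()]


def matches_keywords(article_text: str, keywords: list[str]) -> bool:
    """True if every keyword occurs in the text (case-insensitive substring match)."""
    haystack = article_text.casefold()
    return all(keyword in haystack for keyword in keywords)


def filter_feed(pages: dict[str, str], raw_input: str) -> list[str]:
    """Return the titles of pages whose text matches the keyword filter.

    `pages` maps a page title to its article text. An empty filter matches
    every page, because there are no keywords to fail on.
    """
    keywords = parse_keywords(raw_input)
    return [title for title, text in pages.items() if matches_keywords(text, keywords)]
```

Whether multiple keywords should combine with AND (as here) or OR is a design question to settle with the mentors.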
T327955 See and filter with percent similarity to top deleted revision
CSD G4 requires that the new article be substantially similar to the old article. However, patrollers who aren't admins cannot see deleted revisions.
PageTriage already detects if an article has been "previously deleted". This ticket is to explore the idea of expanding this detection to include...
Detection of a previous AFD, by checking for the existence of an AFD page
If previous AFD detected, and the page has been deleted before, there should be an API added to PageTriage to pull the top deleted revision, and then compare it to the current top revision, and provide a % wikicode match.
This should either be run with a button, or run automatically.
May or may not want to make this a pagetriage_page_tag (article metadata).
This can be achieved through the following steps:
API - extend the pagetriagelist API (but this would calculate the match every time, which could be expensive), or create a dedicated API (which would calculate it only when needed).
Front end - add a button to calculate this, auto-calculate it when visiting the article, or auto-calculate it for everything and add it as a red article tag in Special:NewPagesFeed (the last option involves pagetriage_page_tag / metadata).
Add similar support for "previous AFD". There is an afd_status page tag, but it only tracks current deletion tagging. There is also a recreated tag, but it tracks all kinds of previous deletion, not just AFD.
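One possible way to compute the percentage wikicode match is sketched below using Python's standard `difflib`. This is an assumption about the metric, not a decision: the task does not say how similarity should be measured, and comparing line by line is a choice made here for illustration. The function names are hypothetical, and the AFD helper relies on the English Wikipedia convention of placing first nominations at "Wikipedia:Articles for deletion/" followed by the article title.

```python
from difflib import SequenceMatcher


def wikicode_similarity_percent(deleted_wikitext: str, current_wikitext: str) -> float:
    """Return a 0-100 similarity score between two revisions, compared line by line.

    autojunk is disabled so that frequently repeated lines in long articles
    are not silently ignored by difflib's heuristic.
    """
    matcher = SequenceMatcher(
        None,
        deleted_wikitext.splitlines(),
        current_wikitext.splitlines(),
        autojunk=False,
    )
    return round(matcher.ratio() * 100, 1)


def afd_page_title(article_title: str) -> str:
    """Title of the first AFD nomination page for an article on English Wikipedia.

    Later nominations use suffixes such as "(2nd nomination)", which this
    sketch does not handle.
    """
    return f"Wikipedia:Articles for deletion/{article_title}"
```

Because the comparison is only needed when a patroller asks for it, a function like this fits the dedicated-API option better than calculating the score for every row of the feed.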
TIMELINE
Period | Task |
---|---|
May 1 - May 26 | Community bonding period: Get acquainted with the ORES web services and API for AI-based topic prediction. Become more familiar with the PageTriage codebase, analyzing its architecture and implementation to understand potential challenges and opportunities for enhancement. Explore sorting mechanisms based on pageview count. Analyze keyword search techniques for finding articles by theme or content. Learn how to add or create an API that can calculate an article's pageviews. I will have college exams for one week during this period. |
May 27 - June 2 | Discuss with mentors to finalize which topics must be included in topic prediction. Study the ORES documentation thoroughly to understand how its topic prediction API works, with the aim of integrating it with Vue.js and PHP. |
June 3 - June 9 & June 10 - June 16 | Implement the backend integration with ORES for topic prediction. Write a bi-weekly report. |
June 17 - June 23 | Finalize the UI design with mentors. Implement the multi-select box for topics. Write a bi-weekly report. |
June 24 - June 30 | Carry out initial testing, fix any bugs, and update the documentation. |
July 1 - July 7 | Finalize the page view count approach. Study the databases. Write a bi-weekly report. |
July 8 - July 14 | Work out efficient SQL queries against the pagetriage_page_tags table. Prepare for midterm evaluations. |
July 15 - July 21 & July 22 - July 28 | Implement the script for invoking the API responsible for storing page views. Write a bi-weekly report. |
July 29 - August 4 | Integrate backend page views logic for retrieving and displaying view counts with frontend UI. Write a bi-weekly report. |
August 5 - August 11 | Carry out initial testing, fix any bugs, and update the documentation. |
August 12 - August 18 | Study how keyword search works in existing tools such as NPP Browser. Write a bi-weekly report. |
August 19 - August 25 & August 26 - September 1 | Implement the keyword search functionality in the backend. Write a bi-weekly report. |
September 2 - September 8 | Integrate the backend and frontend components of the keyword search feature. |
September 9 - September 15 | Carry out initial testing, fix any bugs, and update the documentation. Write a bi-weekly report. |
September 16 - September 22 | Research how PageTriage already detects whether an article has been "previously deleted". |
September 23 - September 29 & September 30 - October 6 | Develop an API for PageTriage that retrieves the top deleted revision, compares it against the current top revision, and returns a percentage wikicode match. |
October 7 - October 13 & October 14 - October 20 | Integrate the frontend and backend, and test the functionality and performance of the API. Write documentation for the changes made. |
October 21 - November 3 | Buffer time for any unforeseen issues or delays during the project. Write the final documentation and final project report. |
PARTICIPATION
I am active on several communication platforms, including Zulip, Slack, and email. I will use Phabricator and Gerrit for discussing issues and conducting code reviews. With no other commitments, I can dedicate more than 40 hours per week to this project.
ABOUT ME
- Your education (completed or in progress)
I am a third-year student pursuing an engineering degree in the Department of Information Technology.
- How did you hear about this program?
I heard about Google Summer of Code in my freshman year from my seniors, who discussed contributing to open-source organizations and the benefits of doing so. The idea of working collaboratively on a real-world project with people from diverse backgrounds and skill sets, along with the opportunity to learn and grow as a developer, fascinated me.
- Will you have any other time commitments, such as school work, another job, planned vacation, etc, during the duration of the program?
I will have exams during the week of May 14 - May 21, so my availability will be limited during that period. I'll stay in touch with my mentors and keep them informed of my progress.
- Are you planning to apply to both programs and, if so, with what organization(s)?
I am applying only for Google Summer of Code.
- What does making this project happen mean to you?
Making this project happen matters a great deal to me because it represents a commitment to advancing the accessibility and integrity of information on Wikipedia. By enhancing the filtering and search capabilities of the New Pages Feed, we empower volunteers to manage and curate content effectively, ultimately enriching the user experience and furthering the mission of free knowledge sharing.
I am very enthusiastic about this project, and my involvement in Wikimedia since December 2022 has provided an exceptionally enriching learning journey. I aim to continue this journey even after the project and contribute to Wikimedia.
PAST EXPERIENCE
Microtask
- Create a small independent tool/web app that interacts with any Wikimedia API and displays some information about an article. The tool must have a frontend built using VueJS and the Wikimedia Codex UI library. Include a link to the source code in your proposal
Source code for the proposal: Code
Technologies used
- Vue.js
- Wikimedia Codex UI library
- Wikimedia API
- Set up the PageTriage extension (using these draft instructions) along with MediaWiki-Docker.
PageTriage setup was completed successfully.
I have been a part of the Wikimedia community since November 2022. Below are some of my contributions to Wikimedia projects:
Image Search (GitHub link)
- Developed using Vue.js.
- Displays an image that corresponds to the user-provided input.
- Uses the Unsplash API to fetch the image.