
GSOC-2024 - Improve searchability and filtering of PageTriage feed proposal
Open, Needs Triage, Public

Assigned To
Authored By
Rockingpenny4
Mar 24 2024, 12:00 AM
Referenced Files
F44157294: Screenshot from 2024-04-02 17-12-51.png
Apr 2 2024, 11:49 AM
F43957828: image.png
Mar 31 2024, 7:47 PM
F43801835: carbon.png
Mar 29 2024, 10:44 PM
F43169922: Screenshot from 2024-03-24 01-45-41.png
Mar 24 2024, 12:02 AM
F43167664: FilterRadios.vue (1).png
Mar 24 2024, 12:00 AM
F43167615: Screenshot from 2024-03-24 01-33-37.png
Mar 24 2024, 12:00 AM
F43167731: Screenshot from 2024-03-24 03-32-26.png
Mar 24 2024, 12:00 AM
F43168031: Screenshot from 2024-03-24 04-26-45.png
Mar 24 2024, 12:00 AM

Description

Proposal for: https://phabricator.wikimedia.org/T357337
Phabricator Proposal : https://phabricator.wikimedia.org/T360848

Profile Information

Name: Angel Sharma
Github: fillingtothemomo
Gmail: rockingpenny4@gmail.com
Phabricator: Rockingpenny4
Gerrit: rockingpenny4
Location: Mathura, India
Time Zone: IST (UTC+5:30)
Working hours: 3:00 PM to 3:00 AM (IST)

Synopsis

PageTriage is a MediaWiki extension that allows patrollers on the English Wikipedia to track, categorize and deal with problematic new pages. One of its features is the VueJS-based New pages feed, which lets patrollers filter for pages they might want to patrol based on certain criteria. However, these filters are limited, and there has been interest in the community in introducing newer filters and, in general, improving the ability to search for specific content on the New pages feed.

As part of this project, the filtering and searching capabilities of the New pages feed should be enhanced. In particular, the project aims to add AI-based topic prediction (leveraging the ORES API), the ability to search for a specific keyword in an article, filtering by how many pageviews an article gets, and searching by how similar a page is to previously deleted pages.

Possible Mentor(s)
@Soda , @TheresNoTime

Project Size: 350 hours

Have you contacted your mentors already?
Yes

Deliverables

T218132 Add ORES topic prediction to the NewPagesFeed and allow filtering by the same:

ORES now supports topic prediction (articletopic). Topic prediction is different than class prediction and potential issue prediction. Topic prediction means predicting if an article is about chemistry, politics and government, sports, Central Asia, etc.
New page patrollers would find this useful if they want to filter the feed by one or two topics that they are interested in or have specialized knowledge in. Currently other tools are used for this, such as
https://en.wikipedia.org/wiki/User:SDZeroBot/NPP_sorting, but they are not integrated into PageTriage.
Integrate articletopic into PageTriage, in both the Special:NewPagesFeed filters menu, and the Page Curation toolbar "Page info" flyout using PHP.
Article topic: (some illustrations)

Screenshot from 2024-03-24 04-26-45.png (217×1 px, 98 KB)

Approach:
Understanding ORES Articletopic: I need to become familiar with the ORES Articletopic model and how it predicts topics in Wikipedia articles.
Reviewing Existing Tools: I'll examine tools like User:SDZeroBot/NPP_sorting to understand how ORES is currently used for topic prediction and filtering.
Defining Integration Points: In PageTriage we want to integrate articletopic predictions in the Special:NewPagesFeed filters menu and the Page Curation toolbar's "Page info" flyout.
Accessing ORES API: I'll ensure I have access to the ORES API to fetch articletopic predictions for articles.
Updating Frontend UI: I'll enhance the frontend UI of PageTriage. Because the topic categories are pre-defined, this could be a select box that allows multiple topics to be selected, which will be more helpful for users.
Backend Integration: I'll integrate articletopic predictions into PageTriage's backend by making API calls to ORES and storing predictions in the database for filtering.
Implementing Filtering Mechanism: I'll develop logic to filter articles based on predicted topics, ensuring that only relevant articles are displayed in the feed or toolbar.
Testing and Gathering Feedback: I'll conduct thorough testing to ensure the integration works as expected and gather feedback from users, particularly new page patrollers, for improvements.
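
The filtering step described above can be sketched as follows. This is an illustration in Python only (PageTriage itself is PHP and VueJS), and it assumes each article's stored prediction is a mapping from topic name to probability. The function names, the 0.5 threshold, the "match at least one selected topic" rule and the sample data are all hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical threshold; the real cut-off would be agreed with mentors.
DEFAULT_THRESHOLD = 0.5


def predicted_topics(probabilities: Dict[str, float],
                     threshold: float = DEFAULT_THRESHOLD) -> Set[str]:
    """Return the topics whose predicted probability meets the threshold."""
    return {topic for topic, p in probabilities.items() if p >= threshold}


def filter_by_topics(pages: Dict[str, Dict[str, float]],
                     selected: Iterable[str],
                     threshold: float = DEFAULT_THRESHOLD) -> List[str]:
    """Return titles of pages predicted to match at least one selected topic."""
    wanted = set(selected)
    return sorted(
        title for title, probs in pages.items()
        if predicted_topics(probs, threshold) & wanted
    )


# Made-up sample data, not real ORES output.
SAMPLE_PAGES = {
    "Page A": {"STEM.Chemistry": 0.91, "Culture.Sports": 0.02},
    "Page B": {"Culture.Sports": 0.77},
    "Page C": {"STEM.Chemistry": 0.30},
}
```

For example, filter_by_topics(SAMPLE_PAGES, ["STEM.Chemistry"]) keeps only "Page A", because "Page C" falls below the assumed threshold.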
Mock UI
image.png (394×509 px, 13 KB)

Links I referred to:
https://phabricator.wikimedia.org/T245906
https://www.mediawiki.org/wiki/ORES#Topic_routing
https://www.mediawiki.org/wiki/ORES/Articletopic
https://phabricator.wikimedia.org/T240517
https://www.mediawiki.org/wiki/ORES#API_usage


T207761 Keyword Search for New Pages Feed:

• Create a new field in the "That" section of the search filters for the New Pages Feed, which is the last option (i.e. bottom of list)
• The text should read "Has the following keyword(s)"
• If a user inputs one or multiple keywords into the field and clicks "Set Filter," the search results in the New Pages Feed should only display results that have matching keywords in the article text.
Approach:
Keyword Search Field: Implement a new field labelled "Has the following keyword(s)" at the bottom of the "That" section in the search filters of the New Pages Feed.
Backend Integration: Develop backend functionality to process user-inputted keywords and filter the list of articles accordingly.
Frontend UI Enhancement: Update the frontend UI of the New Pages Feed to include the keyword search field and ensure a seamless user experience.
Testing and Validation: Thoroughly test the implemented feature to ensure its functionality and usability.
Reference Implementation: Implement the feature along the lines of the existing tool at https://tools.wmflabs.org/nppbrowser/.
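
A minimal sketch of the keyword filter, in Python for illustration only. It assumes that a page must contain every keyword entered (the task text does not say whether "all" or "any" is intended) and uses plain case-insensitive substring matching; a real implementation would more likely delegate to the wiki's search backend. The function names are hypothetical.

```python
from typing import Dict, List


def parse_keywords(raw: str) -> List[str]:
    """Split user input on commas or whitespace and lowercase each keyword."""
    return [word.lower() for word in raw.replace(",", " ").split()]


def pages_with_keywords(pages: Dict[str, str], raw_keywords: str) -> List[str]:
    """Return titles whose text contains every keyword (case-insensitive)."""
    keywords = parse_keywords(raw_keywords)
    if not keywords:
        return sorted(pages)  # an empty filter matches everything
    return sorted(
        title for title, text in pages.items()
        if all(keyword in text.lower() for keyword in keywords)
    )
```

Here pages maps a page title to its article text, and raw_keywords is the string typed into the "Has the following keyword(s)" field.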
Keyword search example:

Screenshot from 2024-03-24 03-32-26.png (436×904 px, 67 KB)

Mock UI
Screenshot from 2024-03-24 01-45-41.png (471×642 px, 82 KB)

Integrate into Search filters by adding to FilterRadios.vue:
FilterRadios.vue (1).png (894×1 px, 467 KB)


T207238 Special:NewPageFeed - add option to filter by pageviews:

Add functionality to sort the New Pages Feed by pageview count, so that reviewers can prioritise high-impact articles.
Proposed Approach:
Display Pageview Counts:
Display pageview counts for articles, including metrics such as average daily views or total views in the last 30 days.
Utilize a logarithmic scale for better visualization, ensuring manageable distinct values.
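
One way to keep the number of distinct values manageable is to store a base-10 bucket instead of the raw count. The sketch below is an assumption about how such a scale could look, not a decided design; for positive integers the bucket equals the number of decimal digits, which is floor(log10(views)) + 1.

```python
def pageview_bucket(views: int) -> int:
    """Map a view count to a base-10 logarithmic bucket.

    0 views -> 0, 1-9 -> 1, 10-99 -> 2, 100-999 -> 3, and so on.
    """
    if views < 0:
        raise ValueError("views must be non-negative")
    if views == 0:
        return 0
    # Digit count equals floor(log10(views)) + 1 and avoids float rounding.
    return len(str(views))
```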

Sorting and Filtering Capabilities:
Implement sorting capabilities based on pageview counts to allow reviewers to prioritize high-impact articles.
Provide options to sort articles by average daily views or total views in the last 30 days.

Efficient Data Querying:
Query pageview data efficiently from a maximum of 30 days ago, considering a 24-hour lag in display time.
Implement a maintenance script to periodically fetch and store pageview data in the PageTriage table.
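
The 30-day window with a 24-hour lag can be sketched as below. This Python illustration assumes the maintenance script already has per-day view counts for a page; how those counts are fetched is left out, and the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Tuple


def pageview_window(today: date, days: int = 30, lag_days: int = 1) -> Tuple[date, date]:
    """Return the inclusive (start, end) dates of the window to query."""
    end = today - timedelta(days=lag_days)
    start = end - timedelta(days=days - 1)
    return start, end


def summarize_views(daily_views: Dict[date, int], today: date) -> Tuple[int, float]:
    """Return (total, average per day) over the window; missing days count as 0."""
    start, end = pageview_window(today)
    span = (end - start).days + 1
    total = sum(v for day, v in daily_views.items() if start <= day <= end)
    return total, total / span
```

Both figures mentioned above (total views in the last 30 days and average daily views) come out of the same pass, so the script could store either one.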

Optimized SQL Queries:
Ensure optimized SQL queries for efficient data retrieval, especially when ordering pages by tag value.
Example query to check whether the PageTriage schema can efficiently order pages by tag value (here, sorting by category count):

carbon.png (596×1 px, 127 KB)
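
As a rough text illustration of ordering by tag value (it does not reproduce the query in the screenshot), the Python sketch below uses an in-memory SQLite database. The page_tags table, its columns and the sample rows are a simplified, hypothetical stand-in and not the real PageTriage schema.

```python
import sqlite3

# Hypothetical, simplified stand-in for a page/tag table; not the real schema.
SCHEMA = """
CREATE TABLE page_tags (
    page_id  INTEGER NOT NULL,
    tag_name TEXT    NOT NULL,
    value    TEXT    NOT NULL,
    PRIMARY KEY (page_id, tag_name)
);
CREATE INDEX page_tags_name_value ON page_tags (tag_name, value);
"""


def pages_ordered_by_tag(conn, tag_name, limit=50):
    """Return (page_id, value) rows for one tag, highest numeric value first."""
    cursor = conn.execute(
        "SELECT page_id, CAST(value AS INTEGER) AS numeric_value "
        "FROM page_tags WHERE tag_name = ? "
        "ORDER BY numeric_value DESC, page_id ASC LIMIT ?",
        (tag_name, limit),
    )
    return cursor.fetchall()


def demo_connection():
    """Build an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO page_tags VALUES (?, ?, ?)",
        [(1, "category_count", "3"), (2, "category_count", "12"),
         (3, "category_count", "7"), (1, "page_len", "900")],
    )
    return conn
```

The CAST matters when values are stored as text, because "12" sorts before "3" as a string. It also means an index on the text column cannot provide the ordering directly, which is the kind of cost this investigation would need to measure.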

Refinements and discussions during the GSOC period will help fine-tune implementation and address any challenges.
Example of page views: https://nppbrowser.toolforge.org/popular-unreviewed.php
Links used for reference: https://phabricator.wikimedia.org/T225169, https://phabricator.wikimedia.org/T230567


T327955 See and filter with percent similarity to top deleted revision:

CSD G4 requires that the new article be substantially similar to the old article. However patrollers that aren't admins cannot see deleted revisions.
PageTriage already detects if an article has been "previously deleted". Explore the idea of expanding this detection to include...
Detection of a previous AFD, by checking for the existence of an AFD page
If previous AFD detected, and the page has been deleted before, there should be an API added to PageTriage to pull the top deleted revision, and then compare it to the current top revision, and provide a % wikicode match.
This should either be run with a button, or run automatically.
May or may not want to make this a pagetriage_page_tag (article metadata).
Approach:
API Development-
Develop an API in PageTriage to retrieve the top deleted revision of an article.
Implement a comparison algorithm to calculate the percentage of wikicode match between the current top revision and the retrieved deleted revision.

Add a button in the PageTriage interface for users to manually trigger the comparison.
Explore automated comparison when visiting an article in PageTriage.
Optionally, display the percentage wikicode match as a red article tag in Special:NewPagesFeed.
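
The proposal does not fix a comparison algorithm yet. As one possible starting point, the sketch below uses Python's standard difflib.SequenceMatcher to turn two wikitext strings into a percentage; the function name is hypothetical and the choice of difflib is an assumption for illustration, not the planned PHP implementation.

```python
from difflib import SequenceMatcher


def wikitext_similarity_percent(deleted_text: str, current_text: str) -> float:
    """Return how similar two wikitext strings are, as a percentage (0-100)."""
    if not deleted_text and not current_text:
        return 100.0  # two empty revisions are treated as identical
    # autojunk=False disables the popularity heuristic used for long inputs.
    ratio = SequenceMatcher(None, deleted_text, current_text, autojunk=False).ratio()
    return round(ratio * 100, 1)
```

Character-level matching like this is slow on long articles, so a real version might compare lines or tokens instead.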

AFD Detection-
Investigate AFD detection feasibility by checking for the existence of an AFD page.
Integrate AFD detection into the PageTriage interface for additional context to patrollers.
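
A minimal sketch of the existence check, in Python for illustration. It relies on the English Wikipedia convention that a first deletion discussion lives at "Wikipedia:Articles for deletion/<article title>"; the page_exists callback is a hypothetical stand-in for whatever lookup PageTriage would actually use.

```python
from typing import Callable

AFD_PREFIX = "Wikipedia:Articles for deletion/"


def afd_title(article_title: str) -> str:
    """Build the conventional English Wikipedia AfD discussion title."""
    return AFD_PREFIX + article_title.replace("_", " ").strip()


def has_previous_afd(article_title: str, page_exists: Callable[[str], bool]) -> bool:
    """Report whether an AfD discussion page exists for the article."""
    return page_exists(afd_title(article_title))
```

This only covers the first nomination; later discussions use suffixes such as "(2nd nomination)" and would need extra lookups.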

Documentation and Testing-
Create comprehensive documentation for the new features.
Perform thorough testing to ensure the accuracy and reliability of the comparison algorithm and AFD detection.


Timeline

Pre-GSOC
Work on open issues on Wikimedia Phabricator and improve my skills and understanding of the MediaWiki codebase, while continuing to explore the project and gather more information about the features to be implemented during the GSoC project. I have already contributed to various extensions and projects, including InlineComments, PageTriage, MobileFrontend, AdminLinks and WikiEduDashboard, and learnt a lot from each PR made.

Community Bonding Period

May 02, 2024 - May 27, 2024
  • Get acquainted with mentors and the Wikimedia community.
  • Familiarize myself with the existing codebase and architecture of the PageTriage extension, and discuss potential ideas and approaches for solving the identified issues.
  • Dive deeper into understanding the ORES service and its integration possibilities with PHP and Vue.js.
  • Engage in discussions with mentors and community members to refine project goals.

Coding Period

May 27, 2024 - June 10, 2024
  • Look into the initial features to be implemented and start work on integrating ORES for searching article topics into the Page Triage filters.
  • Research ORES documentation and understand its API for topic prediction to be integrated into Vue JS using PHP. Write the bi-weekly report.
June 10, 2024 - June 24, 2024
  • Implement the backend integration of ORES for topic prediction.
  • Begin frontend development for displaying topic filters in the New Pages Feed and Page Curation toolbar. Write bi-weekly report.
June 24, 2024 - July 08, 2024
  • Finalize frontend implementation and ensure proper interaction with the backend ORES service.
  • Conduct initial testing and resolve any issues encountered whilst simultaneously updating the documentation.
  • Start researching the implementation of keyword search, similar to nppBrowser, and go through the documentation.
  • Prepare for mid-evaluation and resolve bugs, if any. Write bi-weekly report.

Mid-Evaluation

July 08, 2024 - July 22, 2024
  • Work on feedback received from the evaluation and research approaches for implementing a pageview count; finalise an approach with the help of mentors and work on it.
  • Start integration of keyword search on the backend. Write bi-weekly report.
July 22, 2024 - August 12, 2024
  • Work on frontend UI implementation
  • Integrate the backend and frontend of the keyword search feature and resolve any bugs found while testing.
  • Update the documentation in a timely manner and write bi-weekly report.
August 12, 2024 - August 26, 2024
  • Start working on implementation of page view counts using the proposed approach.
  • Understand the database and efficient SQL queries on the pagetriage_page_tags table.
  • Implement the maintenance script that calls the API and stores page views. Write bi-weekly report.
August 26, 2024 - September 9, 2024
  • Integrate page views backend logic for retrieving and displaying view counts with frontend UI and sort it.
  • Finalise the code after discussing with mentors to ensure seamless integration. Write bi-weekly report.
September 9, 2024 - September 23, 2024
  • Start working on the "See and filter with percent similarity to top deleted revision" feature.
  • Research methods for comparing revisions and detecting % similarity to top deleted revisions. Write bi-weekly report.
September 23, 2024 - October 7, 2024
  • Develop the backend API for pulling and comparing deleted revisions.
  • Implement frontend components for displaying % similarity and AFD detection status in the New Pages Feed. Write bi-weekly report
October 7, 2024 - October 21, 2024
  • Conduct testing to ensure accuracy and reliability of similarity comparison.
  • Finalize implementation, including any necessary optimizations or adjustments based on testing results.
  • Prepare documentation for the new features and ensure code quality meets project standards. Write bi-weekly report
October 21, 2024 - November 4, 2024 (Buffer-period)
  • Use this period to catch up on any backlog or address any unforeseen challenges encountered during the coding phase.
  • Address any pending issues, bugs, or feature requests identified during testing and ensure all features are working as expected.
  • Finalise documentation and prepare for final evaluation by organizing code repositories, submitting final reports, and collecting feedback from mentors.
  • Write final blog report.

Final-Evaluation

Post-GSOC
I am learning a lot by contributing to Wikimedia. Even after the GSoC period ends, I plan on contributing to this organization by adding to my past contributions and working on open issues because of the familiarity of the technical stack and the new challenges that I am continually offered in the process.
Also, I would like to complete the future goals that come up during the GSOC period. Having picked up many development skills, my primary focus would be to help the project and the community grow. I would also be interested in helping other people in getting started with their open-source journey and guide them in this fun process.

Participation
I am active on Email, Zulip, Discord and Slack. I will use Phabricator and Gerrit for issue discussions and code reviews. I plan on regularly meeting with my mentor to discuss my progress and get feedback on my work. I can dedicate 45+ hours a week as I have no other commitments.

About Me

Education
College: Indian Institute of Technology (IIT), Roorkee
Year of Study: 2nd year
Field of Study: Mathematics and Computing (Bachelor of Science)

Skills
I am a member of IMG (Information and Management Group) at my college. We are responsible for handling the entire college's data, the official Institute website, and Channeli, a one-stop application for student and faculty information ranging from placement stats and the noticeboard to lost and found and complaints and grievances, along with various other projects. As a result, I have experience working on production-level apps used by thousands of people, within a collaborative team.

  • JavaScript, HTML, CSS, Tailwind: Used vanilla JS in projects such as a comic-book display website and basic games, and CSS for styling.
  • Django, PHP: Building backends for various applications
  • React JS, Vue JS: Used for frontend development in full-stack projects
  • Flutter, Java: Used for app development in Android Studio
  • Docker

How did you hear about this program?
After getting into college, I learned about Google Summer of Code from my seniors, some of whom had been selected for it. After talking with them, I looked at the program with greater interest.

Will you have any other time commitments, such as school work, another job, planned vacation, etc., during the program?
No. My current semester ends in the first week of May, after which I will have 2.5 months of holidays in which I can focus fully on this project and commit 50+ hours a week. After college resumes, I can commit 40+ hours a week as needed.

We advise all candidates eligible for Google Summer of Code and Outreachy to apply for both programs. Are you planning to apply to both programs and, if so, with what organization(s)?
I am 100% loyal to Wikimedia Foundation and only plan on applying to Google Summer of Code with the Wikimedia Foundation.

What does making this project happen mean to you?
I have always been excited by the prospect of converting ideas into products with real-world impact and that is exactly what the Wikimedia Foundation does, producing free and open-source applications that impart learning to millions of people over the globe.
I am highly interested in this project, and contributing to Wikimedia since December 2023 has been a really fantastic learning experience with assistance from all mentors; each PR teaches me something new, and each feedback and code review enhances my coding skills and understanding of the project. Getting to work on this project will teach me production-level code structures and massively impact my learning.

Past Experience

Microtask- Create a small independent tool/web app that interacts with any Wikimedia API and displays some information about an article. The tool must have a frontend built using VueJS and the Wikimedia Codex UI library. Include a link to the source code in your proposal.
Wiki_Word uses MediaWiki's opensearch API and the Codex UI library to let users search for a specific word in an article in the language of their choice, and includes a dark mode.
Deployed here
Microtask- Completed setup of PageTriage using Mediawiki Docker on Ubuntu 22.04

Screenshot from 2024-04-02 17-12-51.png (1×1 px, 152 KB)

Contributions to Wikimedia

Title | Link | Status
Add timestamp display to comment replies | https://gerrit.wikimedia.org/r/c/mediawiki/extensions/InlineComments/+/1010852 | Merged
App timestamp display on comment creation | https://gerrit.wikimedia.org/r/c/mediawiki/extensions/InlineComments/+/1010349 | Merged
ALRow: Add row search class | https://gerrit.wikimedia.org/r/c/mediawiki/extensions/AdminLinks/+/1007973 | Merged
Fixes expand sections visibility on browser resize | https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MobileFrontend/+/1011221 | Merged
Fixes DateControlSection component cut-off | https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageTriage/+/1011034 | Merged
Fixes date a11y issues | https://github.com/WikiEducationFoundation/WikiEduDashboard/pull/5687 | Merged
Fixes inconsistent highlight issue in navbar | https://github.com/WikiEducationFoundation/WikiEduDashboard/pull/5661 | Merged
Fixes toolbar falling off screen on zooming | https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageTriage/+/1013680 | In Review
Restores the page to default page on resolving all comments | https://gerrit.wikimedia.org/r/c/mediawiki/extensions/InlineComments/+/1014524 | In Review
refactors milestones to functional component | https://github.com/WikiEducationFoundation/WikiEduDashboard/pull/5601 | In Review
renders dates for milestones in home tab | https://github.com/WikiEducationFoundation/WikiEduDashboard/pull/5581 | In Review

Past Projects
-DirecM
Worked on a project on app development using Flutter and Arduino by using infrared sensors for a wayfinding app for blind and visually impaired people under an event organized by a technical club of our college.

-ProTrack
My first major React project. Made a full stack application for managing personal groups and projects using React JS and Django backend with MySQL and Tailwind CSS for styling.

-DRDO sensor Malware [Stealth Project]
Worked on developing malware apps for DRDO (Defence Research and Development Organisation), India, in a research internship under Dr. Sateesh K. Peddoju involving native Android development.

Other open-source contributions
-Omniport-Docker
Official docker distribution of Omniport - one true portal for every educational institute.
-Circuitverse
CircuitVerse is a free, open-source platform that allows users to construct digital logic circuits online.

Event Timeline

Mar 29 2024, 5:48 AM: Rockingpenny4 renamed this task from "GSOC-2024 - Improve searchability and filtering of PageTriage feed proposal [WIP]" to "GSOC-2024 - Improve searchability and filtering of PageTriage feed proposal".
Rockingpenny4 updated the task description.

Weekly Internship Report (27 May - 2 June)
1. Overview of Tasks Completed:
Task 1: Community Bonding
Task 2: Made a wiki user page https://en.wikipedia.org/wiki/User:Rockingpenny4
Task 3: Worked on adding articletopic model prediction to ORES and PageTriage T218132
2. Key Accomplishments:
Made these patches as a part of my progress on the phab ticket
Accomplishment 1: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ORES/+/1035044
Accomplishment 2: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageTriage/+/1032563
3. Challenges Faced:
Challenge 1: Understanding the codebase was a bit challenging at first, but my mentors helped me through the process.
Challenge 2: Selenium tests weren't running on my Ubuntu machine due to a browser issue, but updating my kernel version fixed it.
4. Learnings and Skills Gained:
Learning 1: I played around with the codebase and DB queries, which was quite fun, and had to set up my MediaWiki Docker environment again ;). I learned more about the ORES service and how models are integrated.
Skill 1: Gained knowledge about PHP hooks and job queues, and how data flows through an extension.
6. Goals for Next Week:
Goal 1: Work on refining my patches on gerrit and ensuring code quality.
Goal 2: Write required tests and work on adding a filter for articletopic on frontend.

Weekly Internship Report (2 June - 9 June)
1. Overview of Tasks Completed:
Task 1: Added tests for integrating the articletopic model with ORES https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ORES/+/1035044
Task 2: Added predicted topic field to NPP articles https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageTriage/+/103256
2. Goals for Next Week:
Goal 1: Work on filtering by topics in the frontend UI.
Goal 2: Complete the tests for all related patches for T218132 made so far.

Weekly Internship Report (16 June - 23 June)
1. Overview of Tasks Completed:
Task 1: Added the articletopic filter menu to the NPP frontend (WIP) - https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageTriage/+/1042325
Task 2: Configured the predicted topic field to show multiple topics - https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageTriage/+/1032563
2. Goals for Next Week:
Goal 1: Refactor the API calls for articletopic filtering in PageTriage.
Goal 2: Add chips to make filtering more user-friendly.

Weekly Internship Report (23 June - 7 July)
1. Overview of Tasks Completed:
Task 1: Completed the articletopic filter menu UI with menu and chip input component https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageTriage/+/1042325
Task 2: Added a database query method to ORES to display the articles that have all the selected topic filters https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ORES/+/1050746
2. Goals for Next Week:
Goal 1: Work on the feedback received from the mid evaluation.
Goal 2: Discuss the implementation of the new feature T327955 : See and filter with percent similarity to top deleted revision.