[Proposal] Improve searchability and filtering of PageTriage feed
Closed, Declined (Public)

Description

Profile Information

Name: Nimish Medatwal
IRC nickname on libera.chat IRC: nimishmedatwal
Web Profile: Phabricator, Github
Resume
Location (country or state): India
Typical working hours (include your timezone) : weekdays: 10:00 AM IST to 7:30 PM IST; weekends: 11:00 AM IST to 7:30 PM IST Saturday

Synopsis

  • PageTriage is a tool on Wikipedia to manage new pages.
  • It helps patrollers track, categorize, and handle problematic new pages.
  • One of its features is the New pages feed, built with VueJS, allowing patrollers to filter interesting pages based on criteria.
  • Community members have shown interest in improving the filters and search capabilities of the New pages feed.
  • Proposed enhancements include:
    • Introducing AI-based topic prediction using the ORES API.
    • Adding keyword search functionality within articles.
    • Implementing filters based on the number of pageviews an article receives.
    • Enabling search for similar deleted pages.

Microtask

Create a small independent tool/web app that interacts with any Wikimedia API and displays some information about an article. The tool must have a frontend built using VueJS and the Wikimedia Codex UI library. Include a link to the source code in your proposal.

GitHub Link: Source Code
Demo Link: Demo Video

PageTriage setup completed successfully.


Proposed Solution

Here's how I would attempt to solve each of the tasks:

T218132 Add ORES topic prediction to the NewPagesFeed
Update ORES API Integration:
  1. Modify Existing ORES API Integration:
    • Review the current implementation of ORES API integration within PageTriage to understand its structure and endpoints.
    • Integrate the articletopic endpoint into the existing codebase to enable fetching predictions for article topics from ORES.
  2. Send Requests for articletopic Predictions:
    • Adjust the request parameters to include the necessary data for articletopic predictions.
    • Ensure that PageTriage can effectively communicate with the ORES API to retrieve articletopic predictions for new articles.
Modify Filters Menu:
  1. Update Special:NewPagesFeed Filters:
    • Identify the components responsible for rendering the filters menu on the Special:NewPagesFeed page.
    • Integrate a new option for filtering by article topics into the existing filter menu structure.
  2. Integrate articletopic Predictions:
    • Modify the backend logic to fetch articletopic predictions for articles based on the selected filter options.
    • Display the articletopic predictions alongside other filter criteria, allowing users to refine their feed based on specific topics of interest.
Update Page Curation Toolbar:
  1. Modify "Page Info" Flyout:
    • Locate the relevant components that control the display of information within the "Page Info" flyout in the Page Curation toolbar.
    • Update the flyout to include the predicted article topic as part of the information displayed for each article.
  2. Include Option for Viewing Predicted Topic:
    • Implement an interactive feature within the Page Curation interface that allows users to view the predicted topic directly.
    • Ensure that users can easily access the predicted topic information without disrupting their workflow.
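As a rough illustration of the request and parsing steps for T218132, here is a minimal Python sketch. Python is used only for illustration; the real change would live in PageTriage's own codebase. It assumes the public ORES v3 scores URL layout and response shape, and the helper names (build_articletopic_url, extract_topics) and the sample response are hypothetical, not taken from PageTriage.

```python
from urllib.parse import urlencode

# Assumption: the public ORES v3 scores endpoint layout.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def build_articletopic_url(wiki: str, rev_ids: list[int]) -> str:
    """Build a scores URL that asks only for the articletopic model."""
    query = urlencode({
        "models": "articletopic",
        "revids": "|".join(str(r) for r in rev_ids),
    })
    return f"{ORES_BASE}/{wiki}/?{query}"


def extract_topics(response: dict, wiki: str) -> dict[int, list[str]]:
    """Map each revision ID to its predicted topics, skipping revisions without a score."""
    topics = {}
    for rev_id, models in response.get(wiki, {}).get("scores", {}).items():
        score = models.get("articletopic", {}).get("score")
        if score is not None:
            topics[int(rev_id)] = score.get("prediction", [])
    return topics


# Hand-written sample shaped like an ORES response (illustrative, not real data).
sample = {
    "enwiki": {
        "scores": {
            "123": {"articletopic": {"score": {"prediction": ["Culture.Sports"]}}},
            "456": {"articletopic": {"error": {"type": "RevisionNotFound"}}},
        }
    }
}

print(build_articletopic_url("enwiki", [123, 456]))
print(extract_topics(sample, "enwiki"))
```

Revisions that come back with an error instead of a score are simply left out of the result, so the feed could show "no prediction" for them rather than failing.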
T207238 Special:NewPageFeed - add option to filter by pageviews
  1. Update Data Retrieval:
    • Modify the backend code responsible for fetching data for Special:NewPagesFeed to include information about pageviews.
    • Retrieve pageview data either from an external service or from internal metrics tracking within the Wikimedia ecosystem.
  2. Calculate Public Interest Score:
    • Develop a formula or algorithm to calculate a public interest score based on pageviews and possibly other factors such as the number of editors.
    • The formula could be as simple as multiplying pageviews by the number of editors or more complex based on community-defined criteria.
  3. Integrate Filter Option:
    • Update the frontend interface of Special:NewPagesFeed to include an option to filter by the newly calculated public interest score.
    • Add a dropdown menu or input field where users can specify a range or threshold for the public interest score they want to filter by.
  4. Backend Filtering:
    • Implement backend logic to filter the content displayed in Special:NewPagesFeed based on the selected public interest score range.
    • Ensure that the filtering process efficiently handles large datasets and accurately identifies articles that meet the specified criteria.
  5. Testing and Validation
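The simplest scoring formula mentioned for T207238 (pageviews multiplied by the number of editors) and the range filter could look roughly like the Python sketch below. The FeedEntry type, its fields, and the function names are hypothetical; the time window for pageviews and the final formula would be decided with the community.

```python
from dataclasses import dataclass


@dataclass
class FeedEntry:
    title: str
    pageviews: int  # views over some window; the window is an assumption
    editors: int    # number of distinct editors


def public_interest_score(entry: FeedEntry) -> int:
    """Simplest formula named in the proposal: pageviews times editors."""
    return entry.pageviews * entry.editors


def filter_by_score(entries, min_score=0, max_score=None):
    """Keep entries whose score lies in [min_score, max_score]; no upper bound if max_score is None."""
    kept = []
    for entry in entries:
        score = public_interest_score(entry)
        if score >= min_score and (max_score is None or score <= max_score):
            kept.append(entry)
    return kept


entries = [FeedEntry("A", 100, 3), FeedEntry("B", 10, 1), FeedEntry("C", 50, 2)]
print([e.title for e in filter_by_score(entries, min_score=100)])
```

In the actual feed this filtering would more likely happen in the database query than in application code, so that large result sets are never loaded in full.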
T207761 Keyword Search for New Pages Feed
  1. Update User Interface:
    • Add a new field in the "That" section of the search filters in the New Pages Feed interface.
    • Position this field as the last option in the list.
    • Label the field as "Has the following keyword(s)" to indicate its purpose to users.
  2. Backend Integration:
    • Modify the backend code to include functionality for keyword search.
    • Implement a search mechanism that checks for the presence of the entered keyword(s) in the article text.
  3. Search Query Handling:
    • Extract the keyword(s) entered by the user from the search field.
    • Construct a search query that includes conditions to match articles containing the specified keyword(s) in their text.
  4. Filtering Search Results:
    • Retrieve the list of new pages from the database based on the existing search filters.
    • Apply additional filtering to the retrieved list to include only those articles that match the keyword(s) entered by the user.
  5. Display Filtered Results:
    • Update the frontend interface to display the filtered list of articles in the New Pages Feed.
    • Ensure that only articles containing the specified keyword(s) are shown to the user.
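The keyword handling and filtering steps for T207761 can be sketched in Python as follows. This is an in-memory illustration only: it assumes that keywords are separated by commas or spaces and that an article must contain every keyword (matching any keyword would be an equally valid choice). All function names are hypothetical.

```python
import re


def parse_keywords(raw: str) -> list[str]:
    """Split the filter field's value on commas or whitespace and lowercase it."""
    return [k.lower() for k in re.split(r"[,\s]+", raw.strip()) if k]


def contains_all_keywords(text: str, keywords: list[str]) -> bool:
    """True when every keyword occurs as a whole word in the text, ignoring case."""
    return all(
        re.search(rf"\b{re.escape(k)}\b", text, flags=re.IGNORECASE)
        for k in keywords
    )


def filter_pages(pages: dict[str, str], raw_keywords: str) -> list[str]:
    """Return the titles whose text matches; an empty keyword field keeps every page."""
    keywords = parse_keywords(raw_keywords)
    return [
        title for title, text in pages.items()
        if contains_all_keywords(text, keywords)
    ]


pages = {
    "A": "A football club founded in 1990.",
    "B": "A novel about football and chess.",
    "C": "Chess opening theory.",
}
print(filter_pages(pages, "football, chess"))
```

Scanning article text like this would not scale to the full feed; a production version would probably delegate the text match to an existing search backend and only combine the result with the other filters.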
T327955 See and filter with percent similarity to top deleted revision
  1. API for Retrieving Top Deleted Revision:
    • Develop an API endpoint that allows PageTriage to pull the top deleted revision of an article.
    • Ensure that this API endpoint is efficient and only calculates the similarity when needed to avoid unnecessary resource consumption.
  2. Comparison Algorithm:
    • Implement an algorithm to compare the top deleted revision with the current top revision of an article.
    • Calculate the percent similarity between the two revisions based on their Wikicode content.
    • Consider using text similarity algorithms such as Levenshtein distance or cosine similarity.
  3. Detection of Previous AFD:
    • Extend the existing detection mechanism in PageTriage to identify if an article has undergone a previous Articles for Deletion (AFD) process.
    • Check for the existence of an AFD page associated with the article to determine its deletion history.
  4. Frontend Integration:
    • Update the frontend interface of PageTriage to display the percent similarity to the top deleted revision for each article.
    • Decide whether to include a button for manually triggering the calculation or to automatically calculate it when visiting the article.
  5. Filtering Option:
    • Add a filtering option in the PageTriage interface to allow users to filter articles based on their percent similarity to the top deleted revision.
    • Ensure that the filtering mechanism accurately identifies articles that meet the specified similarity threshold.
  6. Integration with Metadata:
    • Consider adding the percent similarity information as a metadata tag (e.g., pagetriage_page_tag) for each article.
    • This metadata can be used for further analysis and reporting purposes.
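To make the comparison step for T327955 concrete, here is a minimal Python sketch of a Levenshtein-based percent similarity, one of the two options named above. The normalisation (one minus distance divided by the longer text's length) is my assumption, not a settled definition, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def percent_similarity(deleted: str, current: str) -> float:
    """100 * (1 - distance / longer length); two empty texts count as identical."""
    longest = max(len(deleted), len(current))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(deleted, current) / longest)


print(percent_similarity("abcd", "abce"))
```

The running time grows with the product of the two text lengths, which is one reason to compute the value only on demand, as suggested in step 1, rather than for every article in the feed.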

Deliverables

Describe the timeline of your work with deadlines and milestones, broken down week by week. Make sure to include time you are planning to allocate for investigation, coding, deploying, testing, and documentation.

Dates and tasks:

Community Bonding Period
  • Review project requirements and goals with mentors.
  • Discuss implementation details and clarify uncertainties with mentors.
May 27 - June 10
  • [T218132] Add ORES topic prediction to the NewPagesFeed:
    • Research and analyze the integration process of ORES topic prediction.
    • Begin development work to integrate articletopic into PageTriage.
  • [T207238] Special:NewPageFeed - add option to filter by pageviews:
    • Start implementing the functionality to sort the NewPagesFeed by pageviews.
June 11 - June 24
  • [T218132] Add ORES topic prediction to the NewPagesFeed:
    • Continue development work on integrating articletopic into PageTriage.
    • Conduct initial testing to ensure functionality and identify any issues.
  • [T207238] Special:NewPageFeed - add option to filter by pageviews:
    • Complete implementation of sorting by pageview count.
    • Test the functionality and ensure it aligns with requirements.
June 25 - July 8
  • [T218132] Add ORES topic prediction to the NewPagesFeed:
    • Refine integration of articletopic into PageTriage based on feedback.
    • Perform additional testing and bug fixes.
July 8 - July 12
  • Mid-Term Evaluation:
    • Review progress on all tasks and assess any necessary adjustments.
    • Gather feedback from mentors and make any required refinements.
July 13 - July 27
  • [T207761] Keyword Search for New Pages Feed:
    • Begin development work on implementing keyword search functionality.
    • Design the new field for keyword input in the search filters.
  • [T327955] See percent similarity to top deleted revision:
    • Research and plan the development of the API for comparing revisions.
    • Determine the best approach for detecting previous AFDs and integrating this detection into PageTriage.
July 28 - August 11
  • [T207761] Keyword Search for New Pages Feed:
    • Continue development work on keyword search implementation.
    • Integrate the new field into the search filters and test functionality.
  • [T327955] See percent similarity to top deleted revision:
    • Begin developing the API for comparing revisions.
    • Implement the necessary frontend components for users to interact with the comparison feature.
August 12 - August 26
  • [T207761] Keyword Search for New Pages Feed:
    • Conduct thorough testing of keyword search functionality.
    • Address any issues or bugs identified during testing.
  • [T327955] See percent similarity to top deleted revision:
    • Finalize development of the revision comparison feature.
August 26
  • Complete final testing and bug fixes for all implemented features.
  • Prepare documentation and release notes for stakeholders.
  • Officially deploy the updated PageTriage tool with enhanced filtering and search capabilities.

Participation

  • I will be online on IRC during my working hours (1:00 PM to 9:00 PM, UTC+5:30) to collaborate with the mentors.
  • I will use Phabricator for managing bugs and subtasks.
  • I will be reachable by Gmail outside working hours when needed.

About Me

Tell us about a few:

  • Your education (completed or in progress) :

I am currently in my 6th semester, pursuing a B.E. in Computer Sciences at Thapar University, Patiala.

  • How did you hear about this program?

I'm always interested in expanding my coding skills and contributing to open source projects. GSoC seemed like a perfect fit, offering a chance to work on a challenging project with a supportive community. It's a great way to learn new technologies, collaborate with developers around the world, and make a real impact on an open source project that I'm passionate about.

  • Will you have any other time commitments, such as school work, another job, planned vacation, etc, during the duration of the program?

I am committed to dedicating 30 to 40 hours per week to the project, spanning from May 28th to August 19th, totaling approximately 350 hours. Once my end-semester exams end on May 27th, 2024, I will be fully available to focus solely on my work with Wikimedia, ensuring uninterrupted dedication to the project.

  • We advise all candidates eligible for Google Summer of Code and Outreachy to apply for both programs. Are you planning to apply to both programs and, if so, with what organization(s)?

Yes, I am applying to both Outreachy (T357409) and GSoC (T357337).

Past Experience

Event Timeline

@Soda, @TheresNoTime the due date for the final application is approaching; I would really appreciate your feedback on my proposal.

Maryann-Onyinye renamed this task from Improve searchability and filtering of PageTriage feed to [Proposal] Improve searchability and filtering of PageTriage feed. Apr 2 2024, 11:11 AM

Hi, this proposal unfortunately was not selected for this year's GSoC program. We had multiple amazing applications this year and we had to make the tough decision of choosing a single candidate. Not being selected for this particular year does not reflect on your abilities or your qualifications. Hope you don't feel discouraged, and we hope you will stay on and continue contributing to the Wikimedia movement in some capacity :)