
Special:NewPageFeed - add option to filter by pageviews
Open, Needs Triage, Public

Description

Add functionality to sort the NewPageFeed by pageview count, so that reviewers can prioritise high-impact articles.

Originally proposed here: https://en.wikipedia.org/wiki/Wikipedia:Page_Curation/Suggested_improvements#42._Filters_by_a_score_of_estimated_public_interest

Event Timeline

Niharika renamed this task from Special:NewPageFeed filter by estimated public interest to Special:NewPageFeed - add option to filter by pageviews. Jan 18 2019, 5:02 AM
Niharika added a subscriber: Niharika.
JTannerWMF added a subscriber: JTannerWMF.

It appears the CommTech team is working on this.

The Community Tech team has evaluated this request, which included an investigative ticket: T225169. The work presents significant challenges, but there may be an alternative solution. We have posted the following update to Meta-Wiki and Wikipedia, but I'll add the details to this ticket as well:

• First, the challenges (according to analysis from the engineering team): In order to filter or sort by numeric values, the numbers must be stored in the database in a specific manner. This first step alone would take several weeks, if not months, according to the estimates provided by Wikimedia database experts. Then, we would need to populate the sortable cells with pageview data, which comes from an external service. To do this, we would need to create a process that pulls the data from the external service and stores it in MediaWiki’s PageTriage table. We would then have to repeat this work regularly across the entire PageTriage database (which consists of tens of thousands of rows, if not more) so that the numbers stay up to date. This process is both uncommon (on MediaWiki servers) and complex; we would need to define it and identify the correct way to implement it, in collaboration with Operations and Database experts. Overall, we do not find the request, in its current form, to be within our scope. For more details on the technical analysis and discussion with the database administrators, you can check out the associated investigation ticket.
• Second, the alternative solution (as described in the T225169 investigation): We could display the number of pageviews in the article record, without allowing for sorting or filtering. Would this be a satisfactory alternative for the community? And, if so, how would you like the number of pageviews displayed (e.g. average per day, median per day, or total views in the last 30 days)? Note that the results displayed will be from 24 hours earlier than the display time, and we’ll want to query from a maximum of 30 days ago (for the sake of general efficiency and manageability of this feature). We do not yet know if we can do this work, but if we could, would it be worth our time and effort, in your opinion?
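
To make the display options in the second point concrete, here is a minimal sketch of how the three candidate figures (total, average per day, median per day) could be computed from a list of daily view counts, along with one possible reading of the "24 hours earlier, at most 30 days back" query window. The function names `query_window` and `summarize_views` are hypothetical and are not part of PageTriage or of any pageview service; fetching the daily counts is deliberately left out.

```python
from datetime import date, timedelta
from statistics import mean, median


def query_window(today: date, days: int = 30, lag_days: int = 1) -> tuple[date, date]:
    """Return (start, end) dates, inclusive, for the pageview query.

    Assumption: the newest available data is `lag_days` behind `today`,
    and the window covers `days` calendar days ending on that date.
    """
    end = today - timedelta(days=lag_days)
    start = end - timedelta(days=days - 1)
    return start, end


def summarize_views(daily_views: list[int]) -> dict[str, float]:
    """Compute the three display candidates from per-day view counts."""
    if not daily_views:
        # No data yet (e.g. a page created today): report zeros.
        return {"total": 0, "mean_per_day": 0.0, "median_per_day": 0.0}
    return {
        "total": sum(daily_views),
        "mean_per_day": mean(daily_views),
        "median_per_day": median(daily_views),
    }


if __name__ == "__main__":
    start, end = query_window(date(2019, 8, 15))
    print(start, end)
    print(summarize_views([10, 20, 60]))
```

The median is less sensitive than the average to a single-day spike (for example, a page that was briefly linked from a popular site), which may matter when choosing which figure to show.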

Update: We have created a separate ticket for the proposed work below (T230567).

The proposal in T230567 failed to reach consensus, so we'll leave it as an open ticket in case things change at a later date and another team would like to take it on. I'm removing the Community Tech tag from this ticket, as we've now wrapped up the Page Curation Improvements project. More details on the project and its final outcomes can be found on the Page Curation Improvements project page. Thanks!

Alternative: We don't actually need the exact number of pageviews, just a general sense of popularity. You can get that by storing the ceiling of the base-10 logarithm of the pageviews, ceil(log10(pageviews)), as a page_tag. That way the tag takes only a limited number of distinct values, and the reviewer still gets a general sense of the popularity of the article.
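
A rough sketch of that bucketing, for illustration only. The function name `popularity_bucket` is made up, and treating 0 views as bucket 0 is an assumption (log10 is undefined there). It uses integer arithmetic instead of a floating-point logarithm so that exact powers of ten cannot be misplaced by rounding.

```python
def popularity_bucket(pageviews: int) -> int:
    """Return ceil(log10(pageviews)) for pageviews >= 1, computed with integers.

    Bucket k (for k >= 1) covers view counts from 10**(k-1) + 1 up to 10**k.
    Assumption: 0 views is mapped to bucket 0, alongside 1 view.
    """
    if pageviews < 0:
        raise ValueError("pageviews must be non-negative")
    if pageviews <= 1:
        return 0
    # For n >= 2, ceil(log10(n)) equals the number of decimal digits of n - 1.
    return len(str(pageviews - 1))


if __name__ == "__main__":
    for views in (0, 1, 7, 10, 11, 100, 101, 12345):
        print(views, popularity_bucket(views))
```

Since a bucket only changes when the view count crosses a power of ten, the stored tag would need rewriting far less often than an exact count, and sorting or filtering would be over a handful of small integers.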