
Spike: Automatically review pages that were reverted to a previously reviewed state [Timebox: 8 hours]
Open, NormalPublic

Description

As a developer, I want to investigate the work required to fulfill the goals of this ticket, so that we can determine the technical prep required for this work.

Note: This work has parallels to the work in T157048. Review the notes in that ticket to help determine the feasibility and challenges of this work.

Background:

When a redirect is converted to an article, it gets sent to the new pages feed (was recently broken and then fixed in T223828). We would like to expand on this logic:

  • If an article-from-redirect is reverted back to a redirect, it currently needs to be re-reviewed. It should automatically be marked as reviewed, since that version was already patrolled.
  • The opposite, where an article is converted to a redirect and back. Again it should be automatically marked as reviewed.

Essentially, the system should automatically detect that the page was reverted to a previously patrolled state. This is only applicable when a redirect becomes an article and vice versa.
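The detection described above can be sketched with a content-hash comparison; this is an illustrative sketch, not PageTriage code, and the function names are invented for this example (MediaWiki does store a per-revision content SHA-1, which is what makes this approach feasible):

```python
import hashlib

def content_sha1(text: str) -> str:
    # MediaWiki keeps a SHA-1 of each revision's content; a plain hex
    # digest is close enough for this sketch.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def is_revert_to_patrolled(new_text: str, patrolled_texts: list[str]) -> bool:
    # The edit restores a previously patrolled state iff its content
    # exactly matches some revision that was already patrolled.
    new_hash = content_sha1(new_text)
    return any(content_sha1(t) == new_hash for t in patrolled_texts)
```

An exact hash match covers both directions (redirect-to-article reverted, and article-to-redirect reverted), since either way the page ends up byte-identical to a revision that was already reviewed.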

Relevant investigation at T225239

Details

Related Gerrit Patches:

Event Timeline


Note that I've provided a way to patrol these from recent changes in core.
Doing this from Special:NewPages would require modifying the query (which might not be possible to do in a performance-friendly way) and showing the patrol footer when relevant, which would also be problematic.

Swpb added a comment. May 27 2015, 1:47 PM
This comment was removed by Swpb.
-jem- added a subscriber: -jem-. Aug 6 2015, 1:22 PM
Swpb added a comment. Nov 10 2015, 3:33 PM

I noticed that pages like this one are appearing on NewPagesFeed (as redirects) after being un-redirected, but with the date the page was originally created. Could we simply list these pages by the date they are un-redirected?

Swpb added a comment.Jul 8 2016, 7:22 PM

It's been 15 months since this request started, with no apparent action in over a year. Why is no priority given to community-requested changes?

Swpb raised the priority of this task from Lowest to Normal. Jul 8 2016, 7:22 PM
joeroe added a subscriber: joeroe. Feb 2 2017, 4:08 PM

This proposal has been selected for the Developer-Wishlist voting round and will be added to a MediaWiki page very soon. To the subscribers or proposer of this task: please help modify the task description: add a brief summary (10-12 lines) of the problem this proposal raises, the topics discussed in the comments, and a proposed solution (if there is one yet). Remember to add a header titled "Description" to your content. Please do so before February 5th, 12:00 pm UTC.

And the latest status of this is...?

@Kudpung: There is no news; otherwise it would be found here.
The patch linked in https://phabricator.wikimedia.org/T92621#1308458 needs a manual rebase; that seems to be the next step anyone could perform.

https://en.wikipedia.org/wiki/Wikipedia_talk:New_pages_patrol/Reviewers#Old_redirect_in_feed

This will continue to be an issue until it gets fixed. Essentially the issue is that when a redirect is converted to an article, it gets sent to the new page feed. But if this action is reverted back to a redirect, it stays in the new page feed (but shouldn't).

The opposite, where an article is converted to a redirect and back, is also an issue. Essentially the system should automatically detect when it has been reverted back to a previous revision and correctly assume that no further action is needed.

A related bug is T157048, where redirects converted into articles appear in the New Pages Feed indexed by the date the redirect was created, but should instead be indexed by the date they were converted from a redirect into an article. This would stop them from piling up at the back of the feed, or worse, being dropped into the middle of the feed where they won't be noticed for some time.

Restricted Application added a project: Growth-Team. Oct 23 2018, 6:22 AM
JTannerWMF moved this task from Inbox to External on the Growth-Team board. Mar 19 2019, 6:09 PM
JTannerWMF added a subscriber: JTannerWMF.

It appears the CommTech team is working on this.

Until this is implemented (if it is), a workaround is using Special:NewPages (not Special:NewPagesFeed) filtered with the tag mw-new-redirect and hiding redirects. This will only show pages that were initially created as redirects, but no longer are.

srishakatux added a subscriber: srishakatux.
aezell added a subscriber: aezell. May 23 2019, 5:17 PM

I'm summarizing what I understand of the request here. Please correct me if I've missed something.

If a redirect is converted into an article and then reverted to the original redirect, the redirect should not appear in the feed for patrol. The redirect itself should have been reviewed/patrolled before. If not, it should appear in the feed. So, there are two checks here: 1) Was this article a redirect previously and is now being reverted to that redirect and 2) has that redirect been reviewed before.
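The two checks in that summary can be expressed as a small decision function; this is a hypothetical sketch with an invented `Revision` record, not the extension's actual data model:

```python
from dataclasses import dataclass

@dataclass
class Revision:
    sha1: str          # content hash of the revision
    is_redirect: bool  # was this revision a redirect?
    reviewed: bool     # was this revision patrolled/reviewed?

def needs_patrol(current: Revision, history: list[Revision]) -> bool:
    # Check 1: the current content matches an earlier redirect revision.
    # Check 2: that earlier redirect revision was itself reviewed.
    for rev in history:
        if rev.is_redirect and rev.reviewed and rev.sha1 == current.sha1:
            return False  # both checks pass: no patrol needed
    return True           # otherwise the page should appear in the feed
```

If the old redirect was never reviewed, check 2 fails and the page correctly stays in the feed.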

@aezell I think that is the uncontroversial part of this request. Articles being turned into redirects and then back into articles appears to be part of the original request from 2015, but I think consensus on this has changed and it should not be implemented at this point. Handling clear vandalism is easy enough these days, and some articles are redirected as the outcome of AfD discussions and thus need patrolling if restored to articles.

@Barkeep49 Thanks. My comment was a summation of a discussion we had within the team earlier today. I'm glad to read that we likely understood the request well enough.

MusikAnimal renamed this task from Implement addition of un-redirected pages to Special:NewPages and Special:NewPagesFeed to Implement addition of un-redirected pages to Special:NewPagesFeed. Jun 17 2019, 9:38 PM
MusikAnimal updated the task description.
MusikAnimal renamed this task from Implement addition of un-redirected pages to Special:NewPagesFeed to Automatically review pages that were reverted to a previously reviewed state. Jun 17 2019, 9:49 PM
MusikAnimal added a subscriber: MusikAnimal.

Hopefully the revised title and description make it more clear what is being requested. If I've got it wrong, please feel free to correct it.

Per our internal discussion, getting this to work for Special:NewPages is a rather radical change to Core that is unlikely to happen from our team. Tracking of redirects-to-articles is already built into PageCuration, so we will build on that.

I removed Developer-Wishlist (2017) as this isn't really a developer feature.

MBinder_WMF set the point value for this task to 5. Jul 2 2019, 11:35 PM

See T225239#5251954 for results of the investigation. In short, we'll use a reviewed_sha tag in order to determine whether a page was reverted to a previously reviewed state.
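The reviewed_sha approach can be sketched as follows. The in-memory dictionary here is a stand-in for PageTriage's per-page tag storage (the real extension keeps such metadata in database tables); the function names are illustrative:

```python
# Hypothetical stand-in for PageTriage's per-page tag storage.
page_tags: dict[int, dict[str, str]] = {}

def mark_reviewed(page_id: int, revision_sha1: str) -> None:
    # Part 1: on review, record the reviewed revision's content hash
    # under a 'reviewed_sha' tag for the page.
    page_tags.setdefault(page_id, {})["reviewed_sha"] = revision_sha1

def matches_reviewed_sha(page_id: int, revision_sha1: str) -> bool:
    # Part 2: a later revision counts as a revert to a reviewed state
    # if its hash matches the stored reviewed_sha.
    return page_tags.get(page_id, {}).get("reviewed_sha") == revision_sha1
```

Storing only the hash (rather than the full content) keeps the comparison cheap at edit time.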

ifried added a subscriber: ifried. Jul 17 2019, 6:12 PM
ifried removed the point value for this task. Jul 31 2019, 11:25 PM

We should re-estimate this ticket (based on the new information that we have), so I removed the previous estimation.

ifried updated the task description. Aug 1 2019, 5:59 PM
DannyS712 updated the task description. Aug 1 2019, 6:09 PM
ifried renamed this task from Automatically review pages that were reverted to a previously reviewed state to Spike: Automatically review pages that were reverted to a previously reviewed state [8 hours]. Aug 13 2019, 11:58 PM
ifried updated the task description.
ifried updated the task description. Aug 14 2019, 12:04 AM
ifried renamed this task from Spike: Automatically review pages that were reverted to a previously reviewed state [8 hours] to Spike: Automatically review pages that were reverted to a previously reviewed state [Timebox: 8 hours]. Aug 15 2019, 7:47 PM

I've been trying to get some tests written and to understand the various places that need to be changed in order to implement this. But it's taking time, mainly because the test system is not very easy to use (partly because we have application logic in the API classes, so we have to make internal API requests in the tests for those, which makes things more opaque). I'm happy to carry on, and of course will get there in the end, but is it worth it? How much time should be spent on this?

I think the work could at least be split up: first, record the sha1 of (all?) reviewed revisions; and then, in separate work, prevent the addition of pages to the queue where their new sha1 matches what's already recorded.

@ifried I think this is going to be a bit of a long-winded job; can you advise how much should be done?

@Samwilson The most important outcome of the spike is to determine the general level of difficulty and risks involved (and, specifically, if we would encounter the same challenges found in T157048). In simplest terms, I want to know how much time/effort, roughly speaking, it would take us to do this work (so we can know if we should take it on). In order to make this approximation, I understand why we would want to look into the various places in which changes would be made (which, from my understanding, is the time-consuming bit).

So, here's my thought: I think that whatever is critical to making that general determination should be explored. However, if certain elements of the spike may not be critical for this determination (but they may veer into development-like work that would ideally be taken on by someone making the changes), then perhaps that work is less essential. Do you have a rough sense of the time that may be left for this spike? That may also help me have a better idea of where we're at.

And, @Mooeypoo, any thoughts?

aezell added a comment (edited). Sep 6 2019, 10:30 AM

An appropriate result of any 8-hour spike could be: "There's enough work just to figure out the problem that I couldn't finish it in 8 hours." That's interesting data too.

@Samwilson What's the status of this task? It's in Needs Review but it's not clear to me what the outcome/recommendation is.

Sorry, good point.

In summary, this work is doable, and can be broken into two parts: 1) storing the hash of a reviewed revision at the time it's marked reviewed; and 2) when a page in the queue is edited, checking whether the new revision's hash matches the already-reviewed one, and if it does, marking the page reviewed. The first part seems reasonably solid and would only add complexity to the review process, although there might be an issue around wanting to keep the same review details (reviewer, date, etc.) as the earlier review. It's the second part that's more complicated (I think) and may have performance implications. Because of the way things are processed and the structure of PageTriage, we don't have access to some of the required data in the required places, and so would have to query it even when it's not strictly required. I've dug into this a bit, but don't have a complete picture of what it'd look like.
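The edit-time half of that split could look roughly like the sketch below. The hook name and the two injected callbacks are invented for illustration; they are not actual PageTriage hooks or APIs:

```python
from typing import Callable, Optional

def on_queued_page_edited(
    page_id: int,
    new_sha1: str,
    get_reviewed_sha: Callable[[int], Optional[str]],
    auto_review: Callable[[int], None],
) -> bool:
    # Part 2 of the split: when a page still in the review queue is edited,
    # compare the new revision's hash with the hash stored at last review
    # (part 1). Callbacks are injected so the sketch stays independent of
    # PageTriage internals.
    stored = get_reviewed_sha(page_id)
    if stored is not None and stored == new_sha1:
        auto_review(page_id)  # reverted to a reviewed state: re-mark reviewed
        return True
    return False              # content differs: leave it in the queue
```

The performance concern raised above lives in `get_reviewed_sha`: in the real extension that lookup would run on every edit to a queued page, whether or not the edit is a revert.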

I recommend that we proceed if this feature is very valuable, but not if it's only a nice-to-have, because of the time it'll take and the complexity it'll add to an extension that's already hard to maintain.

Thanks @Samwilson! That's very helpful.

@ifried I think this is up for your decision based on Sam's explanation above.

I won't pretend to fully understand the performance implications of @Samwilson's analysis, but let me ask a question. Could a similarity score make this easier for recreated pages?


No, it'd be simpler with the sha1 hash of revision content. Comparing the two is easy; it's fetching them at the right time and place that's trickier.