
Redirects converted into articles should appear in the New Pages Feed indexed by the date of creation and creator of the article, not of the redirect
Open, Normal · Public · 5 Story Points

Description

Value proposition

Currently, redirects converted into proper articles are listed in the New Pages Feed under the creation date of the redirect. It would be more useful if they were indexed by the date the article was created. Relatedly, it would be more helpful if the "Created by $user" message displayed the name of the creator of the article, not of the redirect.

Changes requested:

  • Articles created from redirects should show up in the feed with the date of the creation of the article (dates displayed on the right in the screenshot below).
  • The message "Created by $user" under the article name should show the name of the creator of the article and not of the redirect.

Notes:

Event Timeline

joeroe created this task. Feb 2 2017, 4:05 PM
Restricted Application added a project: Collaboration-Team-Triage. Feb 2 2017, 4:05 PM
Restricted Application added a subscriber: Aklapper.
This comment has been deleted.
Restricted Application added a project: Growth-Team. Oct 24 2018, 5:21 PM

Not sure why this task was ever closed. It definitely isn't a duplicate of the one suggested but an entirely different issue.

JTannerWMF moved this task from Inbox to External on the Growth-Team board. Mar 19 2019, 6:09 PM
JTannerWMF added a subscriber: JTannerWMF.

It appears the CommTech team is working on this.

Niharika triaged this task as Normal priority. Jun 4 2019, 5:50 PM
Niharika added a project: Community-Tech.
Niharika updated the task description.
Niharika moved this task from Untriaged to To be estimated/discussed on the Community-Tech board.
Niharika set the point value for this task to 5. Jun 11 2019, 11:44 PM
MusikAnimal removed a project: Growth-Team.

I've got a rough plan for this that I think will work. However I'm going to side-step this until T225239 and related work is resolved. The implementation will depend on the outcome of that investigation.

Change 521403 had a related patch set uploaded (by MusikAnimal; owner: MusikAnimal):
[mediawiki/extensions/PageTriage@master] [WIP] Index articles-from-redirects by newest revision and not the first

https://gerrit.wikimedia.org/r/521403

I'm not sure what to do with this. https://gerrit.wikimedia.org/r/521403 makes the datestamp be the revision that changed the redirect to an article (let's call that the "article-from-redirect revision"); that part was easy. Changing the author however is a different story. ArticleCompileUserData::compile() and ArticleCompileBasicData::compile() both need to reference the article-from-redirect revision. Currently they are just running a query to find the oldest revision. I don't think it's that easy to pass around the relevant revision to these compile methods, similar to what is done with https://gerrit.wikimedia.org/r/521403. I believe these methods might get called over multiple requests, too, so we might need to persist the ID of the article-from-redirect revision for easy referencing.

I mentioned that T225239 ("[4 hours] Can we figure out when redirects become complete articles and the previous reviewed state?") was related. For that, even if we stored the revision ID (from which we can get the SHA), it would be referencing the last revision that was reviewed, which is not necessarily the article-from-redirect revision.

I just wanted to say out loud where I am with this task, in case someone else has ideas. I will continue to investigate.

And mind you, once an article-from-redirect is reverted back to a redirect, the datestamp/author for that entry in the queue also needs to revert to what it was before.

MusikAnimal changed the task status from Open to Stalled. Jul 12 2019, 3:06 PM
MusikAnimal added a subscriber: ifried.

We discussed this in the engineering meeting, and we're not really sure what defines the "page author". Consider these scenarios:

  • Old redirect -> article -- use the article-from-redirect revision, which is what this task is about.
  • Redirect -> article -> redirect -- use the revision of the original redirect?
  • Article A -> redirect -> article B -- use the revision for article A, or article B? Note the content and author for article A and B could be different.
  • New article -> redirect -- use the initial revision to the page, or the redirect?

My interpretation is to use the following logic:

  • For articles, use the latest revision that made the redirect into an article, or the initial revision if it was never a redirect.
  • For redirects, use the latest revision that made the article into a redirect, or the initial revision if it was never an article.
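The two rules above reduce to one: find the latest revision at which the page flipped into its current redirect state, and fall back to the initial revision if it never flipped. Below is a minimal Python sketch of that interpretation. The `Revision` class and `creation_revision` function are hypothetical names used only for illustration and are not part of PageTriage.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Revision:
    # Hypothetical, simplified revision record (not a MediaWiki class).
    rev_id: int
    user: str
    is_redirect: bool  # redirect state of the page after this revision


def creation_revision(history: Sequence[Revision]) -> Revision:
    """Return the revision to index the page by.

    `history` is ordered oldest first. The result is the latest revision
    that flipped the page into its current redirect state, or the initial
    revision if the state never changed.
    """
    if not history:
        raise ValueError("history must contain at least one revision")
    current_state = history[-1].is_redirect
    # Walk backwards while the page stays in its current state.
    for i in range(len(history) - 1, 0, -1):
        if history[i - 1].is_redirect != current_state:
            return history[i]  # this revision performed the flip
    return history[0]


# Scenario "Article A -> redirect -> article B": article B's revision wins.
history = [
    Revision(1, "AuthorA", False),
    Revision(2, "Redirector", True),
    Revision(3, "AuthorB", False),
]
print(creation_revision(history).user)
```

Under this reading, the "Redirect -> article -> redirect" scenario resolves to the revision that restored the redirect, not the original redirect.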

I could easily be missing something. We should consult the community to find a consensus on what the logic should be.

I guess I'll change this task to "stalled" until we know what we want. cc @ifried

@MusikAnimal I think your logic is right for both articles and redirects.

94rain added a subscriber: 94rain. Jul 12 2019, 4:00 PM
ifried added a comment. Edited Jul 24 2019, 7:08 PM

Update: Discussion of this ticket and the proposal can be found on the project talk page.

MusikAnimal changed the task status from Stalled to Open. Jul 29 2019, 6:14 PM

We've come up with a plan and I'm now resuming development.

MusikAnimal added a comment. Edited Aug 7 2019, 10:48 PM

This is the new query used in my patch:

SELECT rev_id
FROM `revision`
LEFT JOIN `change_tag` ON ((ct_rev_id = rev_id))
WHERE rev_page = '534366'
AND (
  ct_tag_id = (
    SELECT ctd_id
    FROM `change_tag_def`
    WHERE ctd_name = 'mw-new-redirect' -- or 'mw-removed-redirect'
  )
)
ORDER BY rev_timestamp DESC
LIMIT 1

Production EXPLAIN results:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | PRIMARY | revision | ref | PRIMARY,page_timestamp,page_user_timestamp,rev_page_id | page_timestamp | 4 | const | 52164 | Using where; Using index
1 | PRIMARY | change_tag | ref | change_tag_rev_tag_id,change_tag_tag_id_id | change_tag_rev_tag_id | 9 | enwiki.revision.rev_id,const | 1 | Using where; Using index
2 | SUBQUERY | change_tag_def | const | ctd_name | ctd_name | 257 | const | 1 | Using index

This is for Barack Obama, an extreme example with about 27,000 revisions. It took just over a second to run. Realistically however we'll be dealing with new pages and old redirects with no more than a few hundred revisions. "Libertarian capitalism" (page ID 1085480) is a real-world example of a page with article-from-redirect history, and the query runtime was 0.00 seconds.
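To experiment with the shape of this lookup outside production, here is a rough Python sketch that runs an equivalent query against an in-memory SQLite database. It rests on several assumptions: the schema is a toy subset of the real `revision`, `change_tag` and `change_tag_def` tables, the sample rows are invented, SQLite stands in for the production database, and `latest_tagged_revision` is a hypothetical helper. It uses a plain `JOIN`, since the `WHERE` condition on `ct_tag_id` already discards the unmatched rows a `LEFT JOIN` would keep.

```python
import sqlite3

# Toy subset of the three tables the query touches (assumed columns only).
SCHEMA = """
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_timestamp TEXT);
CREATE TABLE change_tag (ct_rev_id INTEGER, ct_tag_id INTEGER);
CREATE TABLE change_tag_def (ctd_id INTEGER PRIMARY KEY, ctd_name TEXT UNIQUE);
"""

# Invented sample data: page 1 flips between redirect and article twice.
SAMPLE = """
INSERT INTO change_tag_def VALUES (1, 'mw-new-redirect'), (2, 'mw-removed-redirect');
INSERT INTO revision VALUES
  (10, 1, '20190101000000'),  -- created as a redirect
  (11, 1, '20190201000000'),  -- turned into an article
  (12, 1, '20190301000000'),  -- ordinary edit, no tag
  (13, 1, '20190401000000'),  -- turned back into a redirect
  (14, 1, '20190501000000'),  -- turned into an article again
  (20, 2, '20190601000000');  -- plain article, never a redirect
INSERT INTO change_tag VALUES (10, 1), (11, 2), (13, 1), (14, 2);
"""


def latest_tagged_revision(conn, page_id, tag_name):
    """Return the newest rev_id on the page carrying the tag, or None."""
    row = conn.execute(
        """
        SELECT rev_id
        FROM revision
        JOIN change_tag ON ct_rev_id = rev_id
        WHERE rev_page = ?
          AND ct_tag_id = (SELECT ctd_id FROM change_tag_def WHERE ctd_name = ?)
        ORDER BY rev_timestamp DESC
        LIMIT 1
        """,
        (page_id, tag_name),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executescript(SAMPLE)

print(latest_tagged_revision(conn, 1, "mw-removed-redirect"))
print(latest_tagged_revision(conn, 2, "mw-removed-redirect"))
```

When the lookup returns `None`, the page never changed redirect state, so a caller would presumably fall back to the oldest revision as the existing code does.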

The query needs to be run on master, and under the current system it's apparently run on every edit (post-save), though subsequent queries to the same page within a short time frame should be very fast thanks to caching/buffering/whatever. Overall I think this is acceptable...?

In addition to possible concerns with the query, there are existing issues with the extension that might be exacerbated. Mainly, the query will unnecessarily be run on every edit (which is already the case, but with a much simpler query). I think this needs to be addressed first, and I'm unsure whether that's worth our time. In addition, the logic introduced with my patch feels fragile to me. I've tried to make the tests comprehensive (relative to the new code), but I can't even get them to pass, and I am not confident the patch won't cause other unintended side effects. Finally, I suspect the community may not be pleased with some aspects of the new system. We cannot cleanly do any sort of revert detection, so if editor A changes the redirect to an article, and editor B changes it back to the redirect, editor B incorrectly gets credit for the redirect itself. Flip that around and you have a worse scenario -- editor A writes an article, editor B changes it into a redirect, then editor C turns it back into the preexisting article. Now any messages you send to the creator using Page Curation will go to the wrong person :(
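The two misattribution scenarios can be reproduced with a small simulation. This is only a sketch: it assumes credit goes to the author of the latest edit tagged with the redirect change matching the page's current state, and the `credited_user` function and editor names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# (user, redirect-related change tag or None), ordered oldest first.
Edit = Tuple[str, Optional[str]]


def credited_user(edits: Sequence[Edit], page_is_redirect: bool) -> str:
    """Credit the latest edit that flipped the page into its current state.

    Falls back to the first editor when no such tagged edit exists.
    There is no revert detection: restoring old content counts as a flip.
    """
    wanted = "mw-new-redirect" if page_is_redirect else "mw-removed-redirect"
    for user, tag in reversed(edits):
        if tag == wanted:
            return user
    return edits[0][0]


# Scenario 1: editor A expands a redirect, editor B reverts to the redirect.
scenario_1 = [
    ("RedirectCreator", "mw-new-redirect"),
    ("EditorA", "mw-removed-redirect"),
    ("EditorB", "mw-new-redirect"),
]
# Scenario 2: editor A writes an article, B redirects it, C restores it.
scenario_2 = [
    ("EditorA", None),
    ("EditorB", "mw-new-redirect"),
    ("EditorC", "mw-removed-redirect"),
]

print(credited_user(scenario_1, page_is_redirect=True))   # EditorB, not RedirectCreator
print(credited_user(scenario_2, page_is_redirect=False))  # EditorC, not EditorA
```

In both cases the credited editor only reverted someone else's change, which is the attribution problem described above.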

Overall, after two weeks of development, I'm going to suggest we forgo this. The real issue exists with MediaWiki itself, whereby there's no notion of page authorship after a change in redirect state. For instance, the action=info page will always show the author of the initial revision, regardless of the redirect history. We may have to accept that this is just a caveat of how revision histories work in MediaWiki.

@MusikAnimal What does this status mean to a layman? Is it done or isn't it?

It means it's not done, @Kudpung, and Musik is saying he doesn't think it can practically be achieved.

@Barkeep49, in which case it's probably not a priority anyway. A 'nice-to-have', but not essential.

I think it's a little more useful than that - any article created from a redirect is automatically indexed, for instance, and if a user is creating troublesome articles (or really good ones) from redirects we can't filter by their creations. But overall I agree this is not a must-have.