phantom redirects lingering in incategory searches after page moves
Closed, Resolved · Public · BUG REPORT

Description

What happens?: Recently, when I perform the regular and essential task of running an incategory search on https://en.wikipedia.org/wiki/Category:Living_people to find draftspace or userspace pages that should not be filed in mainspace categories, the search returns a number of phantom redirects. In these cases the page is not in draftspace at all; it has already been moved into mainspace. The resulting draftspace redirect has no categories on it, and if you manually check the individual categories that the article is in, they do not list the draftspace title as a member.

For example, today's run of the search at https://en.wikipedia.org/w/index.php?search=incategory%3A%22Living_people%22&title=Special:Search&profile=advanced&fulltext=1&ns2=1&ns3=1&ns118=1&ns119=1 displays the title https://en.wikipedia.org/w/index.php?title=Draft:Kadek_Dimas_Satria&redirect=no. Note that the draft title doesn't have categories in it, and if you look at any of the categories that are on the target page https://en.wikipedia.org/wiki/Kadek_Dimas_Satria, they do not list the draftspace redirect as being in them -- but if you go back and perform an incategory search on each of those categories to look for draftspace pages, the incategory search does still say that Draft:Kadek Dimas Satria is in each and every one of them.
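For anyone who wants to reproduce the search outside the web interface, here is a minimal sketch that builds the equivalent MediaWiki Action API request (`list=search` with the same `incategory:` query and the same four namespaces). The function name and constants are my own for illustration, and the sketch only constructs the URL; it does not send the request.

```python
from urllib.parse import urlencode

# English Wikipedia's Action API endpoint.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

# Namespaces selected in the search URL above (ns2, ns3, ns118, ns119):
# User, User talk, Draft, Draft talk.
NON_ARTICLE_NAMESPACES = (2, 3, 118, 119)


def build_incategory_query(category, namespaces=NON_ARTICLE_NAMESPACES, limit=50):
    """Build a list=search API URL for an incategory: query.

    Hypothetical helper for illustration; it returns the URL as a string.
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f'incategory:"{category}"',
        # Multiple namespaces are passed as a pipe-separated list.
        "srnamespace": "|".join(str(ns) for ns in namespaces),
        "srlimit": limit,
        # Ask the API to include the total hit count in the response.
        "srinfo": "totalhits",
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


url = build_incategory_query("Living_people")
print(url)
```

Requesting `totalhits` makes it possible to compare the reported count with the number of results actually returned.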

The issue invariably results from cases where an editor applied categories to the page while it was still in draft, and then moved the page into mainspace after adding the categories. So far, the only solution I have found that works to clear the phantom redirects out of the incategory search is to actually move the page back into draftspace, and wrap the categories in the "draft categories" wrapper; this would finally cause the page to drop from the incategory search, following which I could then move the page back into mainspace again and unwrap the categories in mainspace, and the redirect would not then return to the incategory search again. Nothing else has successfully cleared the redirects from the search: null-editing the draftspace redirect didn't work, deleting and then restoring the draftspace redirect didn't work, adding the redirects to a maintenance holding category didn't work.

I never, ever saw even one case of this ever happening at all before February 2023. A couple of weeks ago, for the first time ever, there was one standalone case of it which looked like an isolated problem at the time, but then after I resolved the problem by redoing the page move it did not recur again until March 1 -- at which point it suddenly became an epidemic, with eighteen phantom redirects turning up so far just in the past three days alone. I had already corrected the seven instances I found on Wednesday and Thursday, but with eleven more of them today, I'm at the end of my patience with it.

This also is not just normal lag in the job queue, as real drafts which are in the category inappropriately, and have the category removed or disabled accordingly, still successfully drop from the search results within less than one minute.
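To make the distinction concrete, here is a small sketch of how a phantom hit can be told apart from a genuinely miscategorized draft: a hit is a phantom if the search reports it for a category that the page does not actually carry. The function name and the sample data are illustrative assumptions; in practice the category sets would come from the wiki itself (for example the API's `prop=categories`), and the second title below is hypothetical.

```python
def find_phantom_hits(search_hits, actual_categories, category):
    """Return titles the search reports for `category` that do not carry it.

    search_hits: titles returned by the incategory search.
    actual_categories: mapping of title -> set of categories on that page.
    """
    return [
        title
        for title in search_hits
        if category not in actual_categories.get(title, set())
    ]


# Illustrative data only, not fetched from the wiki.
search_hits = ["Draft:Kadek Dimas Satria", "Draft:Hypothetical real draft"]
actual_categories = {
    # Redirect left behind by the page move: no categories on it.
    "Draft:Kadek Dimas Satria": set(),
    # Hypothetical draft that really is filed in the category.
    "Draft:Hypothetical real draft": {"Living people"},
}

phantoms = find_phantom_hits(search_hits, actual_categories, "Living people")
print(phantoms)
```

Only the redirect is flagged; the real draft is a legitimate hit that should be fixed by removing or disabling its category.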

What should have happened instead?: Obviously, draftspace redirects that don't have categories on them should not be showing up in incategory searches of those categories if they aren't actually in the categories. It's absolutely essential that I be able to do a clean incategory search on Living people -- with over one million articles in that category, searching for draft and userpages manually isn't a feasible alternative at all, so I need to be able to do an incategory search on that category without having it polluted by pages that aren't actually in the category. Removing draft and userpages from articlespace categories is an essential maintenance task that cannot be ignored, so I can't just stop scanning Living people for such pages entirely -- and while redoing the page move myself was a viable workaround when there were just one or two isolated instances, it isn't so feasible anymore when there are 10, 20 or 30 phantom redirects to deal with at the same time.

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Event Timeline

This is continuing to happen, with 20 phantom redirects (or, rather, it ''reports'' 20 as the number of pages, but only ''displays'' 17 pages) now in the category and nothing dropping unless I redo the page moves from scratch. This is not a "put up with it" situation; it needs to be resolved.

This is something that might be addressed as part of T317045.

MPhamWMF triaged this task as Medium priority. Mar 6 2023, 4:46 PM
MPhamWMF moved this task from needs triage to Bugs on the Discovery-Search board.

Change 894709 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Properly pass the page id on page moves

https://gerrit.wikimedia.org/r/894709

dcausse raised the priority of this task from Medium to High. Mar 6 2023, 6:12 PM

Change 894709 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Properly pass the page id on page moves

https://gerrit.wikimedia.org/r/894709

Change 894677 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@wmf/1.40.0-wmf.25] Properly pass the page id on page moves

https://gerrit.wikimedia.org/r/894677

Change 894677 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@wmf/1.40.0-wmf.25] Properly pass the page id on page moves

https://gerrit.wikimedia.org/r/894677

Mentioned in SAL (#wikimedia-operations) [2023-03-07T08:24:25Z] <dcausse@deploy2002> Started scap: Backport for [[gerrit:894677|Properly pass the page id on page moves (T331127)]]

Mentioned in SAL (#wikimedia-operations) [2023-03-07T08:28:36Z] <dcausse@deploy2002> dcausse: Backport for [[gerrit:894677|Properly pass the page id on page moves (T331127)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-03-07T08:41:00Z] <dcausse@deploy2002> Finished scap: Backport for [[gerrit:894677|Properly pass the page id on page moves (T331127)]] (duration: 16m 34s)

The problem should be resolved: new page moves across namespaces should now properly delete the page in the old namespace. Some phantom redirects created because of this bug will remain until the process that cleans up the index fixes these pages (it can take up to 8 weeks). If you identify annoying ones, please let us know in this ticket so that we can clean them up manually more quickly. Sorry for the inconvenience this has caused.

I'll note that there is one new page so far that ended up in the incategory search today for the same reasons, but I'm not immediately doing anything about it because of what you said about how the process fix may take time to work through the database -- and also because I've nominated the mainspace move target for AFD as improperly sourced possible self-promotion anyway -- but I also wanted to ask: despite there only being one page currently in the search, the number of pages is being reported as three by the "results" counter in the top right corner. Would this simply be an artifact of the same problem, which will clean itself up as the fix that was already applied here propagates, or would this be a different problem that has to be looked at separately?

The fix for this ticket was applied on all WMF servers today at 2023-03-07T08:41:00.

  • User:Tuokkarr/sandbox (Daniele_Servadei) was moved today at 2023-03-07T07:04:28 (moved before the fix)

I believe the two invisible results are due to the same problem, but they are removed by the existence check that is done when displaying results:

  • Draft:Move/Catherine E. Delahodde moved at 2023-03-07T06:14:26 (moved before the fix)
  • Draft:Move/Jim Connors moved at 2023-03-03T23:09:10Z (moved before the fix as well)
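The mismatch between the counter and the displayed list can be pictured with a toy model: the total comes from what the index matched, while the list drops anything that fails an existence check at display time. This is only a sketch of that idea, not CirrusSearch's actual code; the function name and the `page_exists` callback are assumptions made for illustration.

```python
def render_results(index_hits, page_exists):
    """Toy model: count every index hit, but display only existing pages."""
    total_reported = len(index_hits)  # what the "results" counter shows
    displayed = [title for title in index_hits if page_exists(title)]
    return total_reported, displayed


# The three stale index entries named in this comment.
index_hits = [
    "User:Tuokkarr/sandbox",
    "Draft:Move/Catherine E. Delahodde",
    "Draft:Move/Jim Connors",
]

# Assumption for the example: only the sandbox title still exists as a page,
# which matches the single result that was visible in the search.
existing_pages = {"User:Tuokkarr/sandbox"}

total, displayed = render_results(index_hits, lambda title: title in existing_pages)
print(total, displayed)
```

Under these assumptions the counter reports three results while only one is listed, which is the behaviour described above.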

I manually ran the cleanup script on these 3 pages to avoid future confusion; please let me know if you still encounter this issue.