
Magic word to remove page from internal MediaWiki search results
Open, Low, Public

Description

One of the perennial debates (on the Hungarian Wikipedia, at least) is whether to delete redirects from old names in the project namespace. On one hand, even when links are updated, deleting them destroys some navigation pathways (bookmarks, external links, etc.), makes old revisions of pages (where the links are not updated) less readable, and annoys some people; on the other, the old (and often erroneous or misleading) names clutter up the search suggestions dropdown, which is currently by far the most user-friendly navigation method. There is a similar problem with redirects from misspelled names of articles.

It would be very useful to have a __NOSEARCH__ magic word that would suppress indexing of such pages by the internal search engine (or maybe just add a flag so that the page is returned only when such pages are explicitly requested).
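A minimal sketch of the second option (a flag rather than removal from the index), written in Python for illustration rather than in MediaWiki's PHP. The `Page` class, its `no_search` field, and the `search` function are hypothetical and do not correspond to any existing MediaWiki API:

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    # Hypothetical flag that a __NOSEARCH__ magic word would set on the page.
    no_search: bool = False


def search(pages, query, include_hidden=False):
    """Return titles containing the query, case-insensitively.

    Flagged pages are skipped unless the caller explicitly asks for them.
    """
    q = query.lower()
    return [
        p.title
        for p in pages
        if q in p.title.lower() and (include_hidden or not p.no_search)
    ]
```

With this shape, the page stays in the index and is simply filtered at query time, so an explicit request (`include_hidden=True`) can still find it.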

Details

Reference
bz22251

Event Timeline

bzimport raised the priority of this task from to Low. Nov 21 2014, 10:54 PM
bzimport added a project: MediaWiki-Search.
bzimport set Reference to bz22251.
bzimport added a subscriber: Unknown Object (MLST).
Tgr created this task. Jan 24 2010, 11:48 AM

rd232 wrote:

*** Bug 24169 has been marked as a duplicate of this bug. ***

I wouldn't want a new magic word. Maybe a config option for whether NOINDEX should also apply to internal search would do.

rd232 wrote:

(In reply to comment #2)

I wouldn't want a new magic word. Maybe a config option for whether NOINDEX
should also apply to internal search would do.

I doubt that Wikimedia projects would use that option if MediaWiki had it: NOINDEX hiding content from internal search is usually not the desired behaviour for existing uses of NOINDEX. Equally, it's not certain that pages marked NOSEARCH should be hidden from external search engines. Just provide both, for flexibility.

Restricted Application added projects: Discovery, Discovery-Search. Nov 16 2017, 5:09 PM

One concern here is the potential for abuse if people can mask content from internal search.

Another concern here is that some people use __NOINDEX__ with the specific intention of excluding content from external search engines, but continuing to include that content in internal search engines. A separate magic word such as __NOSEARCH__ might address this.
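To illustrate why two separate magic words give more flexibility than one config option, here is a rough Python sketch in which the two flags are independent. The `PageFlags` class and both helper functions are hypothetical names made up for this example, not MediaWiki code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageFlags:
    no_index: bool = False   # set by __NOINDEX__ (external search engines)
    no_search: bool = False  # set by the proposed __NOSEARCH__ (internal search)


def visible_to_external_crawlers(flags):
    """External visibility depends only on the NOINDEX flag."""
    return not flags.no_index


def visible_in_internal_search(flags):
    """Internal visibility depends only on the proposed NOSEARCH flag."""
    return not flags.no_search
```

A page using only `__NOINDEX__` would stay findable internally, and a page using only `__NOSEARCH__` would stay visible to external crawlers; all four combinations remain possible.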

debt added a subscriber: debt.

We'll need more information on the use cases for doing this, and there might already be a different way to go about it.

Why not just exclude every redirect page from search and dropdowns? I don't see a reason why any redirects should show up in a search box.

@Bachsau, there are quite a few reasons why we'd want to have redirects appear in the search box. Two examples off the top of my head: common misspellings of article titles are often created as redirects. The article for "Mississippi" has 9 redirects for misspellings alone. :) Colloquial names for things as well: "Show Me State" redirects to the article on the state of Missouri, USA, as it's the state's motto. I hope that helps explain a few reasons why we just can't exclude them all.

The article for "Mississippi" has 9 redirects for misspellings alone. :)

In that case, it should be enough to have "Mississippi" appear once in the dropdown, correctly spelled, if the search term matches a page redirecting to it. Redirects would still work, but we don't need them to appear in a dropdown box. Instead, it would just look like an autocorrection.
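The idea above could be sketched roughly like this in Python. This is only an illustration, not how MediaWiki's search suggestions are implemented; the `suggest` function and its parameters are hypothetical, and prefix matching is assumed as the matching rule:

```python
def suggest(query, titles, redirects, limit=10):
    """Return suggestions where redirect matches are replaced by their target.

    titles:    list of real (non-redirect) page titles
    redirects: dict mapping redirect title -> target title
    Each target is listed at most once, however many redirects match.
    """
    q = query.lower()
    seen = set()
    results = []
    # Real pages are considered first, then redirects pointing at them.
    candidates = [(t, t) for t in titles] + list(redirects.items())
    for matched_title, target in candidates:
        if matched_title.lower().startswith(q) and target not in seen:
            seen.add(target)
            results.append(target)
            if len(results) == limit:
                break
    return results
```

For example, with a misspelled redirect such as "Missisippi" pointing to "Mississippi", typing the misspelling would show only "Mississippi" in the dropdown, and typing "show" would show "Missouri" via the "Show Me State" redirect.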

Tgr added a comment. Mar 4 2018, 7:19 AM

Another use case is abusive user names. The user pages of these are usually templated for transparency, but that means writing the abuse target's name in the search box gives abusive results in the dropdown.

One concern here is the potential for abuse if people can mask content from internal search.

I think abuse is not typically found by searching for it, but marked pages could be added to a service category if reviewing them is a concern.

Legoktm added a subscriber: Legoktm. Mar 4 2018, 7:44 AM
In T24251#4021783, @Tgr wrote:

Another use case is abusive user names. The user pages of these are usually templated for transparency, but that means writing the abuse target's name in the search box gives abusive results in the dropdown.

Abusive user names should be hideuser-blocked so they don't appear in search results / for normal users.

Tgr added a comment. Mar 4 2018, 8:01 AM

Abusive user names should be hideuser-blocked so they don't appear in search results / for normal users.

AFAIK that doesn't prevent the user page from appearing in search results, since it is a page that exists (it has a template saying why the user got blocked). Also, hideuser is an oversight right, so it is not available on most wikis. (Maybe that should be fixed; IMO it would make sense to allow admins to hide users, as they can mostly do it already, but in a very cumbersome way.)

Krinkle renamed this task from Magic word to remove page from (quick)search results to Magic word to remove page from internal MediaWiki search results. Mar 22 2018, 12:06 AM
Krinkle updated the task description. (Show Details)
Krinkle removed a subscriber: wikibugs-l-list.