
Use OpenSearch for Special:LinkSearch
Open, Needs Triage, Public, Feature

Description

The reasons for not implementing a namespace filter in LinkSearch are no longer valid: moved to T12593

Without a namespace filter, the web link search is simply unusable for many large domains, as you often have to skip thousands of hits on discussion pages.

Many users work around the restriction by using an insource search, but that is not a good solution either.

A solution with good filtering options, e.g. glob or regex, would be desirable.

UPDATE 15/12/2025 (Title changed)
Add Elasticsearch to Special:LinkSearch

  1. Split the URL into domain, path, query and fragment
  2. Split the domain into TLD, domain and subdomains
  3. Split the path into folders
  4. Split the query into keys and values
  5. Build search functions like indomain, infolder, inkey, invalue ...
  6. Investigate whether URL-specific functions can be used as useful filter options for general insource searches.
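
Steps 1 to 4 can be illustrated with a minimal Python sketch using only the standard library. The function name `split_url` and the output field names are hypothetical, chosen for illustration only. The domain split is deliberately naive: it assumes the last host label is the TLD, which is wrong for multi-label suffixes such as `co.uk`; a real implementation would likely need the Public Suffix List.

```python
from urllib.parse import urlsplit, parse_qsl


def split_url(url: str) -> dict:
    """Break a URL into the parts listed in steps 1-4 (illustrative only)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    labels = host.split(".") if host else []

    # ASSUMPTION: the last label is the TLD and the one before it is the
    # registered domain. Suffixes like "co.uk" are not handled here.
    tld = labels[-1] if labels else ""
    domain = labels[-2] if len(labels) >= 2 else ""
    subdomains = labels[:-2]

    # Path segments, ignoring empty segments from leading/trailing slashes.
    folders = [seg for seg in parts.path.split("/") if seg]

    # Query string as ordered (key, value) pairs; blank values are kept.
    query = parse_qsl(parts.query, keep_blank_values=True)

    return {
        "protocol": parts.scheme,
        "tld": tld,
        "domain": domain,
        "subdomains": subdomains,
        "folders": folders,
        "keys": [k for k, _ in query],
        "values": [v for _, v in query],
        "fragment": parts.fragment,
    }


doc = split_url("https://web.archive.org/web/2020/page?lang=de&id=7#top")
```

Each resulting field could then back one of the proposed functions, for example `folders` for infolder and `keys` for inkey.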

Event Timeline

It looks like (technically speaking) MediaWiki might already have a namespace filter for Special:LinkSearch; but that it might not currently be considered efficient enough to enable on Wikimedia wikis. The task I found for that is T12593: Make namespace filter in Special:Linksearch efficient enough for wgMiserMode.

I would merge this task into that one, but this task also contains a suggestion for Special:LinkSearch to use ElasticSearch, which seems like a separate request. If you're interested in that as a feature request specifically, I'd personally suggest either filing a new task for that request in particular (and merging this into T12593), or re-writing this task to just be about that request :)

T12593 dates back to 2007 and was imported from the good old Bugzilla.

It looks like the API (exturlusage) with a namespace filter is far faster than listing all results:

https://de.wikipedia.org/w/api.php?action=query&list=exturlusage&euquery=archive.ph&eunamespace=0

The API response time for unfiltered results is almost identical to that of LinkSearch.

It seems that the API (exturlusage) with a namespace filter is significantly faster than listing all results. So I think that the reasons for not implementing a namespace filter are no longer valid.

A namespace filter with default ns=0 will most likely improve performance and at the same time save a lot of time for users who are maintaining external links.
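
For anyone who wants to repeat the comparison, the two requests can be built as in the sketch below. The helper name `exturlusage_url` is hypothetical; `euquery` and `eunamespace` are the parameters used in the URL above. The sketch only constructs the URLs and sends no requests, so timing them is left to the reader.

```python
from typing import Optional
from urllib.parse import urlencode


def exturlusage_url(wiki_host: str, query: str,
                    namespace: Optional[int] = 0) -> str:
    """Build an exturlusage API URL; namespace=None means no namespace filter."""
    params = {"action": "query", "list": "exturlusage", "euquery": query}
    if namespace is not None:
        params["eunamespace"] = namespace
    return f"https://{wiki_host}/w/api.php?{urlencode(params)}"


# Filtered to the main namespace (ns=0), as in the example above.
filtered = exturlusage_url("de.wikipedia.org", "archive.ph", 0)
# Unfiltered variant for comparison.
unfiltered = exturlusage_url("de.wikipedia.org", "archive.ph", None)
```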

In T12593#153594, ayg wrote:

[...] the thing people really want this for is to limit to the main namespace, or maybe to content namespaces. This is much less of a problem in practice than arbitrary namespace limitations. The real issue would be someone searching for all *.wikipedia.org links in Portal_talk or something. For real-world queries, limiting to content namespaces should only increase rows scanned by a fairly small fixed factor, probably less than ten. So that can be considered, IMO.

Is there some reason I shouldn't close this as a dupe of T12593?

@Pppery : I think this can be divided into two parts. The first is the addition of a namespace filter to LinkSearch. This is low-hanging fruit and helps users significantly.

The second part is more complex, but could be effectively combined with other searches:

Addition of ElasticSearch to LinkSearch. To do this, a URL-specific index would have to be set up within ElasticSearch. The URLs should first be divided into protocol, domain, path, query, and fragment.
Paths would then be further split into directories, and queries into keys and values. Based on this, new search functions analogous to hastemplate or incategory could be created.

This would allow users to search specifically for URLs that are affected by a partial change within a domain.

Such filters could then also be offered as additional filters in the general search/insource function.
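
As a rough illustration of how such keywords might map onto a search backend, the sketch below turns a string of `keyword:value` tokens into a bool/filter query body of the kind Elasticsearch and OpenSearch accept. The keyword names come from the task description; the index field names (`domain`, `folders`, `keys`, `values`) and the function `build_filter_query` are assumptions made for this example and do not describe anything that exists in CirrusSearch.

```python
# ASSUMPTION: hypothetical mapping from proposed keywords to index fields.
FIELD_FOR_KEYWORD = {
    "indomain": "domain",
    "infolder": "folders",
    "inkey": "keys",
    "invalue": "values",
}


def build_filter_query(search: str) -> dict:
    """Translate 'keyword:value' tokens into a bool query with term filters."""
    filters = []
    for token in search.split():
        keyword, sep, value = token.partition(":")
        field = FIELD_FOR_KEYWORD.get(keyword)
        if not sep or field is None or not value:
            raise ValueError(f"unsupported filter: {token!r}")
        # Each keyword becomes an exact-match term filter on its field.
        filters.append({"term": {field: value}})
    return {"query": {"bool": {"filter": filters}}}


body = build_filter_query("indomain:archive.ph infolder:web")
```

All filters are combined with AND semantics here, which matches the use case of narrowing down URLs affected by a partial change within one domain.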

To my understanding -- in general, feature requests should normally be about a specific thing, with different feature requests split into different tasks (and, in this case, one of the feature requests here already has its own task (T12593) open to track it). @Boshomi_Phabricator, if you'd like a task to be open for the ElasticSearch request, please could you rewrite this task to be specifically about that (or file a new task specifically about it, if you'd prefer)?

Boshomi_Phabricator renamed this task from "Add namespace-filter to Special:LinkSearch / Use Elastic Serarch for Special:LinkSearch" to "Use Elastic Serarch for Special:LinkSearch". Dec 16 2025, 8:22 PM
Boshomi_Phabricator updated the task description.

@A_smart_kitten @Pppery: I have split this task and changed the title and description.

Tagging for awareness, not because I expect you to take any specific action.

(also tagging CirrusSearch as the extension that provides Elasticsearch searching)

EBernhardson renamed this task from "Use Elastic Serarch for Special:LinkSearch" to "Use OpenSearch for Special:LinkSearch". Thu, Jan 8, 3:33 PM