The ability to locate all URL-style internal links is important but impossible. Right? A combination of things ruins the search for URL-style internal links, and linksto does not rescue it.
**Background**. It's important for answering what links //to// some given content, for cases where bare URLs are used in ref tags, for analytics, and for operations, and it's important to the [[//meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey/Search#Improve_Special:LinkSearch|recent Wikimedia survey]], where the improvement of LinkSearch was wanted. [[//mediawiki.org/wiki/extension:LinkSearch|Extension LinkSearch]] says "core functionality" has replaced it. That must be CirrusSearch insource. It cannot be WhatLinksHere or CirrusSearch linksto. These three are it. A [[//en.wikipedia.org/wiki/Special:LinkSearch|stated function of LinkSearch]] on its Wikipedia page is "to search for external links to pages on this site...".
It's impossible because the only approach is to run a large set of queries against [[//mediaWiki.org/wiki/help:CirrusSearch|CirrusSearch]], but the type and number of queries it allows is prohibitively ungenerous, as I will describe, so a probability is the best we can do when reporting linkage.
To link any two particular points of content, we can create URL-style internal links in such a //generous// number of ways that there can be literally hundreds, each different enough that search can find only a few at a time.
To start the picture, here are the most significant //from// text patterns. We can create URL-style internal links for a given pagename in a //generous// way that is space-insensitive for the parser function and the namespace. Given a single fullpagename, there are fifteen ways, spread over five major forms, that are significant to search patterns:
# `[//wikipedia.org/wiki/namespace:pagename]`
# `[{{canonicalurl:namespace:pagename}}]`
# `{{canonicalurl:namespace:pagename}}`
# `{{canonicalurl: namespace:pagename}}`
# `{{canonicalurl:namespace: pagename}}`
# `{{canonicalurl: namespace: pagename}}`
# `[{{fullurl:namespace:pagename}}]`
# `[{{fullurl: namespace:pagename}}]`
# `[{{fullurl:namespace: pagename}}]`
# `[{{fullurl: namespace: pagename}}]`
# `[{{SERVER}}/{{localurl:namespace:pagename}}]`
# `[{{SERVER}}/{{localurl: namespace:pagename}}]`
# `[{{SERVER}}/{{localurl:namespace: pagename}}]`
# `[{{SERVER}}/{{localurl: namespace: pagename}}]`
# `[{{SERVER}}/wiki/namespace:pagename/]`
Fullurl and canonicalurl also accept a parameterized-call form.
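The list above can be generated mechanically, which makes it easier to see how the count grows. The following is a minimal sketch, not an established tool: the function names `magic` and `link_variants`, the `host` default, and the example page `Help:Contents` are all hypothetical, chosen only for illustration.

```python
# Spacing after the parser-function colon and after the namespace colon.
SPACINGS = [("", ""), (" ", ""), ("", " "), (" ", " ")]


def magic(word, namespace, pagename, after_word="", after_ns=""):
    """Spell one parser-function call, e.g. {{fullurl: Help:Contents}}."""
    return "{{" + f"{word}:{after_word}{namespace}:{after_ns}{pagename}" + "}}"


def link_variants(namespace, pagename, host="wikipedia.org"):
    """Return the fifteen spellings from the list for one fullpagename."""
    # Form 1: the bare protocol-relative URL.
    variants = [f"[//{host}/wiki/{namespace}:{pagename}]"]
    # Form 2: canonicalurl, once bracketed and four times with varying spaces.
    variants.append("[" + magic("canonicalurl", namespace, pagename) + "]")
    variants += [magic("canonicalurl", namespace, pagename, a, b) for a, b in SPACINGS]
    # Form 3: fullurl in brackets, four spacings.
    variants += ["[" + magic("fullurl", namespace, pagename, a, b) + "]" for a, b in SPACINGS]
    # Form 4: {{SERVER}} plus localurl, four spacings.
    variants += ["[{{SERVER}}/" + magic("localurl", namespace, pagename, a, b) + "]"
                 for a, b in SPACINGS]
    # Form 5: {{SERVER}} plus a literal path.
    variants.append("[{{SERVER}}/wiki/" + f"{namespace}:{pagename}/]")
    return variants


if __name__ == "__main__":
    for spelling in link_variants("Help", "Contents"):
        print(spelling)
```

Letter-case variants, namespace aliases, and the parameterized-call forms are deliberately left out here; each would multiply the list further.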
It is not only the generous number of [[//mediawiki.org/wiki/help:magic_words|magic words]] that confounds Search but also their interplay, whitespace, and letter case.
But we can search for them in only an //ungenerous// way. For these link constructs, Search is narrow and specific. LinkSearch results prove it tracks from text pattern #1. Nothing tracks HTML links, actual blue or red. Linksto tracks square brackets. WhatLinksHere skips URLs. Only WLH tracks with aliases so that it can report links to content. Links have no namespace and no search index. Insource is the way to track URLs, and it is the only option (without a genuine linksto parameter), because page visibility of the sought construct is, although possible, not likely. Even if the constructs were in a page-visible form, Search is camelCase sensitive, whereas namespace names and parser functions are fully accepting of any camelCase. (Table type?)
**The nature of CirrusSearch colon : character**
For an "exact phrase" search the non-spaced colon is no different from a letter or a number. If there is a space after it, the alternative without the space will not match. If this is true, then it takes well over eighteen searches to hunt for what linksto a page:
# `insource: "canonicalurl:namespace:pagename"`
# `insource: "canonicalurl namespace:pagename"`
# `insource: "canonicalurl:namespace pagename"`
# `insource: "canonicalurl namespace pagename"`
# `insource: "fullurl:namespace:pagename"`
# `insource: "fullurl namespace:pagename"`
# `insource: "fullurl:namespace pagename"`
# `insource: "fullurl namespace pagename"`
# `insource: "server localurl:namespace:pagename"`
# `insource: "server localurl namespace:pagename"`
# `insource: "server localurl:namespace pagename"`
# `insource: "server localurl namespace pagename"`
# `insource:"server wiki namespace:pagename"`
Here are five acceptable variations for just one of the five major forms:
# `[{{canonicalurl:namespace:pagename}}]`
# `{{canonicalurl:namespace:pagename}}`
# `{{canonicalurl: namespace:pagename}}`
# `{{canonicalurl:namespace: pagename}}`
# `{{canonicalurl: namespace: pagename}}`
That's twenty five different insource searches so far. Multiply that by the many combinations, each significantly different to search, of
* [[//mediawiki.org/wiki/help:magic words|magic words]] for namespace (three)
* magic words for pagename (about twelve)
* magic words for server, host, and path (many)
* queries on the path: `server/w/query`, where query is `index.php?title=` or `index.php?pageid=`
* parameters for magic words like `|path` or `|wiki`
* "Fullurl" and "canonicalurl" also accept "urlencode" or "anchorencode" forms.
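The exact-phrase searches above follow a regular pattern, so they can be produced by a short script instead of being typed by hand. This is only a sketch under stated assumptions: `insource_queries` and `queries_for_all_names` are hypothetical helper names, `Help:Contents` and the alias names are placeholders, and the query strings simply reproduce the spellings used in this text.

```python
def insource_queries(namespace, pagename):
    """Build the thirteen exact-phrase insource searches for one fullpagename."""
    prefixes = ["canonicalurl", "fullurl", "server localurl"]
    # The namespace/pagename pair, with and without its colon.
    targets = [f"{namespace}:{pagename}", f"{namespace} {pagename}"]
    queries = []
    for prefix in prefixes:
        for target in targets:
            # The separator after the prefix, with and without its colon.
            for separator in (":", " "):
                queries.append(f'insource: "{prefix}{separator}{target}"')
    # The literal /wiki/ path form.
    queries.append(f'insource:"server wiki {namespace}:{pagename}"')
    return queries


def queries_for_all_names(namespace_names, pagename):
    """Repeat the searches for every name a namespace answers to (name plus aliases)."""
    queries = []
    for name in namespace_names:
        queries.extend(insource_queries(name, pagename))
    return queries


if __name__ == "__main__":
    for query in insource_queries("Help", "Contents"):
        print(query)
    # One canonical name plus two placeholder aliases: three times the searches.
    print(len(queries_for_all_names(["Help", "AliasOne", "AliasTwo"], "Contents")))
```

Generating the queries does not reduce their number; it only shows how quickly the set grows once each namespace name is counted.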
**Foreground**
Spacing and case can be significant for insource, and a different regexp is required for each query in order to match multiple patterns. So we can search for URLs in only an //ungenerous// way. For each single external link construct, Search is narrow and specific.
To make link-search matters worse, each of these characteristics multiplies the number of searches required many fold:
* Search is camelCase sensitive, but namespace names and parser functions are not.
* Insource treats an unspaced colon : character like:this as a letter, so the non-indexed strings "like" and "this" cannot be found except with a regex. For an insource search the non-spaced colon is no different from a letter or a number. If there is a space after it, the alternative without the space will not match.
* Insource is the only option, because page visibility of the sought construct is, although possible, not likely. To find non-indexed strings, a regex needs a filter. As just explained, there is no filter possible. Each search would need its own separate regex for verification purposes.
* You really need yet another variable, a /regex/ just to look for the opening [ bracket. It would need to accompany each search in its own distinct, unique form. Yet //still// it could not prove a closing ] bracket existed, because the dot **.** metacharacter represents any character //including a newline//.
* These many searches don't include the possibility of finding the parameterized URL-style internal wikilinks, where `{{anchorencode}}` and `{{urlencode}}` often come into play and add to the difficult probability game of "sequences" we're forced to play.
* A namespace with two aliases triples the number of insource searches, adding 26 more.
* [[//mediawiki.org/wiki/extension:LinkSearch|Extension LinkSearch]] is heavily deprecated and basically obsolete. The [[//en.wikipedia.org/wiki/Template:Linksearch| linksearch template on Wikipedia]] hasn't been touched in years.
* Insource doesn't take OR, but that's OK because of the 300-char query limit.
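The spacing, case, and bracket points can be tried out locally on saved wikitext. The sketch below uses Python's `re` module, which is an assumption made purely for illustration: it is not the regex engine behind insource, and its syntax and defaults differ. The names `tolerant_pattern` and `bracket_seems_closed` and the sample page `Help:Contents` are hypothetical.

```python
import re


def tolerant_pattern(word, namespace, pagename):
    """One pattern covering optional spaces after each colon, with any letter
    case for the parser function and the namespace (but not the pagename)."""
    return re.compile(
        r"\{\{(?i:" + re.escape(word) + r"): ?"
        r"(?i:" + re.escape(namespace) + r"): ?"
        + re.escape(pagename) + r"\}\}"
    )


# An opening bracket, the start of the construct, anything, a closing bracket.
OPEN_TO_CLOSE = r"\[\{\{fullurl.*\]"


def bracket_seems_closed(text, dot_matches_newline):
    """Report whether OPEN_TO_CLOSE finds a closing ] somewhere after the opening [.

    With dot_matches_newline=True the dot behaves as the text describes for
    insource: it matches any character, including a newline.
    """
    flags = re.DOTALL if dot_matches_newline else 0
    return re.search(OPEN_TO_CLOSE, text, flags) is not None


if __name__ == "__main__":
    pattern = tolerant_pattern("canonicalurl", "Help", "Contents")
    print(bool(pattern.search("{{CanonicalURL: help: Contents}}")))
    # The link on the first line is never closed; the ] belongs to a later line.
    sample = "[{{fullurl:Help:Contents}} never closed\nnext line has a stray ]"
    print(bracket_seems_closed(sample, dot_matches_newline=False))
    print(bracket_seems_closed(sample, dot_matches_newline=True))
```

When the dot is allowed to cross a newline, the stray bracket on the next line makes the unclosed link look closed, which is why such a regex cannot prove that a closing bracket exists.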
It is a series of searches few could understand. A template could try to offer a way for end users to find URLs, reporting those that point to a given canonical name of any of: a section, a fullpagename, a prefix, or a namespace. There is no closer-to-singular process available.
What's a workaround? [[//mediawiki.org/wiki/query:api|Web API]]? [[//quarry.wmflabs.org]]?
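If the Web API route were tried, each insource search could be sent as a request to the MediaWiki `list=search` query module. The sketch below only builds the request URL and makes no network call. The function name `search_api_url`, the English Wikipedia endpoint, the result limit, and the example query are assumptions for illustration, and the 300-character check mirrors the limit mentioned in this text rather than anything verified here.

```python
from urllib.parse import urlencode

# Query length limit as stated in the text (an assumption, not verified here).
MAX_QUERY_CHARS = 300


def search_api_url(query, api="https://en.wikipedia.org/w/api.php", limit=50):
    """Build a full-text search request URL for one insource query."""
    if len(query) > MAX_QUERY_CHARS:
        raise ValueError(f"query is {len(query)} characters, over {MAX_QUERY_CHARS}")
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return f"{api}?{urlencode(params)}"


if __name__ == "__main__":
    print(search_api_url('insource:"canonicalurl:Help:Contents"'))
```

This does not reduce the number of searches needed; it only moves the same series of queries from the search box into a script.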
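[[//en.wikipedia.org/wiki/Wikipedia:WikiProject_Check_Wikipedia/List_of_errors| Wikicheck ]] mentions URL-style internal wikilinks as error 90, "Internal link written as an external link". No bots fix them, but they are found and listed for cleanup, and yet templates like [[//en.wikipedia.org/wiki/Template:srlink|Srlink]] produce them so that, for example, mirrors can back link to Wikipedia in certain cases.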