Introduce Special:Suffixindex
OpenPublic

Description

Author: eep

Description:
Is this possible? Some requests/discussions about it (and the problems caused by NOT having it):

http://meta.wikimedia.org/wiki/MediaWiki_Ideal#Suffixindex
http://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style_%28disambiguation_pages%29/Archive_29#All_pages_beginning_with...

Of course, wildcard search (such as implemented in http://www.mediawiki.org/wiki/Extension:Wildcard_search ) would probably work too, but it would be nice if MediaWiki had that functionality by default.


Version: unspecified
Severity: enhancement
URL: http://en.wikipedia.org/wiki/Wikipedia_talk:Special:PrefixIndex#SuffixIndex

bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz10808.
bzimport created this task.Via LegacyAug 5 2007, 10:07 PM
brion added a comment.Via ConduitAug 8 2007, 5:40 PM

No DB support for doing this efficiently.

bzimport added a comment.Via ConduitAug 8 2007, 5:42 PM

eep wrote:

Well, how about changing the DB then? :)

bzimport added a comment.Via ConduitSep 4 2007, 11:22 AM

eep wrote:

Using Dynamic Page List (http://www.mediawiki.org/wiki/Extension:DynamicPageList ), a simple single-line query can do this (and MUCH more): {{#dpl:namespace=|titlematch=%{{PAGENAME}}}}

Why can't MediaWiki do this by default? Doesn't seem like it would be that hard to do...

brion added a comment.Via ConduitSep 4 2007, 7:12 PM

Because it's very expensive, requiring a scan of up to several million page entries. It's only efficient on a very small wiki.

bzimport added a comment.Via ConduitSep 4 2007, 9:02 PM

eep wrote:

I just don't get how a prefix index can somehow be magically OK yet a suffix index is soo expensive? Limit the results to x per query or something if it's THAT expensive. There are most definitely (and certainly) ways of limiting database queries so they're not so "expensive"...

brion added a comment.Via ConduitSep 5 2007, 2:26 PM

A prefix index is just an index. Because of the way indexes work in the database, doing an exact-prefix match comes "for free".

We didn't have to do *anything* to get the prefix index. It was already there.

Limiting results would not have any useful effect -- to be efficient, you have to limit the amount of data you search through. And that requires having a sensible index for it.

To do a _suffix_ index, we'd have to:

  1. Create a second indexed field with a *reversed* copy of the content
  1. Populate that field throughout the entire database -- an expensive one-time setup operation which will require a few hours' work, careful coordination, and some downtime for the site.
  1. Keep that field around forever.

While it's possible to do it, there's not a lot of interest in it and it's a minor feature at best for a big change, so you'll probably not see people tripping over themselves to implement it right away.

bzimport added a comment.Via ConduitSep 7 2007, 12:20 PM

eep wrote:

On the way to a proper and decent search engine then?

bzimport added a comment.Via ConduitSep 19 2007, 11:55 AM

robchur wrote:

*** Bug 11395 has been marked as a duplicate of this bug. ***

MZMcBride added a comment.Via ConduitApr 20 2009, 1:35 AM

Re-opening this. There are legitimate use cases for this (like Wiktionary, cf. bug 17999). Nothing's really holding this up except a developer willing to implement it (i.e., no reason to leave this as LATER).

bzimport added a comment.Via ConduitDec 16 2009, 11:19 AM

public.wiki wrote:

*** Bug 21864 has been marked as a duplicate of this bug. ***

bzimport added a comment.Via ConduitDec 16 2009, 11:20 AM

public.wiki wrote:

*** Bug 21864 has been marked as a duplicate of this bug. ***

Jidanni added a comment.Via ConduitNov 20 2010, 8:50 AM

From an outsider's point of view it looks like an oversight to only have [[Special:PrefixIndex]].

On it you could put mention the reason for no complimentary [[Special:SuffixIndex]] there for people to see...

Tons of times when I wanted to use it...

"Let's find all the "... Agency"s on my wiki", etc.

bzimport added a comment.Via ConduitNov 20 2010, 11:41 PM

bugs wrote:

(In reply to comment #12)

"Let's find all the "... Agency"s on my wiki", etc.

...or "pages in Chinese (PAGE/zh)". :-)

MZMcBride added a comment.Via ConduitMay 17 2011, 6:50 PM

I wonder if the page_props table could be used for this. The current schema has the following unique index:

CREATE UNIQUE INDEX /*i*/pp_page_propname ON /*_*/page_props (pp_page,pp_propname);

If a second index were added on (pp_propname, pp_value), you could store the reversed page title with a "suffixindex" key fairly easily, couldn't you? Except that pp_value is a blob... bleh.

Nemo_bis added a comment.Via ConduitJul 14 2011, 11:19 AM

Because someone (like me) could look for this feature here, I leave here a link to a Toolserver tool that can be used to get such titles (thank you MZMcBride), for instance http://toolserver.org/~mzmcbride/yanker/?db=enwiktionary_p&list=pages&category=&page=&pattern=.%2Bps%24&namespace=0&limit=5000&sort=&wrapper=%241

Bawolff added a comment.Via ConduitSep 11 2011, 4:02 AM

So say we want a list of All things ending with "ed" - would you want that list in alphabetical order? (If the answer is yes this would be really difficult to do, if no then less difficult). Since if you want it in alphabetical order we'd basically need an index sorting it in both directions since first we jump to everything ending in "ed", and then we need to sort from the other end (normal direction) of the word. Otherwise in our suffix search (for say the suffix "ed") we'd have sorting like "yielded, zigzagged, recovered, edited, amortized" (Note how the third last letter of each word is alphabetically ordered)

I suppose one could create a table that for each page name had a partially reversed page name, and to what point its reversed, for all possible points-

For example for the page Fred (with page_id 1) we'd have:

page_id | page_reversed_point | page_reversed
--------+---------------------+--------------
1 | 1 | dFre
1 | 2 | edFr
1 | 3 | redF
1 | 4 | Fred
--------+---------------------+--------------

So each page would have up to 255 entries depending on how long the title is. (because the user could search for a suffix between 1 to 255 letters long). However, having such a table would be INSANE ;)

Thus unless SuffixIndex would be useful with results in non-alphabetical order (which seems unlikely) this should be wontfixed as unfeasible imo.

Nemo_bis added a comment.Via ConduitSep 11 2011, 8:55 AM

(In reply to comment #16)

Thus unless SuffixIndex would be useful with results in non-alphabetical order
(which seems unlikely) this should be wontfixed as unfeasible imo.

Alphabetical order doesn't seem necessary at all to me.

MZMcBride added a comment.Via ConduitSep 11 2011, 7:11 PM

(In reply to comment #16)

Thus unless SuffixIndex would be useful with results in non-alphabetical order
(which seems unlikely) this should be wontfixed as unfeasible imo.

I don't think it's that infeasible to add a page.page_title_reversed column and index it (possibly in a page_title_reversed,page_title pair).

Bawolff added a comment.Via ConduitSep 11 2011, 7:23 PM

(In reply to comment #18)

(In reply to comment #16)
> Thus unless SuffixIndex would be useful with results in non-alphabetical order
> (which seems unlikely) this should be wontfixed as unfeasible imo.

I don't think it's that infeasible to add a page.page_title_reversed column and
index it (possibly in a page_title_reversed,page_title pair).

My point was that would result in a non-alphabeticaly ordered suffix list. If that's ok then it is potentially feasible to do, if a non-alphabetical list is not acceptable, it would not be possible to reasonably do this (As far as I can tell).

Jidanni added a comment.Via ConduitSep 14 2011, 1:38 AM

Just add an SQL "ORDER BY" clause at the end.

MaxSem added a comment.Via ConduitSep 14 2011, 10:02 AM

(In reply to comment #20)

Just add an SQL "ORDER BY" clause at the end.

And it will be sorted by reverse title. Not quite useful.

ChongDae added a comment.Via ConduitNov 23 2012, 2:52 AM

*** Bug 2168 has been marked as a duplicate of this bug. ***

JanusTroelsen added a comment.Via ConduitAug 27 2013, 12:58 AM

There is at least some DB support for doing this efficiently.

In PostgreSQL, since 9.1, there is a built-in function for reversing strings: http://www.postgresql.org/docs/9.2/static/functions-string.html

In 9.0 and earlier, the function can be built in PL/pgSQL (search for it :)

It is also possible to build indexes on expressions at least since 7.4: http://www.postgresql.org/docs/7.4/static/indexes-expressional.html (didn't find any older documentation)

Bawolff added a comment.Via ConduitAug 27 2013, 2:02 AM

(In reply to comment #23)

There is at least some DB support for doing this efficiently.

In PostgreSQL, since 9.1, there is a built-in function for reversing strings:
http://www.postgresql.org/docs/9.2/static/functions-string.html

In 9.0 and earlier, the function can be built in PL/pgSQL (search for it :)

It is also possible to build indexes on expressions at least since 7.4:
http://www.postgresql.org/docs/7.4/static/indexes-expressional.html (didn't
find any older documentation)

We use mysql (technically mariadb) not postgress

Nemo_bis added a comment.Via ConduitAug 27 2013, 9:25 AM

(In reply to comment #24)

We use mysql (technically mariadb) not postgress

It would be nice if PostgreSQL support also had some things *more* in MediaWiki, isn't it possible to add a feature depending on the used DB?

Bawolff added a comment.Via ConduitAug 27 2013, 4:46 PM

(In reply to comment #25)

(In reply to comment #24)
> We use mysql (technically mariadb) not postgress

It would be nice if PostgreSQL support also had some things *more* in
MediaWiki, isn't it possible to add a feature depending on the used DB?

Its possible, but something we don't generally do (in core anyways. Extensions can do what they want).

Typically our db compatibility policy is that it must work on mysql, as mysql is our primary supported db platform.


Adding a fancy index like that in postgress, is probably not that much different from adding an extra field in mysql with the title reversed.

Liuxinyu970226 removed a subscriber: Liuxinyu970226.Via WebNov 24 2014, 1:31 AM
Liuxinyu970226 removed a subscriber: Liuxinyu970226.Via WebDec 9 2014, 8:32 AM
Tgr added a project: CirrusSearch.Via WebApr 19 2015, 6:26 PM
Tgr added a subscriber: Tgr.

This would make more sense as a CirrusSearch feature request. ElasticSearch supports wildcards at any position in the search term, and CirrusSearch already supports them, so the only step missing is correctly interpreting combined conditions such as intitle:*foo, as far as I can see.

Add Comment