DidYouMean extension submitted for comment and testing
Open, Public

Description

Author: hippytrail

Description:
DidYouMean is designed for the English Wiktionary to automate the use of the
{{see}} template there which links articles whose titles differ only by
capitalisation, use of diacritics, spaces, hyphenation, apostrophes, etc.
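
The kind of matching described above can be sketched as a title-normalization function. This is only an illustration in Python, not the extension's actual code (which is PHP and is not shown in this report); the name normalize_title and the exact set of ignored characters are assumptions.

```python
import unicodedata


def normalize_title(title: str) -> str:
    """Reduce a title to a key that ignores case, diacritics,
    spaces, hyphens and apostrophes (hypothetical helper)."""
    # Decompose accented characters so the combining marks can be dropped.
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Drop the separators named in the description (assumed set).
    for sep in (" ", "-", "'", "\u2019"):
        stripped = stripped.replace(sep, "")
    # Fold case last so "Polish" and "polish" share a key.
    return stripped.casefold()
```

Under this sketch, "Résumé" and "resume" share a key, as do "e-mail" and "E mail", so each would be listed as similar to the other.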

It adds two metadata tables which are maintained by hooks in all places where
articles can be created, renamed, or deleted. Metadata is kept only for
non-redirects in the main namespace.
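
As a rough sketch of that bookkeeping, the following Python class keeps a map from normalized key to titles and updates it on create, rename and delete. It uses a single in-memory map rather than the two database tables mentioned above, whose schema is not described in this report; the class, its method names and the key function passed in are all hypothetical.

```python
from collections import defaultdict

MAIN_NAMESPACE = 0  # assumed numeric id of the main namespace


class SimilarTitleIndex:
    """In-memory stand-in for the extension's metadata tables (hypothetical)."""

    def __init__(self, key_func):
        self._key = key_func              # maps a title to its normalized key
        self._by_key = defaultdict(set)   # normalized key -> set of titles

    def on_create(self, title, namespace=MAIN_NAMESPACE, is_redirect=False):
        # Only non-redirects in the main namespace are tracked.
        if namespace == MAIN_NAMESPACE and not is_redirect:
            self._by_key[self._key(title)].add(title)

    def on_delete(self, title):
        key = self._key(title)
        self._by_key[key].discard(title)
        if not self._by_key[key]:
            del self._by_key[key]

    def on_rename(self, old, new, namespace=MAIN_NAMESPACE, is_redirect=False):
        self.on_delete(old)
        self.on_create(new, namespace, is_redirect)

    def similar(self, title):
        # Every tracked title sharing the key, except the title itself.
        return sorted(self._by_key.get(self._key(title), set()) - {title})
```

With str.casefold as a placeholder key function, creating "Polish" and "polish" makes similar("polish") return ["Polish"], while a redirect with the same spelling is never recorded.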

A list of links to "similar" articles is added to all article pages in view
mode and also to the 'nogomatch' and 'noarticletext' pages.


Version: unspecified
Severity: enhancement
URL: http://www.mediawiki.org/wiki/Extension:DidYouMean

bzimport set Reference to bz8648.
bzimport created this task. Via Legacy, Jan 16 2007, 2:45 AM
bzimport added a comment. Via Conduit, Jan 16 2007, 2:52 AM

hippytrail wrote:

source for DidYouMean extension

attachment didyoumean.tar.bzip ignored as obsolete

bzimport added a comment. Via Conduit, Jan 16 2007, 3:50 AM

hippytrail wrote:

DidYouMean extension diff for mainline code

Hooks for 'noarticletext' and SpecialUndelete

attachment phase3-diff.txt ignored as obsolete

bzimport added a comment. Via Conduit, Jan 16 2007, 3:51 AM

hippytrail wrote:

DidYouMean diff for the extension itself

The code for the extension and its installer

attachment extensions-diff.txt ignored as obsolete

bzimport added a comment. Via Conduit, Jan 16 2007, 5:20 AM

wikt.3.connelm wrote:

Since en.wiktionary.org (and presumably others) have [Appendix:Names] and all
those name entries, were you planning on adding any other name-oriented
normalizing to this? Or is SOUNDEX the next phase?

bzimport added a comment. Via Conduit, Jan 16 2007, 7:11 AM

hippytrail wrote:

Handling appendices would require parsing whole pages which is more complex than
just parsing the {{see}} template.

Soundex turned out to be a lot more promiscuous than I expected. It seemed to
only take into account the first part of each word, resulting in enormous lists
of matches for each word, many of them not as alike as you'd expect.
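
To illustrate why that happens, here is a small Python sketch of standard American Soundex (not code from the extension). The code keeps the first letter plus at most three digits, so anything past the first few consonants of a word is ignored.

```python
def soundex(word: str) -> str:
    """Standard American Soundex: first letter plus three digits."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for letter in group:
            codes[letter] = digit

    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        code = codes.get(ch, "")
        # Skip vowels and repeats of the previous consonant code.
        if code and code != prev:
            result += code
        # H and W do not separate equal codes; vowels do.
        if ch not in "HW":
            prev = code
    # Pad or truncate to exactly four characters.
    return (result + "000")[:4]


words = ["intern", "interest", "interrupt", "intermediate", "international"]
print({w: soundex(w) for w in words})
```

All five words above receive the code I536, because the code is exhausted by the time it reaches "inter".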

Metaphone should be better but I couldn't get the library to work in the account
you gave me.

I'd been thinking about anagrams and textonyms next but a) they are
language-dependent, and b) they require parsing and replacing whole sections of
articles which as often as not are not in any well-defined format.

Another idea is to scan all redlinks and possibly blue links except that they
won't have canonical casing and there is no easy way to sort the wheat from the
chaff akin to ignoring redirects in article space.

bzimport added a comment. Via Conduit, Jan 16 2007, 7:15 AM

wikt.3.connelm wrote:

Well, I meant for the resulting main namespace entries, not taking apart the
Appendices themselves.

bzimport added a comment. Via Conduit, Jan 16 2007, 7:36 AM

rotemliss wrote:

Please add wikibugs-l@wikipedia.org to the CC list when you assign the bugs.

bzimport added a comment. Via Conduit, Jan 18 2007, 8:21 AM

robchur wrote:

First impressions are that this is quite a neat little extension and could have
great potential use. The "did you mean" message itself needs to be more
obtrusive - think coloured boxes - it's almost invisible on a search results page.

bzimport added a comment. Via Conduit, Jan 18 2007, 11:35 AM

hippytrail wrote:

Thanks Rob. The idea was that on the English Wiktionary it will just look like
what we've already been doing for ages without all the manual labour. Once it's
out there people should modify it to do something bigger on the search page, and
maybe not ignore redirects for Wikipedia like it does for Wiktionary.

bzimport added a comment. Via Conduit, Jan 27 2007, 9:25 AM

hippytrail wrote:

DidYouMean extension diff for mainline code

  • Fixed return value at 'noarticletext'
  • Use new hook in SpecialUndelete instead of my own

Attached: phase3-diff.txt

bzimport added a comment. Via Conduit, Jan 27 2007, 9:26 AM

hippytrail wrote:

DidYouMean diff for the extension itself

  • Fix broken installer
  • Use new SpecialDelete hook instead of my own

attachment extensions-diff.txt ignored as obsolete

bzimport added a comment. Via Conduit, Feb 2 2007, 2:10 AM

hippytrail wrote:

extension diff with changes suggested by Brion

  • Added table prefix in .sql file
  • Added addQuotes and tableName calls to constructed queries

attachment extensions-diff.txt ignored as obsolete

bzimport added a comment. Via Conduit, Feb 9 2007, 12:56 AM

hippytrail wrote:

extension diff with changes suggested by Tim Starling

  • All functions and variables are now prefixed with wfDym-
  • The database lookup is now done inside the parser hook

attachment extensions-diff.txt ignored as obsolete

bzimport added a comment. Via Conduit, Feb 9 2007, 4:49 AM

hippytrail wrote:

Fixed extension diff

Fixed a regression that slipped in.

Attached: extensions-diff.txt

brion added a comment. Via Conduit, Feb 9 2007, 6:04 AM

Committed the current version to extensions in r19837 to make it a little easier
to work with updates while testing.

brion added a comment. Via Conduit, Mar 28 2008, 11:16 PM

A few notes on current state of the extension...

Setup:

  • Should use update hooks so the table can get installed by standard update.php
  • install.php should be replaced with a script that simply allows rebuilding the normalization entries

Caching:

  • 'see also' bits embedded into pages won't be automatically updated when the page is already cached. For cache-correctness, it'll need to look up affected pages on addition/removal of normalization entries and schedule them for purges (and, possibly, link refresh)
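
A minimal sketch of the cache-correctness step described in this note, written in Python for illustration only: whenever a normalization entry is added or removed, every other page sharing its key is handed to a purge callback. The class name, the key function and the callback are hypothetical stand-ins and do not correspond to MediaWiki APIs.

```python
from collections import defaultdict


class NormalizationStore:
    """Tracks normalization entries and schedules purges (hypothetical)."""

    def __init__(self, key_func, purge):
        self._key = key_func              # title -> normalized key
        self._purge = purge               # stand-in for a cache purge request
        self._by_key = defaultdict(set)   # normalized key -> titles

    def _purge_neighbours(self, key, title):
        # Pages sharing the key embed a 'see also' list that is now stale.
        for other in sorted(self._by_key[key] - {title}):
            self._purge(other)

    def add(self, title):
        key = self._key(title)
        self._purge_neighbours(key, title)
        self._by_key[key].add(title)

    def remove(self, title):
        key = self._key(title)
        self._by_key[key].discard(title)
        self._purge_neighbours(key, title)
```

For example, with a list's append method as the callback, adding "polish" after "Polish" queues "Polish" for a purge, and later removing "Polish" queues "polish".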

Internationalization:

  • It's hardcoded for particular English templates, which seems a bit icky.

In general I'm not too comfortable with the way it messes about with the text of pages as they're parsed. A totally separate 'similar pages' UI component might be cleaner. *shrug*

Peachey88 added a comment. Via Conduit, Apr 30 2011, 12:10 AM

*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*

bzimport added a comment. Via Conduit, Dec 23 2011, 6:00 PM

sumanah wrote:

Marking "reviewed" as the extension has been reviewed by Brion in comment 16.

bzimport added a comment. Via Conduit, Nov 16 2012, 10:02 PM

sumanah wrote:

I've removed DidYouMean from https://www.mediawiki.org/wiki/Review_queue until the author responds to comment 16.

Aklapper added a comment. Via Conduit, Feb 28 2014, 3:00 PM

Andrew Dunbar: Resetting the assignee and status of this issue because there has been no progress in the last years. Feel free to take it again when you are actually planning to fix this. Thanks.
