
Investigate why some suboptimal suggestions are provided
Closed, Resolved · Public

Description

I have some doubts about whether this is a good idea.

We should investigate this. This is clearly not a sensible suggestion for the query that the user gave.

Details

Related Gerrit Patches:
mediawiki/extensions/CirrusSearch : wmf/1.26wmf17 : Add more phrase suggester options
mediawiki/extensions/CirrusSearch : master : Add more phrase suggester options

Event Timeline

Deskana created this task. Jul 27 2015, 9:25 PM
Deskana raised the priority of this task from to Medium.
Deskana updated the task description.
Deskana added subscribers: Deskana, matmarex.
Restricted Application added a subscriber: Aklapper. Jul 27 2015, 9:25 PM

@Nemo_bis which wiki is this from?

To be fair, it's quite rare for the "did you mean" to be that funny. My favourite is still https://twitter.com/nemobis/status/487929021850996736

> @Nemo_bis which wiki is this from?

wikimania2015wiki.

dcausse claimed this task. Jul 30 2015, 2:32 PM
dcausse set Security to None.

I ran some tests on low-content wikis (wikimania2015, meta...), and it appears that on small wikis the options we use to compute suggestions are not well suited to low term frequencies.
To decrease the number of bad suggestions on this kind of wiki, we need to add more constraints to the suggester.
This is possible by adding more options to Cirrus.

Technical details:

  • We can use the collate option, which excludes suggestions that do not match a specific query.
    • wikimania2015: "poop more" (suggested for "post mortem") would be excluded because it does not match any title or redirect.
    • meta: "director democracy" (suggested for "direct democracy") would also be excluded.
    • This will drastically reduce the number of suggestions, but we can relax the query options (for example, require a match on only 66% of the suggested words).
    • In that case, however, "Request for comment on direct democracy" will suggest "Request for comment on director democracy".
  • We can use other smoothing models.
    • The stupid_backoff model used today seems inappropriate in some cases.
    • With the laplace model the results are a bit better (stricter).
| wiki | user query | today | collate (66%) | collate (66%) + laplace (⍺=0.3) |
|---|---|---|---|---|
| wikimania2015 | post mortem | poop more | | |
| wikimania2015 | wiki manio | wiki main | wiki main | |
| wikimania2015 | wikimanio | wikimania | wikimania | wikimania |
| wikimania2015 | the future of post mortem | the future of poop more | the future of poop more | |
| wikimania2015 | the future of wiku discusson | the future of wiki discussions | the future of wiki discussions | the future of wiki discussions |
| wikimania2015 | gender inequolity index | gender inequality index | gender inequality index | gender inequality index |
| wikimania2015 | gender inequolity | gender inequality | gender inequality | gender inequality |
| wikimania2015 | gender inquolity | gender inequality | gender inequality | |
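
To make the difference between the two smoothing models concrete, here is a rough sketch of how each one scores a bigram. It is a textbook simplification, not Elasticsearch's actual implementation: the function names and the toy corpus are hypothetical, and the discount of 0.4 is only an assumed illustrative value.

```python
from collections import Counter


def count_ngrams(tokens):
    """Count unigrams and bigrams in a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def stupid_backoff_score(prev, word, unigrams, bigrams, discount=0.4):
    """Use the bigram frequency if the bigram was seen,
    otherwise back off to a discounted unigram frequency."""
    if bigrams[(prev, word)] > 0:
        return bigrams[(prev, word)] / unigrams[prev]
    total = sum(unigrams.values())
    return discount * unigrams[word] / total


def laplace_score(prev, word, unigrams, bigrams, alpha=0.3):
    """Additive (Laplace) smoothing: add alpha to every bigram count."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)


# Hypothetical toy corpus, for illustration only.
tokens = "post mortem of the wiki post mortem".split()
unigrams, bigrams = count_ngrams(tokens)
for prev, word in [("post", "mortem"), ("post", "wiki")]:
    print(prev, word,
          stupid_backoff_score(prev, word, unigrams, bigrams),
          laplace_score(prev, word, unigrams, bigrams))
```

On a small wiki the counts involved are tiny, so the choice of model and of alpha can noticeably change which candidate phrase ranks highest.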

The collate (66%) + laplace (⍺=0.3) configuration seems to perform better, producing fewer funny suggestions. The drawback is that it rejects good suggestions that the other configurations provide, such as gender inquolity -> gender inequality (2 typos).
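
For reference, a phrase suggester request combining these two options might look roughly like the sketch below. It only builds the JSON body. The suggestion name `did_you_mean`, the field names `suggest` and `title`, and the helper function itself are hypothetical and are not taken from CirrusSearch; the exact shape of the collate query also depends on the Elasticsearch version (newer versions wrap the template in `source`, as here).

```python
import json


def build_phrase_suggest_body(text, field="suggest", collate_field="title",
                              min_match="66%", alpha=0.3):
    """Build a phrase suggester request body (field names are assumptions)."""
    return {
        "suggest": {
            "text": text,
            "did_you_mean": {  # hypothetical suggestion name
                "phrase": {
                    "field": field,
                    "size": 1,
                    # Laplace (additive) smoothing instead of stupid_backoff.
                    "smoothing": {"laplace": {"alpha": alpha}},
                    # Each candidate is run through this query; candidates
                    # with no matching document are dropped.
                    "collate": {
                        "query": {
                            "source": {
                                "match": {
                                    "{{field_name}}": {
                                        "query": "{{suggestion}}",
                                        "minimum_should_match": min_match,
                                    }
                                }
                            }
                        },
                        "params": {"field_name": collate_field},
                        "prune": False,
                    },
                }
            },
        }
    }


print(json.dumps(build_phrase_suggest_body("post mortem"), indent=2))
```

Raising `min_match` makes the collate check stricter; lowering it lets more suggestions through, at the cost of cases like the "director democracy" one described above.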

These options are not available in Cirrus today. Should I create another task to make these configurations and models available?

Change 228831 had a related patch set uploaded (by DCausse):
Add more phrase suggester options

https://gerrit.wikimedia.org/r/228831

Ironholds moved this task from Needs triage to Search on the Discovery board. Aug 4 2015, 8:16 AM

Change 229437 had a related patch set uploaded (by EBernhardson):
Add more phrase suggester options

https://gerrit.wikimedia.org/r/229437

Change 228831 merged by jenkins-bot:
Add more phrase suggester options

https://gerrit.wikimedia.org/r/228831

Change 229437 merged by jenkins-bot:
Add more phrase suggester options

https://gerrit.wikimedia.org/r/229437

ksmith moved this task from Search to On Sprint Board on the Discovery board. Aug 27 2015, 8:30 PM
ksmith moved this task from On Sprint Board to Search on the Discovery board. Sep 10 2015, 8:14 PM
ksmith moved this task from Search to On Sprint Board on the Discovery board.
Deskana closed this task as Resolved. Sep 12 2015, 2:36 AM
Deskana moved this task from Done to Resolved on the Discovery-Search (Current work) board.