
Investigate why some suboptimal suggestions are provided
Closed, Resolved, Public

Description

I have some doubts about whether this is a good idea.

We should investigate this. This is clearly not a sensible suggestion for the query that the user gave.

Event Timeline

Deskana raised the priority of this task to Medium.
Deskana updated the task description.
Deskana added subscribers: Deskana, matmarex.

To be fair, it's quite rare for the "did you mean" to be that funny. My favourite is still https://twitter.com/nemobis/status/487929021850996736

@Nemo_bis which wiki is this from?

wikimania2015wiki.

I ran some tests on low-content wikis (wikimania2015, meta, ...) and it appears that on small wikis the options we use to compute suggestions are not well suited to low term frequencies.
To reduce the number of bad suggestions on this kind of wiki, we need to add more constraints to the suggester.
This is possible by adding more options to cirrus.

Technical details:

  • We can use the collate option: exclude suggestions that do not match a specific query.
    • wikimania2015: "poop more" (suggested for "post mortem") would be excluded because it does not match any title or redirect
    • meta: "director democracy" (suggested for "direct democracy") would also be excluded
    • This will drastically reduce the number of suggestions, but we can relax the collate query options (e.g. require only 66% of the suggested words to match)
    • But in that case, "Request for comment on direct democracy" will still suggest "Request for comment on director democracy"
  • We can use other smoothing models
    • The stupid_backoff model used today seems inappropriate in some cases
    • With the laplace model the results are a bit better (stricter)
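As a rough illustration of the two options above, here is a minimal sketch that builds an Elasticsearch phrase suggester request combining Laplace smoothing with a collate query relaxed to 66% of the suggested words. Assumptions: the JSON layout follows the current Elasticsearch phrase suggester documentation (the 2015-era Elasticsearch 1.x collate syntax differed), the field names "suggest" and "title" and the suggestion name "did_you_mean" are hypothetical placeholders, and this is not the actual cirrus implementation.

```python
import json


def min_should_match(num_terms: int, percent: int) -> int:
    """Words a percentage-style minimum_should_match requires.

    Elasticsearch rounds the computed value down, so "66%" of a
    2-word suggestion requires only 1 matching word.
    """
    return num_terms * percent // 100


def phrase_suggest_body(text: str, alpha: float = 0.3, percent: int = 66) -> dict:
    """Build a phrase suggester request with Laplace smoothing and collate.

    The field names "suggest" and "title" are hypothetical placeholders.
    """
    return {
        "suggest": {
            "text": text,
            "did_you_mean": {
                "phrase": {
                    "field": "suggest",
                    # Use Laplace instead of the default stupid_backoff model.
                    "smoothing": {"laplace": {"alpha": alpha}},
                    # Keep only suggestions whose words match a title;
                    # minimum_should_match relaxes this to a percentage.
                    "collate": {
                        "query": {
                            "source": {
                                "match": {
                                    "title": {
                                        "query": "{{suggestion}}",
                                        "minimum_should_match": f"{percent}%",
                                    }
                                }
                            }
                        },
                        # prune=False drops non-matching suggestions instead
                        # of returning them flagged with collate_match.
                        "prune": False,
                    },
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(phrase_suggest_body("post mortem"), indent=2))
    # 6-word suggestion from the meta example.
    print(min_should_match(6, 66))
```

With percent=66, a 6-word suggestion such as "Request for comment on director democracy" needs only 3 matching words (6 × 66 / 100 = 3.96, rounded down), which helps explain why it can still pass the relaxed collate filter.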
wiki | user query | today | collate (66%) | collate (66%) + laplace (⍺=0.3)
wikimania2015 | post mortem | poop more | (none) | (none)
wikimania2015 | wiki manio | wiki main | wiki main | (none)
wikimania2015 | wikimanio | wikimania | wikimania | wikimania
wikimania2015 | the future of post mortem | the future of poop more | the future of poop more | (none)
wikimania2015 | the future of wiku discusson | the future of wiki discussions | the future of wiki discussions | the future of wiki discussions
wikimania2015 | gender inequolity index | gender inequality index | gender inequality index | gender inequality index
wikimania2015 | gender inequolity | gender inequality | gender inequality | gender inequality
wikimania2015 | gender inquolity | gender inequality | gender inequality | (none)

The collate (66%) + laplace (⍺=0.3) configuration seems to perform better by providing fewer funny suggestions. The drawback is that it rejects good suggestions made by the other configurations, such as gender inquolity -> gender inequality (2 typos).
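To see why switching smoothing models changes the results, the sketch below compares the textbook bigram formulas for stupid backoff and Laplace (additive) smoothing. The phrase suggester combines such n-gram scores with other factors when ranking candidates, so this only illustrates the formulas, not the suggester's exact scoring. The discount of 0.4 is Elasticsearch's documented stupid_backoff default and alpha=0.3 is the value tested above; all counts are hypothetical, not measured on wikimania2015wiki.

```python
def stupid_backoff(bigram: int, prev: int, word: int, total: int,
                   discount: float = 0.4) -> float:
    """Stupid backoff score for `word` following `prev` (not a probability)."""
    if bigram > 0:
        # Seen bigram: plain relative frequency.
        return bigram / prev
    # Unseen bigram: back off to the discounted unigram frequency of `word`.
    return discount * word / total


def laplace(bigram: int, prev: int, vocab: int, alpha: float = 0.3) -> float:
    """Additive smoothing: alpha is added to every possible bigram count."""
    return (bigram + alpha) / (prev + alpha * vocab)


# Hypothetical counts for a small wiki: 1000 tokens, 300 distinct words.
TOTAL, VOCAB = 1000, 300
C_WIKI, C_MAIN, C_MANIA = 50, 20, 10
C_WIKI_MAIN, C_WIKI_MANIA = 0, 10  # "wiki main" never occurs

if __name__ == "__main__":
    for name, bigram, word in [("wiki main", C_WIKI_MAIN, C_MAIN),
                               ("wiki mania", C_WIKI_MANIA, C_MANIA)]:
        sb = stupid_backoff(bigram, C_WIKI, word, TOTAL)
        lp = laplace(bigram, C_WIKI, VOCAB)
        print(f"{name}: stupid_backoff={sb:.4f} laplace={lp:.4f}")
```

With these made-up counts, the unseen bigram "wiki main" scores 4% of the seen bigram's score under stupid backoff (0.008 vs 0.2) but only about 2.9% under Laplace (0.3/140 vs 10.3/140), so Laplace penalizes it relatively more here. Whether that holds in general depends on the counts and on alpha.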

These options are not available in cirrus today. Should I create another task to make these configurations and models available?

Change 228831 had a related patch set uploaded (by DCausse):
Add more phrase suggester options

https://gerrit.wikimedia.org/r/228831

Change 229437 had a related patch set uploaded (by EBernhardson):
Add more phrase suggester options

https://gerrit.wikimedia.org/r/229437

Change 228831 merged by jenkins-bot:
Add more phrase suggester options

https://gerrit.wikimedia.org/r/228831

Change 229437 merged by jenkins-bot:
Add more phrase suggester options

https://gerrit.wikimedia.org/r/229437