We should investigate this. This is clearly not a sensible suggestion for the query that the user gave.
- Mentioned In
  - rECIR4ade26531726: Add more phrase suggester options
  - rMW8a15494b9213: Updated mediawiki/core Project: mediawiki/extensions/CirrusSearch…
  - rECIRac655489ac5e: Add more phrase suggester options
  - rMEXT68ad4ac29b0f: Updated mediawiki/extensions Project: mediawiki/extensions/CirrusSearch…
  - T105202: If the user gets zero results, but gets a "Did you mean" result, just run the query for the "Did you mean" result and inform the user that this happened
- Mentioned Here
  - T105202: If the user gets zero results, but gets a "Did you mean" result, just run the query for the "Did you mean" result and inform the user that this happened
I ran some tests on low-content wikis (wikimania2015, meta, ...), and it appears that on small wikis the options we use to compute suggestions are not well suited to low term frequencies.
To reduce the number of bad suggestions on this kind of wiki, we need to add more constraints to the suggester.
This is possible by adding more options to Cirrus.
Technical details:
- We can use the collate option, which excludes suggestions that do not match a specific query.
  - wikimania2015: the suggestion poop more for the query post mortem would be excluded because it does not match any title or redirect.
  - meta: the suggestion director democracy for the query direct democracy would also be excluded.
  - This drastically reduces the number of suggestions, but we can loosen the collate query, for example by requiring a match on only 66% of the suggested words.
  - In that case, however, the query Request for comment on direct democracy will suggest Request for comment on director democracy.
- We can use other smoothing models.
  - The stupid_backoff model used today seems inappropriate in some cases.
  - With the laplace model the results are a bit better (stricter).
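To make the difference between the two smoothing models more concrete, here is a minimal sketch of their textbook formulas on a toy corpus. It is not Elasticsearch's implementation: the function names, the toy titles, and the 0.4 discount are assumptions made for illustration, and only the ⍺ = 0.3 value is taken from the tests described in this task.

```python
from collections import Counter

# Hypothetical toy "titles"; a real wiki index is far larger.
TOY_TITLES = ["post mortem", "poop jokes", "more discussions"]


def train(sentences):
    """Count unigrams and bigrams from whitespace-tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def laplace(prev, word, unigrams, bigrams, alpha=0.3):
    """Additive smoothing: every possible bigram gets alpha pseudo-counts."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)


def stupid_backoff(prev, word, unigrams, bigrams, discount=0.4):
    """Relative bigram frequency if seen, else a discounted unigram frequency."""
    if bigrams[(prev, word)] > 0:
        return bigrams[(prev, word)] / unigrams[prev]
    total_tokens = sum(unigrams.values())
    return discount * unigrams[word] / total_tokens


if __name__ == "__main__":
    uni, bi = train(TOY_TITLES)
    for pair in [("post", "mortem"), ("poop", "more")]:
        print(pair, laplace(*pair, uni, bi), stupid_backoff(*pair, uni, bi))
```

Calling both functions on a seen pair such as ("post", "mortem") and an unseen one such as ("poop", "more") shows how each model assigns a score to bigrams that never occur in the corpus, which is the case that matters most on small wikis.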
| wiki | user query | today | collate (66%) | collate (66%) + laplace (⍺=0.3) |
| --- | --- | --- | --- | --- |
| wikimania2015 | post mortem | poop more | | |
| wikimania2015 | wiki manio | wiki main | wiki main | |
| wikimania2015 | the future of post mortem | the future of poop more | the future of poop more | |
| wikimania2015 | the future of wiku discusson | the future of wiki discussions | the future of wiki discussions | the future of wiki discussions |
| wikimania2015 | gender inequolity index | gender inequality index | gender inequality index | gender inequality index |
| wikimania2015 | gender inequolity | gender inequality | gender inequality | gender inequality |
| wikimania2015 | gender inquolity | gender inequality | gender inequality | |
The collate (66%) + laplace (⍺=0.3) configuration seems to perform better because it produces fewer funny suggestions. The drawback is that it rejects good suggestions that the other configurations do make, such as gender inquolity -> gender inequality (2 typos).
These options are not in Cirrus today. Should I create another task to make these configurations and models available?
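For reference, the collate (66%) + laplace (⍺=0.3) combination discussed above could be expressed as an Elasticsearch phrase suggester request body along the following lines. This is only a sketch under stated assumptions: the helper name, the suggestion name `did_you_mean`, and the field names `suggest` and `title` are hypothetical, and the exact shape of the `collate.query` wrapper (shown here with a `source` key, as in recent Elasticsearch documentation) differs between Elasticsearch versions. It also collates against a single field, so matching redirects as well would need an additional clause.

```python
import json


def build_phrase_suggest_body(text, suggest_field="suggest", collate_field="title",
                              minimum_should_match="66%", alpha=0.3):
    """Build a phrase suggester body with collate and laplace smoothing.

    All field names are assumptions for illustration, not Cirrus settings.
    """
    return {
        "suggest": {
            "text": text,
            "did_you_mean": {  # hypothetical suggestion name
                "phrase": {
                    "field": suggest_field,
                    "size": 1,
                    # Replace the default smoothing model with laplace.
                    "smoothing": {"laplace": {"alpha": alpha}},
                    # Check each candidate against the index with a templated query.
                    "collate": {
                        "query": {
                            "source": {
                                "match": {
                                    "{{field_name}}": {
                                        "query": "{{suggestion}}",
                                        "minimum_should_match": minimum_should_match,
                                    }
                                }
                            }
                        },
                        "params": {"field_name": collate_field},
                        # prune=False drops candidates that match no document.
                        "prune": False,
                    },
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_phrase_suggest_body("post mortem"), indent=2))
```

Exposing `minimum_should_match` and `alpha` as parameters mirrors the idea of making these values configurable rather than hard-coded.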