If you search for "ai dû" or "j'ai dû" on the French Wiktionary, filtering to keep only Appendix pages (https://fr.wiktionary.org/w/index.php?title=Sp%C3%A9cial:Recherche&profile=advanced&profile=advanced&fulltext=Search&search=ai+d%C3%BB&searchengineselect=mediawiki&ns100=1&searchToken=12uikwjsbh1rky0xeaqyc0mc9), the search engine should propose
"Annexe:Conjugaison en français/devoir". However, this result does not appear. It is proposed only when one types "j’ai dû" (with the typographic apostrophe). Could you improve the results returned by the search engine when only part of the expression is typed ("ai dû"), and could you treat "'" (U+0027) and "’" (U+2019) as the same character?
Description
Status | Assigned | Task
---|---|---
Resolved | dcausse | T151173 Search engine does not find "ai dû" on the French Wiktionary.
Resolved | EBernhardson | T139585 Reindex all wikis with the new BM25 settings
Resolved | dcausse | T152092 Activate BM25 on all but wikis with spaceless languages
Event Timeline
PS: the community would accept a collation modification to solve this kind of problem.
I had a look at how we handle elisions and apostrophes for French.
I think we already support the je ai => j'ai elision and the folding of U+2019 into U+0027. Unfortunately, this is not easily accessible.
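To make the idea concrete, here is a minimal Python sketch of that kind of normalization: fold U+2019 into U+0027, then drop an elided prefix such as j' so that j'ai and ai index as the same token. This is only an illustration, not the search engine's actual analysis chain; the function names and the list of elided prefixes are assumptions made for the example.

```python
# Illustrative only: hypothetical helpers, not the real analysis chain.
# Assumed (incomplete) list of French elided prefixes.
ELIDED_PREFIXES = ("j", "l", "d", "m", "t", "s", "n", "c", "qu")


def fold_apostrophes(text):
    """Replace the typographic apostrophe (U+2019) with the ASCII one (U+0027)."""
    return text.replace("\u2019", "'")


def strip_elision(token):
    """Drop an elided prefix: "j'ai" -> "ai". Other tokens are returned unchanged."""
    head, sep, tail = token.partition("'")
    if sep and tail and head in ELIDED_PREFIXES:
        return tail
    return token


def normalize(text):
    """Lowercase, fold apostrophes, split on whitespace, strip elisions."""
    folded = fold_apostrophes(text.lower())
    return [strip_elision(token) for token in folded.split()]


if __name__ == "__main__":
    for query in ("j'ai dû", "j\u2019ai dû", "ai dû"):
        print(query, "->", normalize(query))
```

With this kind of normalization, all three spellings reduce to the same pair of tokens, which is why a field that applies it can match any of them.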
To clarify, using double quotes in the search query forces the search engine to use a field where no stemming and little language analysis is done.
- "ai dû" is unlikely to work well because French elisions are not handled on this field.
But you can tell the search engine to use the field that has all the language-specific analysis enabled by adding a ~ after the last double quote (not to be confused with fuzzy matching).
- "ai dû"~ will probably find all instances of j'ai dû (ASCII apostrophe), j’ai dû (U+2019), and ai dû (no elision, as in j'ai fait [...] et ai dû).
@Pamputt could you confirm that the behavior of the query "ai dû"~ is closer to what you expect?
Sorry, I just realized that the link you pasted does not include quotes; the search query is ai dû. This query returns only 7 results, which is completely wrong.
This issue is not trivial to explain, but it is due to some internal components we use, and it should be fixed when we roll out BM25 and its new query builder to all wikis.
@Pamputt (just to be sure), can you confirm that the results displayed here:
are more like what you expect (Annexe:Conjugaison en français/devoir is the first result)?
The link I pasted includes a hidden parameter (&cirrusFTQBProfile=perfield_builder) that allows me to experiment with the new feature we plan to activate.
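For anyone who wants to rebuild such a link by hand, here is a rough Python sketch that assembles a search URL from the parameters visible in this task and optionally appends the hidden cirrusFTQBProfile parameter. The helper name build_search_url is hypothetical, and whether the parameter is still honoured depends on the deployed configuration.

```python
from urllib.parse import urlencode

BASE_URL = "https://fr.wiktionary.org/w/index.php"


def build_search_url(query, query_builder_profile=None):
    """Build a search URL restricted to the Annexe namespace (ns100=1).

    query_builder_profile, when given, is sent as the hidden
    cirrusFTQBProfile parameter mentioned in the comment above.
    """
    params = {
        "title": "Spécial:Recherche",
        "profile": "advanced",
        "fulltext": "Search",
        "search": query,
        "ns100": "1",
    }
    if query_builder_profile is not None:
        params["cirrusFTQBProfile"] = query_builder_profile
    # urlencode percent-encodes non-ASCII characters as UTF-8
    # and turns spaces into "+".
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("ai dû"))
    print(build_search_url("ai dû", "perfield_builder"))
```

Comparing the two printed URLs side by side is a simple way to see the effect of the experimental query builder on the same query.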
Unrelated to this particular issue but I can see that we have some highlighting issues, j'ai is not always highlighted.
Indeed, both links you provided seem to work correctly. At least they work much better than the current behaviour. Apart from the highlighting, which is sometimes missing, it is good.
Thanks, I'll mark this task as blocked by T139585. This problem should resolve itself when the reindexing is done (hopefully before the end of the year).
Concerning the highlighting issue, I don't have time to explore the problem (highlighting issues are never trivial); feel free to open another task.
We've started the reindexing, but I don't think we've reached the French Wiktionary yet. Hopefully this should be fixed within a few weeks.
fr.wiktionary.org is now reindexed with the new settings. This is still not perfect, but Annexe:Conjugaison en français/devoir is now the 3rd result for the queries mentioned in this ticket. Feel free to reopen if you think it's not properly fixed.