
Search engine does not find "ai dû" on the French Wiktionary.
Closed, Resolved · Public

Description

If you type "ai dû" or "j'ai dû" on the French Wiktionary and filter to keep only Appendix pages (https://fr.wiktionary.org/w/index.php?title=Sp%C3%A9cial:Recherche&profile=advanced&profile=advanced&fulltext=Search&search=ai+d%C3%BB&searchengineselect=mediawiki&ns100=1&searchToken=12uikwjsbh1rky0xeaqyc0mc9), the search engine should propose "Annexe:Conjugaison en français/devoir". However, this result does not appear. It is only proposed when one types "j’ai dû" (with the typographic apostrophe). Could you improve the results returned by the search engine when only part of the expression is typed ("ai dû"), and could you take into account the fact that "'" (U+0027) and "’" (U+2019) should be considered the same character?

Event Timeline

Pamputt created this task. Nov 20 2016, 10:37 PM
Restricted Application added a subscriber: Aklapper. Nov 20 2016, 10:37 PM
Aklapper edited projects, added CirrusSearch; removed Wiktionary. Nov 21 2016, 11:08 AM
Restricted Application added projects: Discovery, Discovery-Search. Nov 21 2016, 11:08 AM

PS: the community would accept a collation modification to solve this kind of problem.

@dcausse and @TJones will take a look at this on Wednesday and discuss what can be done about it.

Deskana triaged this task as Normal priority. Nov 21 2016, 5:47 PM
Deskana moved this task from needs triage to Current work on the Discovery-Search board.

I had a look at how we handle French elisions and apostrophes.
I think we already support the je ai => j'ai elision and the folding of U+2019 into U+0027. Unfortunately, it's not easily accessible.
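To make the two behaviours concrete, here is a minimal Python sketch of apostrophe folding followed by elision stripping. It is only an illustration of the idea, not the analysis chain CirrusSearch actually runs; the prefix list and the names `ELISION_PREFIXES`, `fold_apostrophes` and `analyze` are assumptions made up for this example.

```python
import re
from typing import List

# Hypothetical, simplified list of French elided prefixes (assumption:
# the real analyzer's list may differ).
ELISION_PREFIXES = (
    "l", "d", "j", "m", "t", "s", "n", "c",
    "qu", "jusqu", "lorsqu", "puisqu", "quoiqu",
)

# Longest prefixes first so that "qu'" is tried before shorter alternatives.
_ELISION_RE = re.compile(
    r"^(?:%s)'" % "|".join(sorted(ELISION_PREFIXES, key=len, reverse=True))
)


def fold_apostrophes(text: str) -> str:
    """Fold the typographic apostrophe (U+2019) into the ASCII one (U+0027)."""
    return text.replace("\u2019", "'")


def analyze(text: str) -> List[str]:
    """Lowercase, fold apostrophes, split on whitespace, strip leading elisions."""
    folded = fold_apostrophes(text.lower())
    return [_ELISION_RE.sub("", token) for token in folded.split()]


if __name__ == "__main__":
    for query in ("j'ai dû", "j\u2019ai dû", "ai dû"):
        print(repr(query), "->", analyze(query))
```

With this kind of analysis, all three spellings of the query reduce to the same token list, which is why a field that applies it can match them interchangeably.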

To clarify, using double quotes in the search query forces the search engine to use a field where no stemming and little language analysis is done.

  • "ai dû" is unlikely to work well because French elisions are not handled on this field.

But you can tell the search engine to use the field that has all the language-specific analysis enabled by adding a ~ after the last double quote (not to be confused with fuzzy matching).

  • "ai dû"~ will probably find all instances of j'ai dû (ASCII apostrophe), j’ai dû (U+2019), ai dû (no elision, j'ai fait [...] et ai dû)

@Pamputt could you confirm that the behavior of the query "ai dû"~ is closer to what you expect?
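For anyone who wants to compare the two phrase forms side by side, the following Python sketch builds the corresponding search URLs. It reuses only parameters visible in the URL from the task description (`title`, `profile`, `fulltext`, `search`, `ns100`); the helper name `search_url` is hypothetical, and nothing here calls the wiki.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base endpoint taken from the URL in the task description.
BASE = "https://fr.wiktionary.org/w/index.php"


def search_url(query: str, namespace: int = 100) -> str:
    """Build a Special:Search URL restricted to one namespace (100 = Annexe)."""
    params = {
        "title": "Spécial:Recherche",
        "profile": "advanced",
        "fulltext": "Search",
        "search": query,
        "ns%d" % namespace: "1",
    }
    return BASE + "?" + urlencode(params)


# Quoted phrase: matched against the field with little language analysis.
plain_phrase = search_url('"ai dû"')
# Quoted phrase followed by ~: matched against the fully analyzed field.
analyzed_phrase = search_url('"ai dû"~')

if __name__ == "__main__":
    for url in (plain_phrase, analyzed_phrase):
        # Decode the query back out of the URL to show what the wiki receives.
        print(parse_qs(urlsplit(url).query)["search"][0], "->", url)
```

The only difference between the two URLs is the trailing ~ in the `search` value, so any difference in results comes from which field the phrase is matched against.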

dcausse added a comment. Edited Nov 23 2016, 10:36 AM

Sorry, I just realized that the link you pasted for your search query does not include quotes; the search query is ai dû. This query returns only 7 results, which is completely wrong.
This issue is not trivial to explain, but it's due to some internal components we use and should be fixed when we roll out BM25 and its new query builder to all wikis.
@Pamputt (just to be sure), can you confirm that the results displayed here:

are more like what you expect (Annexe:Conjugaison en français/devoir is the first result)?
The link I pasted includes a hidden parameter (&cirrusFTQBProfile=perfield_builder) that allows me to experiment with the new feature we plan to activate.
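As an illustration, the hidden parameter can be appended to any existing search URL with a few lines of Python. This is only a sketch: the helper name `with_query_builder_profile` is hypothetical, and whether the wiki still honours `cirrusFTQBProfile` after the rollout is an assumption this thread does not confirm.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_builder_profile(url: str, profile: str = "perfield_builder") -> str:
    """Return `url` with cirrusFTQBProfile set to `profile`, replacing any old value."""
    parts = urlsplit(url)
    # Keep every existing parameter except a previous cirrusFTQBProfile.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "cirrusFTQBProfile"
    ]
    params.append(("cirrusFTQBProfile", profile))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    original = (
        "https://fr.wiktionary.org/w/index.php"
        "?title=Sp%C3%A9cial:Recherche&fulltext=Search&search=ai+d%C3%BB&ns100=1"
    )
    print(with_query_builder_profile(original))
```

Parsing and re-encoding the query string, rather than concatenating text, avoids duplicating the parameter if the URL already carries one.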

Unrelated to this particular issue, but I can see that we have some highlighting issues: j'ai is not always highlighted.

Indeed, both links you provided appear to work correctly. At least they work much better than the current behaviour. Except for the highlighting, which is sometimes missing, it is good.

dcausse changed the task status from Open to Stalled. Nov 24 2016, 9:54 AM

Thanks, I'll mark this task as blocked by T139585. This problem should resolve itself when the reindexing is done (hopefully before the end of the year).
Concerning the highlighting issue, I don't have time to explore the problem (highlighting issues are never trivial); feel free to open another task.

We've started the reindexing, but I don't think we've gotten to the French Wiktionary yet. Hopefully this should be fixed within a few weeks.

dcausse closed this task as Resolved. Dec 12 2016, 7:09 PM

fr.wiktionary.org is now reindexed with the new settings. This is still not perfect, but Annexe:Conjugaison en français/devoir is now the 3rd result for the queries mentioned in this ticket. Feel free to reopen if you think it's not properly fixed.