English-language wikis use `aggressive_splitting`, a language-analysis filter (a configuration of Elasticsearch's [[ https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html#analysis-word-delimiter-tokenfilter | Word Delimiter Token Filter ]]) that splits words on case changes (the original issue in this ticket), among other transitions. Investigate applying it everywhere, or at least to many more languages.
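The case-change splitting at the heart of this ticket can be sketched with a short regex. This is an illustrative approximation, not the actual filter implementation — the real behavior lives in Elasticsearch's word_delimiter filter, and the function name here is made up:

```python
import re

def split_on_case_change(token: str) -> list[str]:
    """Rough sketch of the word_delimiter filter's case-change
    splitting: insert a break wherever a lowercase letter is
    immediately followed by an uppercase one, then split there."""
    return re.sub(r'(?<=[a-z])(?=[A-Z])', ' ', token).split()

# With this splitting applied at index time, the run-together query
# from the report matches the individually indexed words:
split_on_case_change("FilesystemHierarchyStandard")
# → ['Filesystem', 'Hierarchy', 'Standard']
```

A wiki whose analysis chain applies such a filter would index "FilesystemHierarchyStandard" as three tokens, which is presumably why the en.wp (cross-wiki) index finds the article while the fr.wp index, lacking the filter, does not.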
Original task title & description:
**Cross-wiki search tokenizer is better than local search one**
[[https://fr.wikipedia.org/wiki/Sp%C3%A9cial:Recherche?search=FilesystemHierarchyStandard&sourceid=Mozilla-search | Searching for “FilesystemHierarchyStandard” on fr.wp]] gives me no local results but several results from en.wp, including [en:Filesystem Hierarchy Standard], even though the equivalent [fr:Filesystem Hierarchy Standard] exists.
I’ve already encountered this strange issue: global search is sometimes better than local search, especially for phrase tokenization (when I omit spaces).
Maybe it’s because I use English phrasing on a French wiki?