Change Details

After talking with @dcausse, we decided that setting up two custom analyzers (stemmed and non-stemmed) for every description language is wasteful, since not all of them are useful for the Wikibase use case. We want to create stemmed analyzers only for the languages where they add value, and use just the plain (non-stemmed) analyzer for the others.

Here is the list of languages for which we have a "non-trivial" configuration for the stemming (`text`) analyzer:

```
ar bg ca ckb cs da de el en en-ca en-gb es eu fa fi fr ga gl hi hu hy id it ja ko lt lv nb nl nn pt pt-br ro ru simple sv th tr
```

"Non-trivial" here includes having named analyzer types (e.g. 'bulgarian') and specialized filters or tokenizers. Note that we are only concerned with whether the `text` analyzer adds value over the `plain` analyzer, since we're keeping the `plain` one anyway, and only in the context of common Wikibase/Wikidata usage on descriptions.
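
The selection rule described above can be sketched roughly as follows. This is only an illustration in Python, not the actual change: the names `STEMMED_LANGUAGES` and `description_analyzers` are hypothetical, and the real configuration code may be structured quite differently. The language codes are copied from the list above.

```python
# Languages with a "non-trivial" stemming (`text`) analyzer configuration,
# copied from the list in this change description.
STEMMED_LANGUAGES = frozenset("""
    ar bg ca ckb cs da de el en en-ca en-gb es eu fa fi fr ga gl hi hu hy id it
    ja ko lt lv nb nl nn pt pt-br ro ru simple sv th tr
""".split())


def description_analyzers(language_code: str) -> list[str]:
    """Return the analyzer kinds to set up for a description field.

    Hypothetical helper: `plain` is always kept, and `text` (stemmed) is
    added only when the language is in the list above.
    """
    analyzers = ["plain"]
    if language_code in STEMMED_LANGUAGES:
        analyzers.append("text")
    return analyzers
```

For example, `description_analyzers("bg")` yields both `plain` and `text`, while a language outside the list, such as `zu`, gets only `plain`.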