English has fairly simple inflectional morphology, which makes it relatively easy to build search engines that find different forms of a word with no extra effort from the end user.
Many other languages have complex morphology, with declensions, conjugations, clitics, agglutination, etc. Some search engines can plug in morphology and stemming support for particular languages, but support for each language must be developed and maintained independently.
Once Wikidata's Lexical Data can generate all inflected forms of a word, the output can be reused in both directions to build stemming engines: to find the base form (or forms) of a word from an inflected form, and to find the inflected forms from a base form. Wikibase should provide APIs that make such usage as easy as possible.
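To make the two directions concrete, here is a minimal Python sketch of the lookup tables a search engine could build from such output. Everything in it is an assumption for illustration: the sample data is hand-written rather than fetched from Wikidata, and the names `SAMPLE_LEXEMES`, `build_indexes`, `base_forms`, and `expand_query` are hypothetical, not part of any existing Wikibase API.

```python
from collections import defaultdict

# Hand-written sample data standing in for Lexical Data output (hypothetical;
# a real system would obtain lemma -> forms mappings from Wikidata lexemes).
SAMPLE_LEXEMES = {
    "leaf": ["leaf", "leaves"],
    "leave": ["leave", "leaves", "left", "leaving"],
    "Haus": ["Haus", "Hauses", "Hause", "Häuser", "Häusern"],
}


def build_indexes(lexemes):
    """Build both lookup directions from a lemma -> forms mapping."""
    forms_by_lemma = {}
    lemmas_by_form = defaultdict(set)
    for lemma, forms in lexemes.items():
        # Always include the lemma itself among its forms.
        forms_by_lemma[lemma] = set(forms) | {lemma}
        for form in forms_by_lemma[lemma]:
            lemmas_by_form[form].add(lemma)
    return forms_by_lemma, dict(lemmas_by_form)


def base_forms(form, lemmas_by_form):
    """Direction 1: inflected form -> base form(s); may return several."""
    return sorted(lemmas_by_form.get(form, ()))


def expand_query(term, forms_by_lemma, lemmas_by_form):
    """Direction 2: expand a search term to every form of each matching lemma."""
    expanded = {term}
    for lemma in lemmas_by_form.get(term, ()):
        expanded |= forms_by_lemma[lemma]
    return sorted(expanded)


forms_by_lemma, lemmas_by_form = build_indexes(SAMPLE_LEXEMES)
print(base_forms("leaves", lemmas_by_form))
print(expand_query("Häusern", forms_by_lemma, lemmas_by_form))
```

Note that "leaves" maps to two base forms ("leaf" and "leave"), which is why the first direction has to return a list rather than a single stem. The sketch is case-sensitive and ignores language tags and grammatical features, all of which a real implementation would need to handle.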
Notes:
- Like other subtasks of T186421, this is not a particular bug, but an idea for how Lexical Data can be useful in the long term. I am filing it in the hope that knowing the possible user scenarios will be useful to Wikibase developers when they are making decisions about developing the infrastructure, and to Wikidata community members when they are proposing properties, developing bots, and so on.
- This is comparable to T186429 and T186420, but for search engines.
- I'm subscribing @TJones and @Smalyshev, who know far more about stemming engines than I do.