Fri, Aug 16
LPeg would be great because it could make many string-related tasks easier. It has a steep learning curve, though, and I am not sure whether it would use more or less memory and processing time than the less sophisticated methods that we use now. It also might not be possible to cache LPeg patterns, because they can contain references to arbitrary Lua values, including functions, which are prohibited in modules loaded with mw.loadData because they can cause T67258: Information can be passed between #invoke's (tracking). In that case, patterns would have to be newly generated for each module invocation.
Thu, Aug 15
However, we are in the process of upgrading to PHP7, which uses a newer version of Unicode that may include case mappings for the characters you're concerned about.
Sat, Jul 27
I've reported two bug locations here: (1) Scribunto (the functions mw.ustring.upper and mw.ustring.lower) and (2) whatever generates the headers in category pages. The Scribunto bug, at least, involves PHP, because mw.ustring.upper and mw.ustring.lower seem to be implemented using the PHP functions mb_strtoupper and mb_strtolower; categories probably involve PHP as well. If the title and tags need edits, I would appreciate some help, as I don't have much energy right now and it is possible I am not doing Phabricator right.
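For anyone wanting to see which characters are affected, here is a rough sketch of the kind of check involved. It is written in Python only as a stand-in, so it reflects the Unicode tables bundled with the Python runtime, not those used by PHP's mb_strtoupper and mb_strtolower; the function name lacks_case_mapping is made up for illustration.

```python
import unicodedata

def lacks_case_mapping(ch):
    """Return True if this runtime's Unicode data maps ch to itself
    in both directions, i.e. it has no upper- or lowercase counterpart here."""
    return ch.upper() == ch and ch.lower() == ch

if __name__ == "__main__":
    # The result depends on the Unicode version the runtime was built with.
    print("Unicode data version:", unicodedata.unidata_version)
    for ch in ["a", "1"]:
        print(repr(ch), "lacks case mapping:", lacks_case_mapping(ch))
```

Running the same check under two runtimes with different Unicode versions would show whether a newer version picks up the missing mappings.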
Jul 25 2019
Jun 28 2019
May 9 2019
I am seeing what might be the same error on the English Wiktionary. The diff for what is currently the latest revision of MediaWiki:Common.css shows the error [XNPAewpAAEMAADungXAAAAAS] 2019-05-09 05:54:03: Fatal exception of type "Wikimedia\Assert\ParameterTypeException". I'm the author of both revisions and my username is not invalid, so this seems not to be an example of T200055.
Apr 15 2019
Yes, plugging pure-Lua ustring into my example does work:
Apr 12 2019
Jan 5 2019
On English Wiktionary, this extension could be used to transclude a template at the top of pages in the Reconstruction namespace. At the moment, we have to manually add the template to every page. See for instance Reconstruction:Proto-Indo-European/dn̥ǵʰwéh₂s.
Oct 22 2018
Oct 3 2018
This feature would be useful, but using language codes would be complex.
Jul 1 2018
May 3 2018
An update regarding topic category structure: it is no longer true, as the original post states, that Category:Fruits contains English words related to fruit. The category structure has changed so that Category:Topicname is an umbrella category that contains only categories: language-specific categories prefixed by a language code (Category:languagecode:Topicname) and other umbrella categories (Category:Subtopicname). Here, replace "Topicname", "topicname", and "Subtopicname" with things like "Fruits", "fruits" and "Apple cultivars". To use the example in the original post, Category:Fruits now contains only categories such as Category:en:Fruits, Category:fr:Fruits, Category:Apple cultivars, and Category:Banana cultivars. English words related to fruit are now found in Category:en:Fruits.
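The naming scheme can be sketched as a small helper. This is only an illustration of the convention described above, not code used on Wiktionary; the function name topic_category and its parameters are hypothetical.

```python
def topic_category(topic, lang_code=None):
    """Build a topic category title.

    Without a language code, this is the umbrella category
    (Category:Topicname). With one, it is the language-specific
    category (Category:languagecode:Topicname).
    """
    if lang_code is None:
        return f"Category:{topic}"
    return f"Category:{lang_code}:{topic}"

if __name__ == "__main__":
    print(topic_category("Fruits"))        # umbrella category
    print(topic_category("Fruits", "en"))  # English words related to fruit
```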