On Wikipedia, km² is impossible to target in a search, as is mm³.
With unicode digits, **regex can find one-character or two-character strings only.**
To see this without running bare regex on millions of pages,
[here's a search over 10k pages with 250 unicode hits. Add your own characters until it fails.](https://en.wikipedia.org/w/index.php?title=Special:Search&profile=default&search=insource:/²|³/+prefix:Che&fulltext=Search)
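
To try other patterns without hand-editing the link, a search URL of the same shape can be built with a few lines of Python. This is only a sketch: the helper name `insource_search_url` is made up for illustration, and the parameters are copied from the link above rather than from any API documentation.

```python
from urllib.parse import urlencode

# Base URL and parameter names are taken from the search link above.
BASE = "https://en.wikipedia.org/w/index.php"


def insource_search_url(regex: str, prefix: str) -> str:
    """Build a Special:Search URL for an insource regex limited by a title prefix.

    Hypothetical helper: it only mirrors the structure of the hand-written link.
    """
    params = {
        "title": "Special:Search",
        "profile": "default",
        "search": f"insource:/{regex}/ prefix:{prefix}",
        "fulltext": "Search",
    }
    # urlencode percent-encodes the non-ASCII characters as UTF-8.
    return f"{BASE}?{urlencode(params)}"


if __name__ == "__main__":
    for pattern in ["²|³", "km²", "²|³|km²"]:
        print(insource_search_url(pattern, "Che"))
```

The generated links are percent-encoded, so they look different from the one above but should decode to the same query.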
On Google [[//www.google.com/search?q="mm³"+site:en.wikipedia.org | "mm³" ]] gives 71 results, finding mm³.
T41501 says unicode quotes are not normalized,
and this task says ² and ³ are not normalized.
But //digits are indexed// and quotes are not.
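
For anyone unsure what "normalized" would mean for these characters, here is a minimal sketch using Python's `unicodedata` module. It assumes NFKC compatibility normalization as a stand-in; nothing here claims that this is what the search index does or should do. The helper name `fold` is made up for illustration.

```python
import unicodedata


def fold(text: str) -> str:
    """Apply NFKC compatibility normalization.

    Assumption: NFKC is used here only as an example of folding;
    the actual index-time analysis may differ.
    """
    return unicodedata.normalize("NFKC", text)


def categories(text: str) -> list[str]:
    """Return the Unicode general category of each character."""
    return [unicodedata.category(ch) for ch in text]


if __name__ == "__main__":
    for sample in ["km²", "mm³", "★"]:
        print(sample, "->", fold(sample), categories(sample))
```

Under NFKC, `km²` folds to `km2` and `mm³` to `mm3`, while ★ has no compatibility mapping and stays as it is. In Unicode terms ² is in category `No` (number, other), not in a punctuation category.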
T95849 considers analyzers, filtering, and fields, and shows enwiki
page mapping properties while troubleshooting the unicode ★ character.
But the black star [is found in regex strings,](//en.wikipedia.org/w/index.php?search=insource:/"{{Unicode|★}}+||+U%2B2605"/+prefix:Miscellaneous&title=Special:Search&go=Go)
and other unicode characters are also found in regex strings.
The //highlighter analyzer// works correctly "finding"
unicode digits in all manner of strings. For example,
[see `insource:/²|³|km²/ prefix:Che`](https://en.wikipedia.org/w/index.php?title=Special:Search&profile=default&search=insource:/²|³|km²/+prefix:Che&fulltext=Search). Km² is "found" by the **highlighter**,
but when you remove the //actual// matches (the single-character strings `²|³`), the search finds nothing.
The //typeahead analyzer// also works fine with strings longer than two characters,
for example mm³ or km².
Summary concerning the most basic search for unicode digits:
- `"mm3"` or `"km2"` find no normalized ² or ³ character in the index.
- `"mm³"` or `"m²"` find only mm or m (because these digits are treated as punctuation?)
- `insource:/mm³/` or `insource:/km²/` find nothing because they're longer than two characters.
- These problems do not exist for other unicode characters, just digits.
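
As a sanity check that nothing in regex as such limits unicode digit matches to one or two characters, the same patterns can be run locally. This sketch uses Python's `re` module, which is not the engine behind `insource:`, on a made-up sample string; it only shows what the patterns would be expected to match.

```python
import re

# Made-up sample text containing the strings discussed in this task.
SAMPLE = "Area: 12 km², volume: 3 mm³, rating: ★"

# The same patterns tried with insource: above.
PATTERNS = ["²|³", "km²", "mm³", "²|³|km²"]


def find_all(pattern: str, text: str = SAMPLE) -> list[str]:
    """Return every non-overlapping match of pattern in text."""
    return re.findall(pattern, text)


if __name__ == "__main__":
    for pattern in PATTERNS:
        print(pattern, "->", find_all(pattern))
```

Here the three-character patterns match just like the single-character ones, which is the behavior the `insource:` searches above fail to show.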