On Wikipedia, km² is impossible to target in a search, yet Google reports "km²" on well over 120,000 pages.
But Unicode digits
- have not been normalized. A basic search for "mm3" or "km2" finds no normalized ² or ³ character in the index.
- are treated like punctuation. A basic search for "mm³" finds mm.
- fail in regex strings longer than two characters. /mm³/ and /km²/ find no matches.
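As a point of comparison for the first two items, here is a minimal Python sketch of how the Unicode database classifies ² and ³ and what compatibility normalization (NFKC) does to them. This only illustrates the kind of folding the report is asking about; it is not how the search index is built, and the helper name fold_superscripts is hypothetical.

```python
import unicodedata

def fold_superscripts(text: str) -> str:
    """Hypothetical helper: apply NFKC so that ² and ³ become plain 2 and 3."""
    return unicodedata.normalize("NFKC", text)

for ch in "²³":
    # General category "No" means "Number, other", i.e. not punctuation.
    print(ch, unicodedata.category(ch), fold_superscripts(ch))

print(fold_superscripts("km²"))  # km2
print(fold_superscripts("mm³"))  # mm3
```

With folding of this kind applied at both index time and query time, "km2" and "km²" would land on the same token. Whether NFKC is the right filter for the search analyzers is a separate question that this sketch does not answer.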
Major templates such as Convert and Val support Unicode digits in either form, km² or km2. In mainspace, 5% of pages that use <sup>2 also use ².
Confusingly, km² is recognized by the highlighter, but when you remove the alternatives that actually match (the single Unicode characters ²|³), the search returns nothing.
For example, see insource:/²|³|km²/ prefix:Chem. Also, the typeahead analyzer works fine for mm³ or km².
To see how a two-character regex is OK but a three-character one fails, without running a bare regex on millions of pages, here's a small domain with some /²|³/ hits.
T41501 says unicode quotes are not normalized, and this one says ² and ³ are not normalized. But digits are indexed and quotes are not.
T95849 considers analyzers, filtering, and fields, and shows enwiki page mapping properties while troubleshooting the unicode ★ character.
But the black star, although not found in indexed searches, is not impossible to find using regex,
and other unicode characters are also found in regex strings.
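For reference, this is the behavior one would expect from a regex engine that works on Unicode code points. The sketch uses Python's re module, not the regex engine behind insource:, and the sample text is made up for illustration.

```python
import re

# Made-up sample text containing the characters discussed in this report.
text = "Area 12 km², volume 3 mm³, rating ★"

patterns = ["²|³", "km²", "mm³", "★"]

# Collect every match of each pattern against the sample text.
hits = {p: re.findall(p, text) for p in patterns}

for pattern, found in hits.items():
    print(f"/{pattern}/ -> {found}")
```

Here the three-character patterns /km²/ and /mm³/ match just as the single-character alternatives do, which is the result the insource: searches above fail to produce.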