About U+03B5 ε GREEK SMALL LETTER EPSILON (ε ε), Lua mw.text.decode(), Lua mw.ustring.gsub().
Bug report at enwiki [https://en.wikipedia.org/w/index.php?title=Module_talk:DecodeEncode#epsilon]
The issue
After the HTML entity ε is resolved by mw.text.decode(), mw.ustring.gsub() does _not find_ the resulting plain character. The alternative HTML entity ε has no such issue.
Report limitations
The original discovery, report and bug reproduction are at enwiki, linked at the top. There, :en:module:DecodeEncode and :en:module:String are used live. No Lua patterns are used (no "%"). Here on Phabricator, pseudocode is used and "results" may be hardcoded. In the text, the & escape code is used.
Steps to replicate the issue:
- 1. Create the research string: Xε1Xε2X (rendered live and unedited, it shows "Xε1Xε2X", as expected)
- 2. Decode the string with mw.text.decode() (inner function)
- 3. On the decoded result, use mw.ustring.gsub() to replace the plain character "ε" with "E" (outer function):
{{#invoke:String|replace|source={{#invoke:DecodeEncode|decode|s=Xε1Xε2X}}|pattern=ε|replace=E|plain=true}}
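The steps above can be mimicked outside MediaWiki to show why a plain search and replace can hit only one of two characters that look identical: it compares encoded characters, not how they render. This is a minimal sketch in plain Lua (no mw library). It assumes, hypothetically, that one of the two entities decoded to U+03F5 GREEK LUNATE EPSILON SYMBOL; the report does not say which code point mw.text.decode() actually returned. `plain_replace` is a hypothetical helper standing in for Module:String's replace with plain=true.

```lua
-- Hypothetical helper: replace every occurrence of `needle` in `s` with `repl`,
-- matching raw bytes only (no Lua pattern characters), similar in spirit to plain=true.
-- Returns the new string and the number of replacements.
local function plain_replace(s, needle, repl)
  assert(#needle > 0, "needle must not be empty")
  local out, pos, count = {}, 1, 0
  while true do
    local i, j = string.find(s, needle, pos, true) -- 4th argument true = plain find
    if not i then break end
    out[#out + 1] = s:sub(pos, i - 1)
    out[#out + 1] = repl
    pos = j + 1
    count = count + 1
  end
  out[#out + 1] = s:sub(pos)
  return table.concat(out), count
end

local EPS    = "\206\181" -- U+03B5 GREEK SMALL LETTER EPSILON, UTF-8 bytes CE B5
local LUNATE = "\207\181" -- U+03F5 GREEK LUNATE EPSILON SYMBOL, UTF-8 bytes CF B5 (assumed look-alike)

-- Assumed decoder output for the research string: the two epsilons differ in bytes.
local decoded = "X" .. LUNATE .. "1X" .. EPS .. "2X"

-- Copying the pattern from "Xε1X" or from "Xε2X" picks up different bytes,
-- so each replace hits exactly one of the two characters.
-- The extra parentheses keep only the first return value (the string).
print((plain_replace(decoded, LUNATE, "E")))
print((plain_replace(decoded, EPS, "E")))
```

Under this assumption the first call gives "XE1Xε2X" and the second "Xϵ1XE2X", each with a replacement count of 1.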
Results
- 4. (search-and-replace pattern uses the "ε" from "Xε1X"): XE1Xε2X
- 5. (search-and-replace pattern uses the "ε" from "Xε2X"): XE1Xε2X
Expected
Only one character "ε" is involved, so I expect every "ε" to be replaced by "E" in the same way: "XE1XE2X" (ok)
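To check whether only one character is really involved, it helps to list the code points of the decoded string. Inside Scribunto, mw.ustring.codepoint( s, 1, -1 ) returns them all. The sketch below is a plain-Lua stand-in so it can run outside MediaWiki; `codepoints` is a hypothetical helper that assumes well-formed UTF-8 input, and the second epsilon in the sample is an assumed look-alike, not a value taken from the report.

```lua
-- Hypothetical helper: list the code points of a well-formed UTF-8 string
-- as "U+XXXX" labels. Malformed input is not validated.
local function codepoints(s)
  local out, i = {}, 1
  while i <= #s do
    local b = s:byte(i)
    local cp, len
    if b < 0x80 then
      cp, len = b, 1          -- ASCII byte
    elseif b < 0xE0 then
      cp, len = b % 0x20, 2   -- 110xxxxx: keep the low 5 bits
    elseif b < 0xF0 then
      cp, len = b % 0x10, 3   -- 1110xxxx: keep the low 4 bits
    else
      cp, len = b % 0x08, 4   -- 11110xxx: keep the low 3 bits
    end
    for k = i + 1, i + len - 1 do
      cp = cp * 0x40 + s:byte(k) % 0x40 -- append 6 bits per continuation byte
    end
    out[#out + 1] = string.format("U+%04X", cp)
    i = i + len
  end
  return out
end

-- Two epsilons that render alike but are different characters
-- (U+03B5 and an assumed look-alike U+03F5).
print(table.concat(codepoints("X\206\181" .. "1X\207\181" .. "2X"), " "))
```

For the sample this prints "U+0058 U+03B5 U+0031 U+0058 U+03F5 U+0032 U+0058"; two different labels at the epsilon positions would explain why one of them is not found.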
Workaround A (ad hoc)
In template code: add an innermost function that _first_ replaces "ε" with "ε" in the research string:
{{#invoke:String|replace|source={{#invoke:DecodeEncode|decode|s={{#invoke:String|replace|source=Xε1Xε2X|pattern=ε|replace=ε|plain=true}}}}|pattern=ε|replace=E|plain=true}}
Result: "XE1XE2X" (ok)
Workaround B in module (THIN SPACE example)
Plan: early in the :en:module:DecodeEncode function, replace the bad "ε" with the good "ε".
The current and proposed module/sandbox code is at [https://en.wikipedia.org/wiki/Module_talk:DecodeEncode#Workaround_B]
About THIN SPACE: the character U+2009 THIN SPACE (   ) appears to have a similar issue.
The current live module code already addresses this:
s = mw.ustring.gsub( s, ' ', ' ' )
In the module/sandbox, I have added similar Lua code for epsilon:
s = mw.ustring.gsub( s, 'ε', 'ε' )
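The two gsub lines above could be collected in one table, so each newly found bad character costs one table entry instead of another line. This is a minimal sketch in plain Lua, assuming the bad characters are fixed byte sequences; the `NORMALIZE` table and `normalize` function are hypothetical, and the U+03F5 to U+03B5 entry is an assumed example, not taken from the module. In the module itself, the loop body could simply be the existing mw.ustring.gsub( s, bad, good ) call.

```lua
-- Hypothetical table: bad decoded character -> wanted character.
-- The entry below is an assumed example; the THIN SPACE case would be added the same way.
local NORMALIZE = {
  ["\207\181"] = "\206\181", -- assumed: U+03F5 LUNATE EPSILON SYMBOL -> U+03B5 GREEK SMALL LETTER EPSILON
}

-- Replace every bad sequence with its wanted replacement, byte for byte.
-- Note: with several entries, pairs() order is unspecified, so entries
-- should not produce each other's keys.
local function normalize(s)
  for bad, good in pairs(NORMALIZE) do
    local out, pos = {}, 1
    while true do
      local i, j = string.find(s, bad, pos, true) -- plain find, no pattern magic
      if not i then break end
      out[#out + 1] = s:sub(pos, i - 1)
      out[#out + 1] = good
      pos = j + 1
    end
    out[#out + 1] = s:sub(pos)
    s = table.concat(out)
  end
  return s
end

local EPS = "\206\181"
local decoded = "X\207\181" .. "1X" .. EPS .. "2X" -- assumed decoder output
-- EPS contains no Lua pattern characters, so string.gsub matches it literally.
print((string.gsub(normalize(decoded), EPS, "E")))
```

After normalization both epsilons share the same bytes, so the replace yields "XE1XE2X" whichever copy of "ε" the pattern came from, matching the sandbox results below.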
- /sandbox tests:
{{#invoke:String|replace|source={{#invoke:DecodeEncode/sandbox|decode|s=Xε1Xε2X}}|pattern=ε|replace=E|plain=true}}
Result B-1 (search-and-replace pattern uses the "ε" from "Xε1X"): "XE1XE2X" (ok)
Result B-2 (search-and-replace pattern uses the "ε" from "Xε2X"): "XE1XE2X" (ok)
This appears to solve the issue.
Workaround C in mw, Lua
Changes in mw/Lua itself: out of my league.