Scribunto needs sane Unicode string support
Closed, Resolved, Public

Description

Scribunto's built-in string module works with bytestrings. For example, string.len('hüllo') returns 6, because "ü" takes two bytes in UTF-8. Likewise, string.reverse('hüllo') returns "oll��h", because reversing the bytes splits the two-byte "ü" into invalid UTF-8.
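To make the difference concrete, here is a minimal plain-Lua sketch (not Scribunto code) that compares the byte count from string.len with a UTF-8 code-point count, and reverses a string by code point instead of by byte. The helper names utf8_len and utf8_reverse are hypothetical and not part of any Lua or Scribunto API. The sketch assumes the input is valid UTF-8.

```lua
-- Hypothetical helpers for illustration only; not the mw.ustring API.
-- Assumes the input is valid UTF-8.

-- Count code points by counting bytes that are NOT UTF-8 continuation
-- bytes (0x80-0xBF); each code point has exactly one such lead byte.
local function utf8_len(s)
  local _, count = string.gsub(s, "[^\128-\191]", "")
  return count
end

-- Reverse by code point: match each lead byte together with its
-- continuation bytes as one unit, prepend each unit, then concatenate.
local function utf8_reverse(s)
  local chars = {}
  for ch in string.gmatch(s, "[^\128-\191][\128-\191]*") do
    table.insert(chars, 1, ch)
  end
  return table.concat(chars)
end

-- "hüllo" written with decimal escapes so the file encoding doesn't matter:
-- "ü" is the two bytes 0xC3 0xBC (195 188).
local s = "h\195\188llo"
print(string.len(s))     --> 6 (bytes)
print(utf8_len(s))       --> 5 (code points)
print(utf8_reverse(s))   --> ollüh (on a UTF-8 terminal)
```

Counting lead bytes works because every UTF-8 code point has exactly one byte outside the 0x80-0xBF continuation range. Code points are still not the same as user-perceived characters, though, so text with combining marks would need more work than this.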

Byte-oriented string handling is fine for a general-purpose programming language, I guess, but for Scribunto in particular (where template programmers are the target audience and there's Unicode everywhere), sane Unicode string handling _must_ come with the extension.

I'm told Victor Vasiliev has already done some work on this as a ustring module, with a C part and a Lua part. I have no idea where the code is, but I'm told it's publicly available somewhere.


Version: unspecified
Severity: enhancement
URL: https://www.mediawiki.org/wiki/Extension:Scribunto/API_specification#ustring_API

bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz39646.
MZMcBride created this task. Via Legacy, Aug 25 2012, 4:36 PM
vvv added a comment. Via Conduit, Aug 25 2012, 5:56 PM

The C code is in SVN, right in the luasandbox module. The Lua code is in Gerrit, but it needs more fixes.

MZMcBride added a comment. Via Conduit, Aug 26 2012, 2:35 AM

Two points:

(1) http://scribunto.wmflabs.org has some version of a ustring module right now. I'm not sure how or why it's there, but its function names are painfully abbreviated.

(2) Fran McCrory makes some very interesting points at https://www.mediawiki.org/w/index.php?diff=575869&oldid=575863 about using u'foo' syntax and whether it might make sense to do away with bytestrings altogether.

tstarling added a comment. Via Conduit, Aug 26 2012, 10:05 AM

The code Victor wrote had a completely different API to the stock Lua string functions, and it wasn't possible to simulate it in pure Lua. So I disabled it before I deployed it. It's better for the functionality to be temporarily missing than to be stuck with a bad interface forever.

MZMcBride added a comment. Via Conduit, Aug 26 2012, 1:04 PM

(In reply to comment #4)

> The code Victor wrote had a completely different API to the stock Lua string functions, and it wasn't possible to simulate it in pure Lua. So I disabled it before I deployed it. It's better for the functionality to be temporarily missing than to be stuck with a bad interface forever.

Thank you for explaining. That's fine, and I completely agree. But it would have saved me a ton of confusion if this had been made clearer (cf. bug 39655).

Anomie added a comment. Via Conduit, Feb 19 2013, 3:40 AM

We have mw.ustring now.

mxn added a subscriber: mxn. Via Web, Nov 24 2014, 8:56 PM
Ricordisamoa added a subscriber: Ricordisamoa. Via Web, Jan 1 2015, 11:01 AM
