Scribunto needs sane Unicode string support
Closed, Resolved, Public

Description

Scribunto's built-in string module works with bytestrings. So if you have something like "string.len('hüllo')", it will return 6. If you have something like "string.reverse('hüllo')", it will return "oll��h".
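To make the difference concrete, here is a minimal sketch in plain Lua of counting and reversing by UTF-8 character instead of by byte. The names `ulen` and `ureverse` are hypothetical helpers made up for this illustration, not part of any existing module, and the code assumes its input is valid UTF-8. It works on code points only, so combining sequences would still be split.

```lua
-- "hüllo" written with decimal byte escapes so the example does not depend
-- on the encoding of the source file; ü is the two bytes 195 188 in UTF-8.
local s = "h\195\188llo"

-- One UTF-8 encoded character: any byte that is not a continuation byte,
-- followed by zero or more continuation bytes (128-191).
-- Assumption: the input is valid UTF-8.
local UTF8_CHAR = "[^\128-\191][\128-\191]*"

-- Hypothetical helper: count characters rather than bytes.
local function ulen(str)
  local count = 0
  for _ in str:gmatch(UTF8_CHAR) do
    count = count + 1
  end
  return count
end

-- Hypothetical helper: reverse character by character, keeping the bytes
-- of each multi-byte character in their original order.
local function ureverse(str)
  local chars = {}
  for ch in str:gmatch(UTF8_CHAR) do
    table.insert(chars, 1, ch)
  end
  return table.concat(chars)
end

print(#s, ulen(s))                         -- byte length vs. character length
print(ureverse(s) == "oll\195\188h")       -- ü survives the reversal intact
print(string.reverse(s) == "oll\188\195h") -- stock reverse swaps ü's two bytes
```

The stock `string.reverse` output ends up with the two bytes of ü in the wrong order, which is why it renders as replacement characters.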

This is fine for a programming language, I guess, but particularly for a case like Scribunto (where template programmers are being targeted and there's Unicode everywhere), sane Unicode string handling _must_ come with the extension.

Victor Vasiliev has done some work on this already, I'm told, as a ustring module. There's a C part and a Lua part. I've no idea where the code is, but I'm told it's publicly available somewhere.


Version: unspecified
Severity: enhancement
URL: https://www.mediawiki.org/wiki/Extension:Scribunto/API_specification#ustring_API

bzimport added a subscriber: Unknown Object (MLST).
bzimport set Reference to bz39646.
MZMcBride created this task. Via Legacy, Aug 25 2012, 4:36 PM
vvv added a comment. Via Conduit, Aug 25 2012, 5:56 PM

The C code is in SVN, right in the luasandbox module. The Lua code is in Gerrit, but it needs more fixes.

MZMcBride added a comment. Via Conduit, Aug 26 2012, 2:35 AM

Two points:

(1) http://scribunto.wmflabs.org has some version of a ustring module right now. Not sure how or why, though its function names are painfully abbreviated.

(2) Fran McCrory makes some very interesting points at https://www.mediawiki.org/w/index.php?diff=575869&oldid=575863 about using u'foo' syntax and whether it might make sense to do away with bytestrings altogether.

tstarling added a comment. Via Conduit, Aug 26 2012, 10:05 AM

The code Victor wrote had a completely different API to the stock Lua string functions, and it wasn't possible to simulate it in pure Lua. So I disabled it before I deployed it. It's better for the functionality to be temporarily missing than to be stuck with a bad interface forever.

MZMcBride added a comment. Via Conduit, Aug 26 2012, 1:04 PM

(In reply to comment #4)

> The code Victor wrote had a completely different API to the stock Lua string
> functions, and it wasn't possible to simulate it in pure Lua. So I disabled it
> before I deployed it. It's better for the functionality to be temporarily
> missing than to be stuck with a bad interface forever.

Thank you for explaining. That's fine and I completely agree. But it would have saved me a ton of confusion if this had been made clearer (cf. bug 39655).

Anomie added a comment. Via Conduit, Feb 19 2013, 3:40 AM

We have mw.ustring now.

mxn added a subscriber: mxn. Via Web, Nov 24 2014, 8:56 PM
Ricordisamoa added a subscriber: Ricordisamoa. Via Web, Jan 1 2015, 11:01 AM
