
Trailing whitespace isn't stripped from a Regex-parsed string taken as a template argument for a Lua module
Closed, Invalid | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  1. Inside a template, invoke a module that takes one argument, which is a string with an absurd amount of leading and trailing whitespace. Use an anonymous parameter; do not define this directly in #invoke.
  2. Surround this template with <includeonly> tags. Inside <noinclude> tags, transclude the template onto its own page and include the absurd string argument.
  3. In a module, capture only the content of said string argument, excluding whitespace, using Regex. I used the capture pattern ^%s*(.+)%s*$ with string.match().
  4. Save both template and module, then refresh the template page to see result.

What happens?
Leading whitespace of a string is stripped completely, but trailing whitespace is left untouched.

If this is my input:
Moo

I get this when capturing with Regex:
Moo

Here is what I attempted inside a module:

local p = {}

function p.main(frame)
    local args = frame:getParent().args
    local finalString = 'START-->'

    for _, line in ipairs(args) do
        finalString = finalString..'|'..line:match('^%s*(.+)%s*$')
    end

    finalString = finalString..'<--END'

    return finalString
end

return p

What should have happened instead?
Leading and trailing whitespace should be removed from the string using Regex with string.match(), similar to how mw.text.trim() works.

This should be what I get from the input above:
Moo

This works in a code editor, or using Lua's built-in environment after it's compiled, but it doesn't work on a wiki for some reason.

Software version (skip for WMF-hosted wikis like Wikipedia)
Software

  • MediaWiki: 1.39.0
  • PHP: 8.0.26 (fpm-fcgi)

Installed extensions

  • Scribunto
  • TemplateStyles: 1.0 (f09fb72)

Installed libraries

  • wikimedia/parsoid: 0.16.0-a21

Other information (browser name/version, screenshots, etc.):
I ran this test on:
Chromium ( Version 110.0.5481.100 (Official Build) (64-bit) )

Event Timeline

If I understand correctly, you are trying to remove leading and trailing whitespace by using a Lua pattern (not a RegExp, mind you) with the string.match/mw.ustring.match functions:

line = '    foo bar    '
print(line:match('^%s*(.+)%s*$')) -- 'foo bar    '

The + quantifier is greedy; it captures everything it can. Since . matches all characters, including whitespace, .+ runs to the end of the string and nothing is left for %s*$ to match. The only non-greedy quantifier in Lua is -, which matches zero or more characters but as few as possible, and in this case it works well:

greedy = '^%s*(.+)%s*$'
lazy   = '^%s*(.-)%s*$'

line = '    foo bar    '
print(line:match(lazy)) -- 'foo bar'

all_whitespaces = '         '
print(all_whitespaces:match(greedy)) -- ' '. Oops!
print(all_whitespaces:match(lazy)) -- ''. That's better.

The RegExp equivalent (^\s*(.+)\s*$) works the same (all flavors).
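
The lazy pattern above could be wrapped in a small helper for reuse. This is a minimal sketch in plain Lua, not code from the thread; the function name trim is hypothetical.

```lua
-- Hypothetical helper (the name is not from the original thread).
-- Strips leading and trailing whitespace using the lazy pattern.
local function trim(s)
    -- (.-) expands only as far as it must, so the trailing %s* can
    -- take all of the trailing whitespace before the $ anchor.
    -- The outer parentheses keep only the first return value.
    return (s:match('^%s*(.-)%s*$'))
end

print('[' .. trim('    foo bar    ') .. ']') -- [foo bar]
print('[' .. trim('         ') .. ']')       -- []
```

Because (.-) can match the empty string, this pattern matches any input string, so the helper returns a string rather than nil.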

The lazy quantifier did the trick. I searched Google and Stack Overflow trying to figure out what the minus sign - was used for in a capture pattern, and didn't realize it was the counterpart of the lazy *? in RegExp. I was aware that Lua patterns and RegExp worked differently, but I didn't know Lua's patterns were not technically RegExp.

Thank you for providing an answer and a good visual demonstration of whitespace trimming using the anchors. Sorry for the false bug report.
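
For reference, the module from the report with only the quantifier changed might look like the sketch below. It is an illustration, not the reporter's final code. The fakeFrame table is a hypothetical stand-in so the snippet can run outside a wiki; Scribunto supplies the real frame object.

```lua
local p = {}

function p.main(frame)
    local args = frame:getParent().args
    local finalString = 'START-->'

    for _, line in ipairs(args) do
        -- Lazy (.-) stops as early as possible, so the trailing %s*
        -- takes all trailing whitespace. This pattern matches any
        -- string, so match() does not return nil here.
        finalString = finalString .. '|' .. line:match('^%s*(.-)%s*$')
    end

    return finalString .. '<--END'
end

-- Hypothetical stand-in for the Scribunto frame, only for local runs.
local fakeFrame = {
    getParent = function(self)
        return { args = { '   Moo   ', '\t foo bar \n' } }
    end,
}

print(p.main(fakeFrame)) -- START-->|Moo|foo bar<--END
```

On a wiki, the stand-in would be dropped and the module would end with return p, as in the original report. mw.text.trim(), mentioned in the report, is another way to get the same trimming.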

Aklapper changed the task status from Resolved to Invalid. Feb 16 2023, 1:29 PM
Aklapper subscribed.