
`mw.ustring.gsub` silently fails (returns `nil`) when matching a large number of characters
Closed, Resolved, Public

Description

Why would you even be trying to match a large number of characters? Because mw.text.trim matches the whole string in order to trim the start and end, so using it to trim, say, an entire page of content (in my case, to put a wrapper around it) makes the content simply disappear.
I don't expect my use case to be supported (I'll just trim the start and end separately). What I do want, however, is for an error to be raised instead of the call silently failing by returning nil.
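The workaround mentioned above, trimming the start and the end in two separate passes, can be sketched in plain Lua. `trimSeparately` is a hypothetical name; the character class is copied from the mw.text.trim pattern used in the tests, and the sketch calls `string.gsub` so it runs outside Scribunto. Swapping in `mw.ustring.gsub` inside a module is an assumption that has not been checked against the limit described here.

```lua
-- Hypothetical helper: trims leading and trailing whitespace with two
-- separate substitutions instead of one pattern spanning the whole string.
-- The character class is the one from the mw.text.trim pattern.
local WS = '[\t\r\n\f ]'

local function trimSeparately(s)
  -- Leading whitespace: the ^ anchor limits gsub to a single match at the start.
  s = s:gsub('^' .. WS .. '+', '')
  -- Trailing whitespace: only a whitespace run that reaches the end matches.
  s = s:gsub(WS .. '+$', '')
  return s
end

print(trimSeparately('  \tHello, world\n '))     --> Hello, world
print(#trimSeparately(string.rep('A', 499998)))  --> 499998
```

Each pass only matches a whitespace run at one end, rather than a lazy `(.-)` capture that has to cover the entire string.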

Tests (using the pattern that mw.text.trim uses; a result of 0 means gsub returned nil, which the `or ''` fallback turned into an empty string):

| Input | Result |
| --- | --- |
| `= #( mw.ustring.gsub( string.rep( 'A', 499997 ), '^[\t\r\n\f ]*(.-)[\t\r\n\f ]*$', '%1' ) or '' )` | 499997 |
| `= #( mw.ustring.gsub( string.rep( 'A', 499998 ), '^[\t\r\n\f ]*(.-)[\t\r\n\f ]*$', '%1' ) or '' )` | 0 |
| `= #( string.gsub( string.rep( 'A', 14000000 ), '^[\t\r\n\f ]*(.-)[\t\r\n\f ]*$', '%1' ) or '' )` | 14000000 |
| `= #( string.gsub( string.rep( 'A', 15000000 ), '^[\t\r\n\f ]*(.-)[\t\r\n\f ]*$', '%1' ) or '' )` | Lua error: not enough memory |

Event Timeline

What seems to be going on here is that PCRE is hitting its configured backtrack limit.
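Because the failure shows up only as a nil return value, a calling module can make it loud on its own side. A minimal sketch, assuming the wrapped function takes `(s, pattern, repl, n)` and returns the result plus a match count the way `string.gsub` does; `strictGsub` is a hypothetical helper, not part of Scribunto.

```lua
-- Hypothetical guard: forwards to a gsub-like function and raises a Lua
-- error if it returns nil, so the failure cannot pass silently.
-- Assumption: gsubFn has string.gsub's (s, pattern, repl, n) signature,
-- e.g. string.gsub, or mw.ustring.gsub inside Scribunto.
local function strictGsub(gsubFn, s, pattern, repl, n)
  local result, count = gsubFn(s, pattern, repl, n)
  if result == nil then
    -- Level 2 attributes the error to the caller of strictGsub.
    error(string.format('gsub returned nil for a %d-byte input', #s), 2)
  end
  return result, count
end

-- Normal case: behaves like the wrapped function.
print((strictGsub(string.gsub, '  x  ', '^[\t\r\n\f ]*(.-)[\t\r\n\f ]*$', '%1')))  --> x

-- Failure case, simulated with a stand-in that always returns nil;
-- pcall reports false together with the error message.
local function alwaysNil() return nil end
print(pcall(strictGsub, alwaysNil, 'abc', '(.-)', '%1'))
```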

Change 279377 had a related patch set uploaded (by Anomie):
Add handling for PCRE errors in ustringGsub

https://gerrit.wikimedia.org/r/279377

Change 279377 merged by jenkins-bot:
Add handling for PCRE errors in ustringGsub

https://gerrit.wikimedia.org/r/279377

Anomie claimed this task.