
`mw.ustring.gsub` silently fails (returns `nil`) when matching a large number of characters
Closed, Resolved · Public

Description

Why would you even be trying to match a large number of characters? `mw.text.trim` matches the whole string in order to trim the start and end, so using it to trim, say, an entire page of content (in my case, to put a wrapper around it) makes the content just disappear.
I'm not expecting my use case to be supported (I'll just trim the start and end separately). What I do want, however, is for an error to be raised instead of the call silently failing by returning `nil`.
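
The "trim the start and end separately" workaround could look roughly like the sketch below. `trimEnds` is a hypothetical helper name, not part of Scribunto, and it assumes the same whitespace set as the pattern used in the tests. It relies only on the byte-oriented `string` library, which should be safe for UTF-8 input here because every character in the set is ASCII.

```lua
-- Hypothetical helper (not a Scribunto function): trims both ends
-- without running one pattern across the whole string.
local function trimEnds( s )
    -- Position of the first character that is not in the whitespace set.
    local first = s:find( '[^\t\r\n\f ]' )
    if not first then
        -- Empty, or made up entirely of whitespace.
        return ''
    end
    -- Walk back from the end while the character at `last` is whitespace.
    -- With an init argument, '^' anchors the match at that position.
    local last = #s
    while last >= first and s:find( '^[\t\r\n\f ]', last ) do
        last = last - 1
    end
    return s:sub( first, last )
end
```

Each step only inspects characters at the ends of the string, so the length of the content in the middle should not matter to the matcher.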

Tests: (using the pattern that mw.text.trim uses)

| Input | Result |
| --- | --- |
| `= #( mw.ustring.gsub( string.rep( 'A', 499997 ), '^[\t\r\n\f ]*(.-)[\t\r\n\f ]*$', '%1' ) or '' )` | 499997 |
| `= #( mw.ustring.gsub( string.rep( 'A', 499998 ), '^[\t\r\n\f ]*(.-)[\t\r\n\f ]*$', '%1' ) or '' )` | 0 |
| `= #( string.gsub( string.rep( 'A', 14000000 ), '^[\t\r\n\f ]*(.-)[\t\r\n\f ]*$', '%1' ) or '' )` | 14000000 |
| `= #( string.gsub( string.rep( 'A', 15000000 ), '^[\t\r\n\f ]*(.-)[\t\r\n\f ]*$', '%1' ) or '' )` | Lua error: not enough memory |

Details

Related Gerrit Patches:
mediawiki/extensions/Scribunto : master · Add handling for PCRE errors in ustringGsub

Event Timeline

Majr created this task. · Mar 24 2016, 11:27 AM
Restricted Application added a subscriber: Aklapper. · Mar 24 2016, 11:27 AM
Anomie added a subscriber: Anomie. · Mar 24 2016, 2:37 PM

What seems to be going on here is that pcre is hitting its configured backtrack limit.
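
For callers who want a hard failure rather than a silent `nil`, a caller-side guard is one option. The sketch below is only an illustration: `checkedGsub` is a hypothetical name, and it assumes the behaviour described in this task, where the gsub function returns `nil` when the underlying engine gives up. The gsub function is passed in as a parameter so the guard can wrap `mw.ustring.gsub` or any function with the same calling convention.

```lua
-- Hypothetical guard (not part of Scribunto): turns a nil result from a
-- gsub-style function into a Lua error instead of letting it pass silently.
local function checkedGsub( gsubFn, s, pattern, repl )
    local result, count = gsubFn( s, pattern, repl )
    if result == nil then
        -- Level 2 attributes the error to the caller of checkedGsub.
        error( 'gsub returned nil; the regex engine may have hit a limit', 2 )
    end
    return result, count
end
```

In a module this might be called as `checkedGsub( mw.ustring.gsub, text, pattern, '%1' )`, optionally inside `pcall` if the caller wants to fall back to another approach.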

Change 279377 had a related patch set uploaded (by Anomie):
Add handling for PCRE errors in ustringGsub

https://gerrit.wikimedia.org/r/279377

Change 279377 merged by jenkins-bot:
Add handling for PCRE errors in ustringGsub

https://gerrit.wikimedia.org/r/279377

Anomie closed this task as Resolved. · Oct 5 2016, 6:52 PM
Anomie claimed this task.