
Implement a {{#trim}}, {{#ltrim}}, {{#rtrim}} in StringFunctions
Open, Low, Public

Description

It would sometimes be useful to have a {{#trim}} (core) function, since unnamed parameters are not trimmed by default.
I'd propose {{#trim}}, {{#ltrim}} and {{#rtrim}} to be included in the core functions.
They should have an optional second parameter, like their PHP equivalents.

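The classic hack for trimming is {{#if:trim|{{{1|}}}}}, currently the most performant wikitext for the job: the condition "trim" is never empty, so the function always returns its first branch, and parser function branches are trimmed automatically.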

The new function shall be tailored for string trimming, with three parameters:

  1. Mandatory, non-empty source code to be trimmed; it may evaluate to an empty string.
  2. Optional leading trim; defaults to trimming. "Leading" means left for LTR scripts but right for RTL. 0 suppresses trimming and keeps the whitespace; anything else trims.
  3. Optional trailing trim; defaults to trimming. "Trailing" means right for LTR scripts but left for RTL. 0 suppresses trimming and keeps the whitespace; anything else trims.

Suppressing both with {{#trim: {{{1|}}} |0|0}} would keep the two surrounding spaces, but is otherwise a no-op.
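The three parameters above can be sketched outside the parser roughly as follows. This is only an illustration in Python, not MediaWiki code: the function name `trim`, the string-typed flags, and the `CLASSIC_WS` constant are assumptions made for the example. Python strings are stored in logical (character sequence) order, so "leading" here is the start of the sequence regardless of LTR or RTL display.

```python
# Hypothetical sketch of the proposed semantics; not the MediaWiki implementation.
# Assumption: only tab, line feed and space are trimmed in this example.
CLASSIC_WS = "\t\n "


def trim(text, leading="", trailing=""):
    """Trim `text` at both ends unless a flag is exactly "0".

    The flags are strings because wikitext arguments arrive as text.
    "0" keeps the whitespace on that side; anything else (including
    an omitted flag) trims it.
    """
    if leading.strip() != "0":
        text = text.lstrip(CLASSIC_WS)   # start of the character sequence
    if trailing.strip() != "0":
        text = text.rstrip(CLASSIC_WS)   # end of the character sequence
    return text
```

With these assumptions, `trim(" x ", "0", "0")` returns its input unchanged, which matches the "null op" case described above.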

The classic parser function parameter trimming considers three characters:

  • U+0009 CHARACTER TABULATION
  • U+000A LINE FEED
  • U+0020 SPACE

The new parser function shall strip off many more:

  • U+00A0 NO-BREAK SPACE
  • U+00AD SOFT HYPHEN
  • U+2002 EN SPACE
  • U+2003 EM SPACE
  • U+2004 THREE-PER-EM SPACE
  • U+2005 FOUR-PER-EM SPACE
  • U+2006 SIX-PER-EM SPACE
  • U+2007 FIGURE SPACE
  • U+2008 PUNCTUATION SPACE
  • U+2009 THIN SPACE
  • U+200A HAIR SPACE
  • U+200B ZERO WIDTH SPACE
  • U+200E LEFT-TO-RIGHT MARK
  • U+200F RIGHT-TO-LEFT MARK
  • some more bidi control
  • U+2028 LINE SEPARATOR
  • U+2029 PARAGRAPH SEPARATOR
  • U+3000 IDEOGRAPHIC SPACE
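As a rough illustration of what stripping this larger set could look like, here is a small Python sketch. The names `EXTENDED_WS` and `trim_extended` are made up for the example, and the "some more bidi control" entry is deliberately left out because the list does not say which characters it means.

```python
# Hypothetical sketch; the code points are copied from the lists in this task.
CLASSIC_WS = "\t\n "

EXTENDED_WS = CLASSIC_WS + "".join(
    chr(cp)
    for cp in (
        0x00A0,                  # NO-BREAK SPACE
        0x00AD,                  # SOFT HYPHEN
        *range(0x2002, 0x200C),  # EN SPACE .. ZERO WIDTH SPACE (U+2002..U+200B)
        0x200E,                  # LEFT-TO-RIGHT MARK
        0x200F,                  # RIGHT-TO-LEFT MARK
        0x2028,                  # LINE SEPARATOR
        0x2029,                  # PARAGRAPH SEPARATOR
        0x3000,                  # IDEOGRAPHIC SPACE
    )
)


def trim_extended(text):
    """Strip the listed characters from both ends only; the interior is untouched."""
    return text.strip(EXTENDED_WS)
```

Note that only the ends are affected: a NO-BREAK SPACE between two words stays where it is.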

Some zero-width characters carry meaning in some languages.

If any of these characters is wanted at the start or end, it may be encoded as an HTML entity.

The benefit is obvious: easier and more robust template programming and transclusions.

Details

Reference
bz18157

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:30 PM
bzimport added a project: ParserFunctions.
bzimport set Reference to bz18157.
bzimport added a subscriber: Unknown Object (MLST).

It would be nice to add this parser function to the [http://www.mediawiki.org/wiki/Extension:ParserFunctions ParserFunctions-Extension]. I’ll try to do this.

Moving this to Extensions. This won't go in core, but StringFunctions--or ParserFunctions, once it's all finally merged and the former is obsolete--should probably implement this.

Dinoguy1000 renamed this task from Implement a {{#trim}}, {{#ltrim}}, {{#trim}} in StringFunctions to Implement a {{#trim}}, {{#ltrim}}, {{#rtrim}} in StringFunctions. Apr 7 2020, 5:44 AM
Dinoguy1000 updated the task description.

2025 here. 👋 I have just created T394604, and afterwards I noticed this ticket in a workboard.

ltrim and rtrim may be interesting proposals as well; however, these names are centered around left-to-right languages. A more inclusive approach would be to name them trimstart and trimend.

About ltrim and rtrim proposals: parser function branches are automatically trimmed, thus it may be difficult—and inconsistent—to implement these functions, as it would involve introducing some way to not trim spaces. Therefore, I'm not sure this would be a good idea.

On “parser function branches are automatically trimmed”:

  • If no exception can be made on parser function swallowing arguments, the partial versions are pointless.
  • However, {{#trim:{{{1|}}}}} is an improvement over the {{#if:trim|{{{1|}}}}} hack and will catch various Unicode characters.

On names ltrim and rtrim and LTR/RTL:

  • The current proposal is using one function only, with two booleans; iff 0 they should keep whitespace.
  • The first one applies to the beginning of the string in character sequence order, the second one to the end.
  • Both are unnamed, so their meaning has to be explained in the documentation, which can clarify the LTR/RTL context where needed.
  • The proposal is using the words “leading” and “trailing”, detailing the difference to “left” and “right”.
  • Keep whitespace at text begin / keep whitespace at text end.
  • 0 is understood in common RTL scripts, but trim needs to be read and written anyway.
  • I find the syntax with start/end parameters ({{trim:...|0|1}}) to be both awkward and uncommon.
    • Nevertheless, as I mentioned earlier, partial trims cannot be implemented since branches are automatically trimmed.
  • The term "trim" encompasses two distinct use cases:
    • A) Trimming for template coding purposes: Here, we should only remove spaces, newlines, and occasional tab characters.
      • Indeed, I am fairly certain that some values contain leading or trailing non-breaking spaces that we do not want to trim.
      • Additionally, since this function will be used frequently, it must be highly efficient—Unicode handling would undermine that.
    • B) Trimming for editorial purposes (?): In this case, we could consider trimming all possible spaces.
  • To start, I believe we should implement only the first mode (A).
    • If necessary later on, we could introduce a "mode" parameter: {{trim:...|BASIC}} (default) and {{trim:...|EXHAUSTIVE}}.
    • However, I am not convinced that mode B is widely needed. For rare cases, it could potentially be handled in a custom userland Lua module.
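The "mode" idea above could look roughly like this. It is a Python sketch only: the `MODES` table and the `trim` signature are invented for illustration, and the EXHAUSTIVE set is assumed to be the code points listed in the task description.

```python
# Hypothetical sketch of a mode switch; not an existing parser function.
BASIC = " \t\n"  # mode A: spaces, newlines and tabs only

# Assumption: mode B uses the code points listed in the task description.
EXHAUSTIVE = BASIC + "".join(
    chr(cp)
    for cp in (0x00A0, 0x00AD, *range(0x2002, 0x200C),
               0x200E, 0x200F, 0x2028, 0x2029, 0x3000)
)

MODES = {"BASIC": BASIC, "EXHAUSTIVE": EXHAUSTIVE}


def trim(text, mode="BASIC"):
    """Trim both ends of `text` using the character set selected by `mode`."""
    key = mode.strip().upper() or "BASIC"  # an empty mode falls back to BASIC
    if key not in MODES:
        raise ValueError(f"unknown trim mode: {mode!r}")
    return text.strip(MODES[key])
```

In this sketch a value wrapped in non-breaking spaces survives the default mode and is only stripped when EXHAUSTIVE is requested, which is the concern raised in point A.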

I have two goals:

  1. All spooky characters that are invisible to authors shall be removed (at least from the beginning and end), in all cases. I do not want to bother twice when I am using a trim function; I want to shield all applications from all of them. Since authors supply these invisible characters unintentionally on transclusion, I cannot distinguish between templates that expect EXHAUSTIVE and templates that can rely on BASIC. If I compare strings, I always need to get rid of all invisible characters.
  2. I want to limit the number of functions. BOTH is the common case; preserving trailing or leading whitespace is a very rare exception, relevant for unnamed parameters only. Named parameters are always ASCII-trimmed on both sides. Therefore one function, {{#trim:}}, is sufficient, and rare special cases can use rare special flags.