Introduce {{#trim:}} parser function
Closed, Duplicate · Public · Feature

Description

The classic hack for trimming is {{#if:trim|{{{1|}}}}}, which is the most performant wikitext for the purpose.

The new function shall be tailored for string trimming, with three parameters:

  1. Mandatory: the non-empty source code to be trimmed. It may evaluate to an empty string.
  2. Optional: leading trim, enabled by default. Leading is left for LTR script but right for RTL. 0 suppresses trimming and keeps the characters; any other value trims.
  3. Optional: trailing trim, enabled by default. Trailing is right for LTR script but left for RTL. 0 suppresses trimming and keeps the characters; any other value trims.

Suppressing both with {{#trim: {{{1|}}} |0|0}} would add two spaces but is otherwise a null operation.
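The three parameters described above can be modelled with a minimal Python sketch. It is an illustration only, not MediaWiki code: the name `trim_sketch` is hypothetical, "leading" is assumed to mean the start of the string in logical order (rendered on the left in LTR and on the right in RTL), and the trimmable set is assumed to be just tab, line feed and space.

```python
# Hypothetical sketch of the proposed {{#trim:}} semantics; not MediaWiki code.
CLASSIC_WHITESPACE = "\t\n "  # assumption: tab, line feed, space only


def trim_sketch(source: str, leading: str = "", trailing: str = "",
                chars: str = CLASSIC_WHITESPACE) -> str:
    """Trim `source`; a flag equal to "0" suppresses trimming on that side."""
    # The flags are parser function parameters, so they are normalised first.
    if leading.strip(CLASSIC_WHITESPACE) != "0":
        source = source.lstrip(chars)   # leading = start in logical order
    if trailing.strip(CLASSIC_WHITESPACE) != "0":
        source = source.rstrip(chars)   # trailing = end in logical order
    return source
```

Under these assumptions, `trim_sketch(" a ", "0", "0")` returns its input unchanged, matching the null operation described above, while omitting both flags trims both sides.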

The classic parser function parameter trimming considers three characters:

  • U+0009 CHARACTER TABULATION
  • U+000A LINE FEED
  • U+0020 SPACE

The new parser function shall strip off many more:

  • U+00A0 NO-BREAK SPACE
  • U+00AD SOFT HYPHEN
  • U+2002 EN SPACE
  • U+2003 EM SPACE
  • U+2004 THREE-PER-EM SPACE
  • U+2005 FOUR-PER-EM SPACE
  • U+2006 SIX-PER-EM SPACE
  • U+2007 FIGURE SPACE
  • U+2008 PUNCTUATION SPACE
  • U+2009 THIN SPACE
  • U+200A HAIR SPACE
  • U+200B ZERO WIDTH SPACE
  • U+200E LEFT-TO-RIGHT MARK
  • U+200F RIGHT-TO-LEFT MARK
  • some more bidi control
  • U+2028 LINE SEPARATOR
  • U+2029 PARAGRAPH SEPARATOR
  • U+3000 IDEOGRAPHIC SPACE
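As a rough illustration of the extended character set, the following Python sketch strips exactly the code points enumerated above plus the three classic ones. The names `EXTENDED_TRIM_CHARS` and `trim_extended` are hypothetical, and the unspecified additional bidi controls are deliberately left out because the list does not name them.

```python
# Hypothetical sketch; only code points explicitly listed in this task.
EXTENDED_TRIM_CHARS = (
    "\u0009\u000A\u0020"   # classic: tab, line feed, space
    "\u00A0\u00AD"         # no-break space, soft hyphen
    "\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u200B"
    "\u200E\u200F"         # left-to-right mark, right-to-left mark
    "\u2028\u2029"         # line separator, paragraph separator
    "\u3000"               # ideographic space
)


def trim_extended(text: str) -> str:
    """Strip listed characters from both ends; interior ones are kept."""
    return text.strip(EXTENDED_TRIM_CHARS)
```

Only the ends of the string are affected: an ideographic space between two words survives, while the same character at either edge is removed.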

Some zero-width characters carry meaning in some languages.

If any of these characters is wanted at the edge of a value, it may be encoded as an HTML entity.
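The entity escape works because, at trim time, an entity is ordinary text that starts with an ampersand and not the character it stands for. The Python sketch below illustrates this under stated assumptions: the trimmable set is reduced to four characters for brevity, `trim_then_decode` is a hypothetical name, and `html.unescape` merely stands in for the decoding a browser performs when rendering the page.

```python
import html

# Assumption: a reduced trimmable set, enough to show the effect.
TRIM_CHARS = "\t\n \u00A0"


def trim_then_decode(source: str) -> str:
    """Trim literal characters first, then decode entities as a browser would."""
    trimmed = source.strip(TRIM_CHARS)   # "&#160;" is plain text at this point
    return html.unescape(trimmed)        # stand-in for rendering in the browser
```

A literal no-break space at the edge is stripped, whereas `&#160;` at the same position survives trimming and appears as a no-break space once decoded.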

The benefit is obvious: easier and more robust template programming and transclusions.

Event Timeline

Addendum: Lua mw.text.trim() should be extended with a similar syntax.

  • Both should share a single core implementation, ensuring identical behaviour and DRY.