Set $wgPFEnableStringFunctions = true on WMF wikis
Closed, Declined (Public)

Description

Author: ayg

Description:
Uses of string-related functions would doubtless be myriad. One in particular
that would be handy, from my point of view, is checking whether input to a
hatnote contains "[[" and "]]": if it does, then the wrapping "[[" and "]]"
could be dropped. This is useful because not requiring [[]] in the parameter
input has become the norm on enwiki, and this makes it impossible to replace an
intended single link with "[[Link1]] or [[Link2]]" due to parsing oddness. If
StringFunctions were installed, "[[Link1]] or [[Link2]]" could be used as the
parameter, and the template would drop the default brackets.
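The check described above can be sketched in a few lines. This is only an illustration in Python of the logic the template would need, not wikitext and not code from any extension; the function name `hatnote_target` is hypothetical.

```python
def hatnote_target(param: str) -> str:
    """Wrap the parameter in [[...]] unless it already contains link brackets."""
    # If the caller supplied their own links, pass the text through untouched.
    if "[[" in param and "]]" in param:
        return param
    # Otherwise apply the default brackets, as enwiki hatnotes do today.
    return "[[" + param + "]]"


print(hatnote_target("Link1"))
print(hatnote_target("[[Link1]] or [[Link2]]"))
```

With a substring test available, a single parameter could carry either a bare title or pre-linked text.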

Okay, so perhaps that's a slightly pathetic reason, but there's no request open
for this yet, so let others add their own reasons.


Version: unspecified
Severity: enhancement
URL: http://www.mediawiki.org/wiki/Extension:StringFunctions

bzimport added a comment.Via ConduitAug 18 2008, 10:21 PM

fran wrote:

Merged the functionality of StringFunctions into ParserFunctions in r39618. :)

brion added a comment.Via ConduitAug 19 2008, 6:54 PM

Reverted in r39653. These functions look *extremely* inefficient, for instance reimplementing mb_strlen() by apparently splitting the entire input string into an array of individual characters and counting up the elements.

bzimport added a comment.Via ConduitAug 26 2008, 11:29 PM

jsimlo wrote:

I finally got some free time to look into it and rewrite the functions into something more efficient. Is anyone working on it right now? Anyway, any comments are welcome, sooner rather than later. I will place my rewrite at http://www.mediawiki.org/wiki/Extension:StringFunctions/Code, since I do not have svn ci access at wikimedia.

DanielFriesen added a comment.Via ConduitAug 27 2008, 6:20 AM

Updates have been committed as of r40068.

However, I'm not convinced it's good enough to put into ParserFunctions. It duplicates parser code in ugly ways, and uses preg_replace_all (Tim pokes me saying the _all is evil...)

I've actually been working on string handling inside of WikiCode. I too found the StringFunctions code extremely ugly. I actually implemented my own functions. They still need some tweaks, however I believe the code is far better than that inside of StringFunctions.

bzimport added a comment.Via ConduitAug 29 2008, 5:26 PM

jsimlo wrote:

Thank you for your input. Actually, you got me hooked, so I put together a simple benchmarking script of a few different implementations of the strlen: http://www.mediawiki.org/wiki/Extension:StringFunctions/Bench

Turns out that Brion was more than right about extreme inefficiency. The preg_match_all version, which splits the input into an array of individual chars, can be more than a hundred times slower than other implementations and more than a thousand times slower than the native mb_strlen().

Now, raising the length of the benchmark data pushed the other implementations aside as well. So, here are the results for the best two only (plus the simple mb_strlen() version for comparison):

benching 256 loops of length: 1120kB
  runLen0: 2.3116    ..using mb_strlen() only
  runLen2: 18.6722   ..using preg_match_all() through markers
  runLen4: 10.3728   ..using strpos()

So, as it turns out, pregs, even when they are not abused, take some time to process the input. On the other hand, using strpos() for counting markers seems to be quite efficient, considering the native mb_strlen() is only about 5 times faster for the above bench case. I did more benches and found out that when the length of data is only 112kB, mb_strlen() is up to 10 times faster; when the length of data is 28kB, mb_strlen() is up to 12 times faster.

Any other ideas of how to implement such a simple thing as strlen()? ;))
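One further idea, sketched here in Python rather than PHP and not taken from the benchmark page: in valid UTF-8, every character contributes exactly one byte that is not a continuation byte (continuation bytes have the bit pattern 10xxxxxx), so the character count can be obtained in a single pass over the bytes without building an array of characters. This is neither the preg_match_all() nor the strpos() variant measured above, and the function names are hypothetical.

```python
def utf8_len_split(data: bytes) -> int:
    """Slow approach: materialise every character in a list, then count it."""
    return len(list(data.decode("utf-8")))


def utf8_len_count(data: bytes) -> int:
    """Count characters by skipping UTF-8 continuation bytes (10xxxxxx).

    Assumes the input is valid UTF-8; no validation is performed.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


sample = "héllo wörld €".encode("utf-8")
print(utf8_len_split(sample), utf8_len_count(sample))
```

Both functions agree on valid input; the second never allocates per-character storage.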

bzimport added a comment.Via ConduitSep 19 2008, 8:57 PM

Soroush83 wrote:

Somebody wanted me to make a template on fa.wiktionary and I was informed that this extension is required for it. I read about its potential usage on en.wiki and I agree to its installation on en.wiki as well.

Raymond added a comment.Via ConduitSep 20 2008, 2:01 PM
  • Bug 15658 has been marked as a duplicate of this bug.
bzimport added a comment.Via ConduitMar 19 2009, 3:53 PM

mike.lifeguard+bugs wrote:

The extension isn't ready, so I've removed the shell keyword.

Dragons_flight added a comment.Via ConduitMar 30 2009, 11:07 AM

rohde wrote:

New version of string functions

The attached file is a complete rewrite of the StringFunctions extension.

It implements the parser functions:

#len - string length
#pos - finding substring position
#rpos - reverse oriented #pos
#sub - fetch a substring specified by start and length
#replace - substring replacement
#explode - partition string by a delimiter and find a specific piece

The other functions, which are mostly already in the core, have been dropped.
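As a rough guide to what the six functions listed above do, here is a minimal Python sketch of their general behaviour. It is an approximation for illustration only: the names prefixed `sf_` are hypothetical, the handling of edge cases (not-found results, negative offsets, out-of-range pieces) is a choice made for this sketch, and none of it is taken from the attached patch.

```python
from typing import Optional


def sf_len(s: str) -> int:
    """#len: number of characters (not bytes)."""
    return len(s)


def sf_pos(s: str, needle: str, offset: int = 0) -> Optional[int]:
    """#pos: index of the first occurrence, or None here if absent."""
    i = s.find(needle, offset)
    return None if i < 0 else i


def sf_rpos(s: str, needle: str) -> Optional[int]:
    """#rpos: index of the last occurrence, or None here if absent."""
    i = s.rfind(needle)
    return None if i < 0 else i


def sf_sub(s: str, start: int, length: Optional[int] = None) -> str:
    """#sub: substring given a start index and an optional length."""
    return s[start:] if length is None else s[start:start + length]


def sf_replace(s: str, search: str, repl: str) -> str:
    """#replace: replace every occurrence of search with repl."""
    return s.replace(search, repl)


def sf_explode(s: str, delim: str, index: int) -> str:
    """#explode: split on a non-empty delimiter and return one piece."""
    parts = s.split(delim)
    return parts[index] if 0 <= index < len(parts) else ""


print(sf_len("Žmržlina"), sf_pos("Žmržlina", "lina"), sf_explode("a b c", " ", 1))
```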

In addition, I implemented it so that the unique markers generated by <nowiki>, <gallery>, <math>, etc. are universally stripped (this is a partial change in behavior from prior versions). So the behavior will be more uniform and predictable than prior versions and there is no risk of partial or unexpected markers bleeding through.

Where possible PHP's built-in multi-byte string functions are used to provide fast results. If the mb_ functions are unavailable, their behavior is simulated in regex in order to provide a graceful (if slower) failure mode.

A global variable is used to define a hard limit for the size of a string to operate on. I've set this to 1000 characters for now, but I haven't experimented too much to decide what is reasonable or whether different limits should be enforced for different functions. #replace is armored against replacements that would generate strings longer than this limit.
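One way such armoring could work is to compute the size of the result before performing the replacement, so an oversized string is never built. The Python sketch below illustrates that idea only; the names `MAX_LEN` and `bounded_replace` are hypothetical, raising an exception is a choice made for the sketch, and the patch itself may handle the limit differently.

```python
MAX_LEN = 1000  # hypothetical name; the comment above only says the limit is 1000 characters


def bounded_replace(s: str, search: str, repl: str, limit: int = MAX_LEN) -> str:
    """Replace all occurrences, refusing inputs or results longer than limit."""
    if len(s) > limit:
        raise ValueError("input exceeds the length limit")
    # str.count() counts non-overlapping matches, which is what str.replace() replaces.
    occurrences = s.count(search)
    projected = len(s) + occurrences * (len(repl) - len(search))
    if projected > limit:
        raise ValueError("replacement would exceed the length limit")
    return s.replace(search, repl)


print(bounded_replace("a-b-c", "-", "+"))
```

Checking the projected length first keeps the cost proportional to the input, even when the replacement string is long.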

I believe that this version of StringFunctions (or something close to it) should be suitable for implementation on WMF sites.

Attached: StringFunctions_new.patch

bzimport added a comment.Via ConduitMar 30 2009, 1:22 PM

ayg wrote:

(In reply to comment #65)

The attached file is complete rewrite of the StringFunctions extension.

Shouldn't they just be added to ParserFunctions?

In addition, I implemented it so that the unique markers generated by <nowiki>,
<gallery>, <math>, etc. are universally stripped (this is a partial change in
behavior from prior versions). So the behavior will be more uniform and
predictable than prior versions and there is no risk of partial or unexpected
markers bleeding through.

I'm not sure if stripping them outright is the best solution, but I can't think of a better one.

Where possible PHP's built-in multi-byte string functions are used provide fast
results. If the mb_ functions are unavailable, their behavior is simulated in
regex in order to provide a graceful (if slower) failure mode.

We already have some of these compatibility functions in GlobalFunctions.php (mb_strlen and mb_substr). You should use those, and add any additional ones there.

bzimport added a comment.Via ConduitMar 30 2009, 4:34 PM

ted_kandell wrote:

A good use of the String functions would be to parse Newick tree format (Newick notation) files which is the standard way of minimally representing phylogenetic trees. Trees are now an important data structure in Wikipedia, and it's very difficult to edit these by hand and to get them to align and display properly. A simple {{newick}} template could then convert a Newick string into a properly displayed tree.

This may seem trivial compared to the other reasons, but just check the myriad ways that trees are now represented in MediaWiki. Having this template would allow trees to be created and edited in external tools and just dropped in.

I don't see any other way of parsing such a format without the String Functions.

Dragons_flight added a comment.Via ConduitMar 30 2009, 5:05 PM

rohde wrote:

(In reply to comment #66)

(In reply to comment #65)
> The attached file is complete rewrite of the StringFunctions extension.

Shouldn't they just be added to ParserFunctions?

I'm happy to write it up that way instead, though I don't know which is preferred. Given how long we and other sites have gone without working StringFunctions, it almost feels more natural to segregate them so that site operators have a choice.

My main interest though is getting an implementation somewhere that is sufficiently reasonable that it can be used on the WMF sites.

> Where possible PHP's built-in multi-byte string functions are used provide fast
> results. If the mb_ functions are unavailable, their behavior is simulated in
> regex in order to provide a graceful (if slower) failure mode.

We already have some of these compatibility functions in GlobalFunctions.php
(mb_strlen and mb_substr). You should use those, and add any additional ones
there.

Okay. The one caveat is that my functions more or less assume they are being passed valid UTF-8 strings, and the encoding parameter for mb_strpos, etc. is not implemented. It appears that mb_strlen in GlobalFunctions is making the same assumption, so I'll assume that is okay for Mediawiki's purposes.

Dragons_flight added a comment.Via ConduitMar 30 2009, 7:20 PM

rohde wrote:

(In reply to comment #68)

(In reply to comment #66)
> We already have some of these compatibility functions in GlobalFunctions.php
> (mb_strlen and mb_substr). You should use those, and add any additional ones
> there.

Okay. The one caveat is that my functions more or less assume they are being
passed valid UTF-8 strings, and the encoding parameter for mb_strpos, etc. is
not implemented. It appears that mb_strlen in GlobalFunctions is making the
same assumption, so I'll assume that is okay for Mediawiki's purposes.

Added the necessary mb_ fallbacks to GlobalFunctions in r49043.

Figuring out the merge with ParserFunctions will take more time.

I'll probably post that as an alternative patch here and let someone with more familiarity decide whether it is better to build StringFunctions as a separate stand-alone or to merge it into the ParserFunctions.

Dragons_flight added a comment.Via ConduitApr 6 2009, 7:44 AM

rohde wrote:

Merge string functionality into ParserFunctions

Comments suggested it may be preferable to merge string functionality into ParserFunctions. The attached patch would accomplish that. The logic should be the same as the other StringFunctions patch, so one should choose one patch or the other depending on whether it is preferred for StringFunctions to operate as a separate stand-alone extension or as a component of ParserFunctions. I'm not sure which approach is preferable. Minor tweaks were made to more or less follow the existing layout conventions in ParserFunctions.

Also note that StringFunctions and ParserFunctions were originally written under different copyleft schemes. I asked for and received permission from the referenced authors to GPL the StringFunctions code in order to facilitate the merge.

Attached: Add_strings_to_ParserFunctions.patch

bzimport added a comment.Via ConduitMay 14 2009, 4:25 PM

catlow wrote:

Sorry, I'm bemused. Every programming language I've met (admittedly that's not very many) has these string functions as absolute basic standard. How does it take three years to find a way to expose them through MW?

mxn added a comment.Via ConduitMay 14 2009, 7:57 PM

The wiki syntax (especially the subset used on Wikimedia sites) isn't quite intended as a full-fledged programming language, though it's getting to be one. Think of it more as a language for macros. Notice that there's no built-in support for iteration, either, and that's an absolute basic standard for programming languages too.

bzimport added a comment.Via ConduitMay 14 2009, 9:00 PM

catlow wrote:

I think you missed my point - I don't mean MW has to have something because programming languages have it, I mean if programming languages have it as standard, AND we want to have it (as we clearly do in this case), then it surely must be a pretty trivial matter to code. Surely there are standard php libraries which have all these functions?

bzimport added a comment.Via ConduitMay 14 2009, 10:03 PM

ayg wrote:

It's already implemented. Robert has a patch, which he can commit if he likes. He hasn't so far.

Catrope added a comment.Via ConduitMay 15 2009, 10:59 AM

(In reply to comment #73)

I think you missed my point - I don't mean MW has to have something because
programming languages have it, I mean if programming languages have it as
standard, AND we want to have it (as we clearly do in this case), then it
surely must be a pretty trivial matter to code. Surely there are standard php
libraries which have all these functions?

If you read the comments (granted, 74 is a lot), you'll see that there were issues with previous implementations, such as the need to use Unicode-aware string functions, the need to fall back to alternative implementations if those functions aren't available (they're a PHP extension) and the need to do all this efficiently.

Robert has attached a patch, which he could commit (and probably should, or maybe already has?) to StringFunctions in SVN; Tim or Brion can then review that and, if it passes, enable it on Wikipedia.

bzimport added a comment.Via ConduitMay 15 2009, 1:45 PM

ayg wrote:

The patch is to ParserFunctions, so it wouldn't need review beyond the normal process.

Dragons_flight added a comment.Via ConduitMay 26 2009, 12:49 AM

rohde wrote:

I made some additional tweaks to the second patch and committed it as r50997.

bzimport added a comment.Via ConduitMay 26 2009, 12:52 AM

ayg wrote:

Marking FIXED, then. Close enough to the original request.

bzimport added a comment.Via ConduitJun 19 2009, 11:51 AM

happy.melon.wiki wrote:

The spirit of this bug is clearly "enable StringFunctions on WMF wikis". So now we need $wgPFEnableStringFunctions = true; to be set on WMF wikis. The substance of this bug is not yet resolved. Reopening.

bzimport added a comment.Via ConduitJun 19 2009, 4:54 PM

ayg wrote:

Tim has stated pretty clearly that string functions will not be enabled on Wikimedia wikis, so I'll mark this WONTFIX.

bzimport added a comment.Via ConduitJun 19 2009, 5:36 PM

ayg wrote:

(In reply to comment #80)

Tim has stated pretty clearly that string functions will not be enabled on
Wikimedia wikis, so I'll mark this WONTFIX.

. . . that's in r51497. Quote from diff:

+/**
+ * Enable string functions.
+ *
+ * Set this to true if you want your users to be able to implement their own
+ * parsers in the ugliest, most inefficient programming language known to man:
+ * MediaWiki wikitext with ParserFunctions.
+ *
+ * WARNING: enabling this may have an adverse impact on the sanity of your users.
+ * An alternative, saner solution for embedding complex text processing in
+ * MediaWiki templates can be found at: http://www.mediawiki.org/wiki/Extension:Lua
+ */

It's pretty clear this isn't going to be enabled on Wikimedia.

Tgr added a comment.Via ConduitJun 19 2009, 6:36 PM

Opened bug 19298 for enabling Lua as per Tim's suggestion.

bzimport added a comment.Via ConduitJun 24 2009, 4:21 PM

nykevin.norris wrote:

Can we mark this bug as LATER instead of WONTFIX given the disagreement with Tim's decision expressed in the comments for the Lua bug?

bzimport added a comment.Via ConduitJun 24 2009, 9:42 PM

ayg wrote:

The people disagreeing with Tim don't get to make decisions like this, Tim does. So not much point. Any WONTFIX could be revisited later, of course.

Dragons_flight added a comment.Via ConduitJun 24 2009, 10:27 PM

rohde wrote:

(In reply to comment #84)

The people disagreeing with Tim don't get to make decisions like this, Tim
does. So not much point. Any WONTFIX could be revisited later, of course.

Well, in point of fact, they are sort of disagreeing with you Aryeh, since you are the one who tagged it WONTFIX.

Tim's comments are discouraging, but it isn't clear to me that they represent a final conclusion on the subject. That's doubly true since Brion has already said Lua won't be installed in the near-term (if ever), so Tim's preferred solution is pretty much no solution at all.

While Tim's concern for the sanity of wikicode is well-intentioned, I've yet to see any template coder (i.e. the people who would really be working with this) come forward to say that the incremental burden of enabling this would be terrible. Given the evident desire of the community, and the fact that Tim's alternative isn't really available, I am wondering if this should be reopened and given more developer discussion.

bzimport added a comment.Via ConduitJun 24 2009, 11:07 PM

ayg wrote:

I was taking Tim's statement as a fait accompli. If you want to reopen this and maybe start a wikitech thread, go ahead, I don't agree with him.

Rich_Farmbrough added a comment.Via ConduitSep 4 2009, 5:50 PM

Hm, there seem to be facilities for manipulating strings enabled now. So this is either fixed or it is being done by an almighty kludge and probably far less efficiently than "fixing" this. See my comment to 19298.

bzimport added a comment.Via ConduitSep 4 2009, 9:17 PM

happy.melon.wiki wrote:

(In reply to comment #87)

either fixed or it is being done by an almighty kludge

Oh yes. Those templates put all other hacks to shame. But they work, and they're now very widely used. Which demonstrates the need for this functionality to be supported *somehow*.

bzimport added a comment.Via ConduitSep 8 2009, 10:44 PM

nykevin.norris wrote:

(In reply to comment #88)

But they work

Really? IIRC we don't have substrings (yet)...

Rich_Farmbrough added a comment.Via ConduitSep 12 2009, 9:44 AM

I wrote {{Sub right}} and another one recently - trying to get a title case template to work. See Category:String manipulation templates, most have been around for a while.

Rich_Farmbrough added a comment.Via ConduitNov 17 2010, 5:41 PM

This really needs some attention. We have perfectly good templates for doing minor stuff that work, provided there are less than "X" of them on a page, where "X" is a small number. Wontfix is not a good status for this bug. Reopening.

bzimport added a comment.Via ConduitNov 17 2010, 5:58 PM

matthew.britton wrote:

(In reply to comment #91)

This really needs some attention. We have perfectly good templates for doing
minor stuff that work, provided there are less than "X" of them on a page,
where "X" is a small number. Wontfix is not a good status for this bug.
Reopening.

I take issue at the description "perfectly good".

What happened was, a while back well-meaning people asked for "padleft" and "padright" string functions. The devs decided to add support for these specific functions, assuming -- foolishly -- that they wouldn't be abused within an inch of their life.

Since then, various string functions (length, string search functions, sub-string based functions) have been implemented using unmaintainable, indecipherable nested MediaWiki templates IN TERMS OF PADLEFT AND PADRIGHT. This is something I didn't even know was possible and probably constitutes an interesting academic exercise... oh, except this is in production use on one of the world's busiest websites.

The algorithms involved are hideously inefficient, given the huge overhead incurred by having to parse wikitext every step of the way.

Have a look at how "str len" is implemented. This:

http://en.wikipedia.org/w/index.php?title=Template:Str_len/core&action=edit

is just part of it.

When you've finished washing your eyes out with bleach, look at the "str find" template. Note its reliance on the aforementioned "str len", as well as "str left" and various other horrendous string functions. Note that at the bottom of this hierarchy of {{{{}{}{}{}{}{}{}{{}}}} lies #padleft, #titleparts and various other functions that you wouldn't normally expect to be roped into string searching, unless you were in a batshit insane environment where they were the only primitive functions available... oh wait.

It probably takes as long to evaluate one of these string functions on a modern, top-of-the-range multicore server machine as it would to evaluate a sane implementation on a 1980s home computer. The algorithm for "str find" wouldn't even be too bad if it was implemented directly in C or something, but don't pretend that MediaWiki template syntax isn't the least efficient programming language ever created. Including several joke ones.
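To make the general trick concrete, here is a Python sketch of how a string length can be derived from a padding primitive alone. It assumes a simplified model of padleft in which a multi-character pad string is repeated and truncated to fill the requested width, capped at 500 characters; whether the real parser function behaved exactly this way at the time, and how [[Template:Str len]] is actually written, are not established in this thread. The names `padleft`, `PAD_LIMIT` and `str_len_via_padleft` are hypothetical.

```python
PAD_LIMIT = 500  # assumed cap on the padded width in this simplified model


def padleft(value: str, width: int, pad: str = "0") -> str:
    """Left-pad value to width, repeating and truncating pad as needed."""
    if not pad:
        return value
    missing = min(width, PAD_LIMIT) - len(value)
    if missing <= 0:
        return value
    repeats = -(-missing // len(pad))  # ceiling division
    return (pad * repeats)[:missing] + value


def str_len_via_padleft(s: str):
    """Find len(s) using only padleft and equality tests.

    Padding an empty value with s to width n yields the first n characters
    of s (repeated if n is larger), which equals s only when n == len(s).
    """
    for n in range(PAD_LIMIT + 1):
        if padleft("", n, s) == s:
            return n
    return None  # longer than the model can measure


print(padleft("xyz", 5, "_"), str_len_via_padleft("hello"))
```

Each probe costs a full pad-and-compare, so even this linear version performs work quadratic in the string length, which is the kind of overhead the comment above objects to.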

Come to think of it, yes, there is a really fucking good argument for enabling StringFunctions on Wikimedia wikis. And also for tracking down the people who implemented templates like [[Template:Str find]] and murdering them for crimes against programming.

MaxSem added a comment.Via ConduitNov 17 2010, 6:06 PM

(In reply to comment #92)

Come to think of it, yes, there is a really fucking good argument for enabling
StringFunctions on Wikimedia wikis. And also for tracking down the people who
implemented templates like [[Template:Str find]] and murdering them for crimes
against programming.

No, it's a great reason to disable #padleft and friends instead. Things ParserFunctions are (ab)used for are insane, and the more of them are there, the more insane things they allow. This spiral dive has to stop somewhere.

Dinoguy1000 added a comment.Via ConduitNov 17 2010, 6:33 PM

There will be a lot of angry people (and broken functionality, with no obvious way to fix, replace, or remove it) if the only currently enabled way to implement string parsing in wikicode on WMF wikis is simply disabled or removed. It is not the template coders' fault for abusing the hell out of padleft and padright, they are simply making do with the only tool they can use, and would certainly use something else if it were available (and I do mean they'd use pretty much *anything* else, as just about anything would be an improvement over the current situation). It's not like padleft and company are being blindly used either, these templates are massively optimized and however bad it is, it could be far, far worse.

bzimport added a comment.Via ConduitNov 17 2010, 8:23 PM

catlow wrote:

Of course the use of padleft and so on shouldn't be happening, but it's not the fault of the people who worked out those hacks. This really is a no-brainer - PLEASE *enable the efficient string functions*, and we won't be using the mind-blowingly inefficient ones any more. (Notice that the servers are still up and running in spite of the use of the inefficient hacks, so replacing them with more efficient functions will certainly not be any kind of performance hit.)

MZMcBride added a comment.Via ConduitNov 18 2010, 12:00 AM

(In reply to comment #91)

This really needs some attention. We have perfectly good templates for doing
minor stuff that work, provided there are less than "X" of them on a page,
where "X" is a small number. Wontfix is not a good status for this bug.
Reopening.

I don't have any problem with users overturning a WONTFIX with a valid reason. I've certainly done so a number of times. However, this bug as currently summarized reads "Enable StringFunctions on WMF wikis" and the most senior active sysadmin and developer has (essentially) said this is never going to happen. Re-reading comment 0 (way the hell up there), this bug was not originally about a specific extension, just about the functionality.

Either this bug should be re-closed as WONTFIX or the bug summary should be genericized. The current match-up is disingenuous and misleading.

Dinoguy1000 added a comment.Via ConduitNov 18 2010, 1:57 AM

Looking at this bug's history, the very first entry is Rob Church changing the summary from "Install StringFunctions" to "Install the StringFunctions extension". Unless there's missing history here (which is doubtful, since the change was made less than a half-hour after the bug report was filed), this bug was indeed originally about a specific extension, and a careful reading of comment 1 and comment 3 support this.

bzimport added a comment.Via ConduitNov 18 2010, 1:56 PM

nykevin.norris wrote:

MZ, are you seriously suggesting that the developers will completely re-implement an extension, when the concerns about the original are *not* implementation-specific? I seriously doubt that.

MZMcBride added a comment.Via ConduitNov 18 2010, 8:16 PM

(In reply to comment #98)

MZ, are you seriously suggesting that the developers will completely
re-implement an extension, when the concerns about the original are *not*
implementation-specific? I seriously doubt that.

I'm suggesting that the sysadmins in charge of running Wikimedia wikis have said rather unequivocally that this extension is not going to be installed. The StringFunctions extension is a means to an end. There are plenty of other ways to implement string manipulation. For years, there has been discussion of implementing a proper programming language into MediaWiki. The current preferred favorite is not Lua, but JavaScript, actually.

I don't believe that there is any legitimate objection to letting users manipulate strings. However, there are legitimate objections to enabling this extension on Wikimedia wikis. This bug is about enabling a specific extension on Wikimedia wikis. Unless there is some reason to believe this is ever going to happen, this bug should be re-closed as WONTFIX. A subsequent, generic bug should be filed about the ability to manipulate strings on Wikimedia wikis (though there's little hope of that bug being resolved anytime soon). Keeping this bug unresolved in the REOPENED state does not change the reality of the situation. It just misleads people into believing that this is still up for debate.

Tgr added a comment.Via ConduitNov 18 2010, 8:57 PM

(In reply to comment #99)

I don't believe that there is any legitimate objection to letting users
manipulate strings. However, there are legitimate objections to enabling this
extension on Wikimedia wikis.

Actually, what are those? Tim's oft-cited comment stated that StringFunctions should be deprecated in favor of Lua, but since then it was decided that Lua is an even worse option. As for a hypothetical server-side Javascript-based string manipulation extension, it has most of the drawbacks of Lua (denial-of-service vulnerability, incompatibility of Wikipedia with a MediaWiki at an average web host), with the added bonus that Lua at least exists and does not need to be implemented from scratch.

More importantly, what are the disadvantages of StringFunctions compared to the current situation? #padleft-based string manipulation is slower, less reliable, harder to understand and maintain, and more limited in its abilities. It used to be said that SF should not be enabled because then a lot of pages will depend on it, and it will be difficult to switch to a superior solution when one is found, but we already crossed that river a long time ago.

vvv added a comment.Via ConduitNov 18 2010, 9:29 PM

(In reply to comment #100)

Actually, what are those? Tim's oft-cited comment stated that StringFunctions
should be deprecated in favor of Lua

Tim's comment was that we are not going to expand the parser functions in any way and all our further development should be concentrated at the development of sensible scripting engine instead of turning parser functions into programming language. As far as I am aware he did not change his mind about that so this bug is closed and should not be reopened unless the policy mentioned above is changed as a result of discussion among the developers (you may initiate it).

Krinkle added a comment.Via ConduitNov 18 2010, 9:35 PM

A request once was made to implement padleft to pad left. Then more advanced functions were wanted and existing functions (ab)used to achieve it.
I can imagine developers not wanting to natively implement those now-wanted advanced functions, as it will likely lead to history repeating itself, namely some other advanced thing being wanted and implemented with these, etc.

There are way too many scripts and templates that should be and can be written as an Extension instead.

So how about opening bugs for the actual functionality wikipedians want instead of requesting functions to achieve them in templates ? The same was done with Babel, instead of creating lots and lots of templates and decentralized stuff all over the place it was written into a native Extension and everybody's happy.

I realise this is not a solution for everything though )

Krinkle added a comment.Via ConduitNov 18 2010, 9:36 PM

mid-air collision mistake. Self-reverting status change

Slomox added a comment.Via ConduitNov 18 2010, 9:52 PM

(In reply to comment #102)

So how about opening bugs for the actual functionality wikipedians want instead
of requesting functions to achieve them in templates ? The same was done with
Babel, instead of creating lots and lots of templates and decentralized stuff
all over the place it was written into a native Extension and everybody's
happy.

??? Have I missed some developments? The extension was created, then not reviewed by the developers and everybody is still unhappy with the old system.

And I'm suspecting the exact same thing will happen with StringFunctions...

bzimport added a comment.Via ConduitNov 18 2010, 10:08 PM

ayg wrote:

Note that string function support was added to ParserFunctions proper in r50997, and disabled by default by Tim in r51497 -- a separate extension is no longer needed. I don't know if anything has happened since June 2009 to cause him to reconsider his opinion. I personally have thought for a long time that enabling string functions is the lesser evil here, given the givens, but it's not my call.

bzimport added a comment.Via ConduitNov 19 2010, 2:04 AM

nykevin.norris wrote:

Could Tim Starling please indicate whether his veto on this bug is still outstanding?

If it is, I intend to bring it up on [[WP:VPT]] or somewhere and make the following proposal:

"There is community consensus to enable StringFunctions; if the developers do not enable it themselves, the community hereby requests that the WMF instruct the developers to do so."

I really hate to go over your heads on this one, but it appears to be necessary. As Aryeh said, SF is clearly "the lesser evil" and it's patently ridiculous that the nth most popular site in the world (n=whatever our current Alexa rank is) is using [[Category:String manipulation templates]] instead of native php, especially when the functionality to do so is available and tested.

bzimport added a comment.Via ConduitNov 19 2010, 7:21 AM

msh210+wmfbugzilla wrote:

(In reply to Kevin Norris's comment #106)

If it is, I intend to bring it up on [[WP:VPT]] or somewhere and make the
following proposal:

"There is community consensus to enable StringFunctions; if the developers do
not enable it themselves, the community hereby requests that the WMF instruct
the developers to do so."

If you do bring up such a proposal on-wiki, please link to it in a comment on this bug so that people on other wikis know. Thanks.

Rich_Farmbrough added a comment.Via ConduitNov 19 2010, 2:49 PM

I'm pretty sure it would garner extensive support.

#Pro: less server load
#Pro: less page breakage
#Pro: easier template programming
#Pro: faster page load/render times
#Pro: less obscure limits on lengths
#Pro: less templates which work fine in test but are useless on an actual page

#Con: If we implement a scripting language we may need to migrate some stuff - which we would anyway.

Things have moved on in four years, but we are still struggling with ancient functionality. Wikia has more powerful facilities than the WMF projects.

I'm disappointed someone re-closed the bug, it was not re-opened lightly.

Anyone in doubt as to the importance of this bug is invited to look at VP(T) where I believe almost half the threads are related to it.

bzimport added a comment.Via ConduitNov 19 2010, 3:49 PM

happy.melon.wiki wrote:

(In reply to comment #108)

I'm pretty sure it would garner extensive support.
...
Anyone in doubt as to the importance of this bug is invited to look at VP(T)
where I believe almost half the threads are related to it.

You[1] are making a mistake in assuming that if the enwiki community supports a technical change then, ipso facto, that change should be implemented, irrespective of any 'big picture' considerations. You're[1] not in Kansas any more; the consensus of the enwiki community is not sovereign here.

I'm disappointed someone re-closed the bug, it was not re-opened lightly.

It was re-opened mistakenly under a [[WP:BRD]] principle which just doesn't apply here. It is perfectly acceptable to comment, where appropriate, on closed bugs; the status applies only to the bug title, not to the discussion underneath. Tim has said that the status of the request "Set $wgPFEnableStringFunctions=true on WMF wikis" is WONTFIX; that conclusion stands until something (maybe discussion under the closed bug, maybe something else) convinces *him* or *another sysadmin of equal standing* to reconsider it. Someone else changing the status does not somehow reshape the world to make it so.

[1] I'm speaking generally, not to anyone specifically.

bzimport added a comment.Via ConduitNov 19 2010, 4:33 PM

ayg wrote:

(In reply to comment #108)

#Pro: less server load
#Pro: faster page load/render times

Experience has shown that people will just write pages that use up whatever the resource limits are. They'll use the functions to write still more complicated templates, which currently they can't write because of preinclusion size limits. It's not at all obvious it will make anything faster, it will just allow more complexity for the same length limit.

In support of this, observe that ParserFunctions was only introduced to provide a sane replacement for [[Template:Qif]], much as this bug requests that StringFunctions be enabled to replace [[Template:Str len]] and friends. The explosion of template complexity after ParserFunctions were turned on would have been impossible (given performance limits) with template hacks. It's a certainty that that will happen again if we enable StringFunctions, with template editing becoming even more arcane.

Maybe we should enable the string functions, but reduce preinclusion length limit, or impose other limits on template complexity.

#Pro: less page breakage

How so?

#Pro: easier template programming

Not if things get even more complicated to compensate, which they will.

#Pro: less obscure limits on lengths

The limits on length will be the same, it's just people will write even more complicated templates to use up the length limits.

#Con: Templates like {{str len}} will no longer count as much against the length limit, so the effective limit will be higher and people will be able to make even more complicated and unmaintainable wikitext pages for things that should have been written in a real language to start with.

I agree that enabling string functions is the lesser evil, but it's still evil. People shouldn't have been writing programs in wikitext to begin with, they should use proper scripts of some type -- extensions or bots or such. Personally I'd also be okay with restricting or disabling any functions that people are abusing to emulate string functions, like padright/left, but that would be much more disruptive, and people will always find ways to abuse innocent functionality. So unless someone is willing to implement a systematic solution like a Lua extension, we may as well resign ourselves to making template programming less painful.

bzimport added a comment.Via ConduitNov 19 2010, 5:21 PM

catlow wrote:

...abuse...

It's not abuse (which would be putting good tools to bad use), this is putting bad tools to good use.

...proper scripts...extensions...

Yes, this seems to be the vicious circle we're in... someone *has* written an extension, but what good did it do him - we're now deprived of the use of the extension, just in case someone "abuses" it by making better use of it than was anticipated. It would obviously be much much better to have non-trivial logic compiled into the software than to do it via templates, but what choice are we given?

Slomox added a comment.Via ConduitNov 19 2010, 7:35 PM

As far as I can see all of the StringFunctions are already present in template-implemented versions now. Just in an inefficient way. So any "abuse" (quotation marks because of Le Chat's good remark) would be possible already now.

Does anybody have any ideas in which direction possible "abuse" could go? I cannot think of any new class of functionality that would become possible if we allowed StringFunctions. The template-based string functions too were not enabled by ParserFunctions alone. Template-based string functions would be impossible without "padleft:" and "padright:". These two are string functions. It's clear that when you provide a single string function and simple logic, that other string functions can be emulated. That door was left open and people walked through. But if StringFunctions do not open new doors nothing bad can happen. I don't see open doors in them. If you do see them, please report.

I guess we can safely assume that when you provide functionality people will _always_ test the limits of the functionality. It doesn't matter how little or how amazingly much functionality you provide. They will test its limits. It's almost a law of nature. That's normal and we will never have success with "We provide this functionality but please don't fully utilize it".

We have to put limitations on functionality because we need to limit the computation cost and rendering time. If we replace the template-based string functions with extension-based StringFunctions we will reduce computation cost and rendering time. That's a good thing. If you want to ensure that this gain will not be consumed by increased use of the functions, then set limits on how many instances of the functions can be called on a single page.

By the way, I'm sure there are wikis with activated StringFunctions. Are there any reports that these wikis had problems with it? If there are any open doors in them, I'm sure somebody must have discovered them already!?

Dinoguy1000 added a comment.Via ConduitNov 19 2010, 8:01 PM

(In reply to comment #112)

By the way, I'm sure there are wikis with activated StringFunctions. Are there
any reports that these wikis had problems with it? If there are any open doors
in them, I'm sure somebody must have discovered them already!?

Wikia - *all* of Wikia - has had StringFunctions enabled for years. I've never heard about any problems they've had as a result of this.

Tgr added a comment.Via ConduitNov 20 2010, 11:35 AM

(In reply to comment #110)

Experience has shown that people will just write pages that use up whatever the
resource limits are. They'll use the functions to write still more complicated
templates, which currently they can't write because of preinclusion size
limits. It's not at all obvious it will make anything faster, it will just
allow more complexity for the same length limit.
[...]
Maybe we should enable the string functions, but reduce preinclusion length
limit, or impose other limits on template complexity.

You make it sound as if complexity were a bad thing in itself. That is not so - complex tasks require complex solutions, most of the time. MediaWiki itself has become much more complex over the years, the editing workflow became more complex, Vector was a huge jump in the complexity of the editing GUI, and so on. Everyone accepts these as necessary, so why not the same for complex templates? Seems like a bit of NIH syndrome to me (or more precisely, Not Invented By Us, because it *is* invented here, just not by the developers).

I sense a good amount of developer hubris in the debates about templates - "you should leave this stuff to us, we could do it better". Sure you could - but you could do much less of it. By the same account, we should leave writing encyclopedia articles to professionals, because they are much better at it (except that Nupedia had some 100 articles after three years). This line of thinking is completely contrary to Wikipedia philosophy. Wikipedia is about generativity, community empowerment and ultra-low barriers to entry - you can't seriously suggest that making a feature request and waiting for some developer to pick it up every time someone needs a new template would be a scalable approach.

People shouldn't have been writing programs in wikitext to begin with, they
should use proper scripts of some type -- extensions or bots or such.

This gets thrown around a lot, but how those proper scripts could replace the current template system is never demonstrated. Bots are not much help with dynamic text (and do have problems of their own, like littering page histories). Extensions, as I tried to point out above, are not scalable (whatever you might think of the template syntax, it is a lot easier to learn than writing secure and scalable MediaWiki extensions, and we haven't even considered yet the epic fail of code review). The conclusion of the bug about Lua was that templates using scripts interpreted by some external tool are out of the question - they have security issues, and they would break compatibility of Wikipedia with pretty much all other MediaWiki installations. What is left then? Inventing another template language and writing another parser in PHP? IIRC Werdna actually offered to do that and was turned down, because that is still not a "proper" solution. The proper solution, apparently, is to deny the Wikimedia community a useful tool, for purely aesthetic reasons.

vvv added a comment.Via ConduitNov 20 2010, 12:20 PM

(In reply to comment #114)

What is left then? Inventing another template language and writing another
parser in PHP? IIRC Werdna actually offered to do that and was turned down,
because that is still not a "proper" solution.

I was working on a template scripting extension called InlineScripts. It is in Subversion and it was working last time I checked (its most severe problem was the documentation, or, to be more specific, the absence of it). It was discussed at the developers' conference in April and the only reason I stopped working on it was the lack of time.

bzimport added a comment.Via ConduitNov 21 2010, 2:18 AM

ted_kandell wrote:

I would like to add a concrete example to this debate, an actual use case.

Many entries in Wikipedia describe some sort of phylogenetic data, from genealogies, to the phylogenies of language families, to Y and mitochondrial DNA haplogroups.

A standard way of representing such trees is through the Newick format:
http://en.wikipedia.org/wiki/Newick_format

There are all sorts of template hacks in Wikipedia to represent family trees, genetic trees, language families, and all sorts of other related information.

Wouldn't it be better to just add the standard Newick format tree representation to articles, and then use templates to display the data in various sorts of ways? The fundamental information would then be preserved in a standardized display-independent format. Also there are a large number of tools out there that can generate graphic images based on the Newick format.

It isn't very difficult to parse a Newick format string and create a basic tree display template from it. However, all this really would need is a full set of string functions. It's true that PHP and MediaWiki wasn't designed to be a kind of parser or compiler, but what sort of alternative can anyone think of?
Should we put in a request for MediaWiki developers to support the Newick format, and any number of other important display-independent representations of data widely used in Wikipedia? Who decides, and who does the work?
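
To make the parsing claim concrete, here is a minimal sketch, in Python rather than wikitext or PHP, of a recursive-descent parser for a simplified Newick subset. It assumes only names, commas, and parentheses; branch lengths, quoted labels, and comments are not handled. The names `parse_newick` and `render` are made up for this illustration and are not part of any MediaWiki extension.

```python
def parse_newick(text):
    """Parse a simplified Newick string into nested (name, children) tuples.

    Assumption: only names, commas, and parentheses are handled;
    branch lengths, quoted labels, and comments are not supported.
    """
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        # The node name (possibly empty) runs up to the next delimiter.
        start = pos
        while text[pos] not in "(),;":
            pos += 1
        return (text[start:pos].strip(), children)

    tree = node()
    if pos != len(text) - 1:
        raise ValueError(f"unexpected character at position {pos}")
    return tree


def render(tree, depth=0):
    """Return the tree as a list of indented lines, one node per line."""
    name, children = tree
    lines = ["  " * depth + (name or "(unnamed)")]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines
```

Calling `render(parse_newick("(A,B,(C,D)E)F;"))` yields one line per node, indented by depth. The point of the sketch is that the parser needs only character indexing and substring extraction, which is exactly what a string function set provides.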

There is a good argument to "doing it right" and implementing a full scripting language (aside from Javascript?) but in the meantime, all sorts of important data that can't quite be represented as text is being added to Wikipedia in the form of templates. How can all the various sorts of tree data now in Wikipedia be extracted - or just redisplayed using whatever new and better display template comes along?

I don't know if this can be added as a "bug" in and of itself, but it does point out the fundamental problem. MediaWiki has text, graphic, audio, and video formats, but is missing the ability to parse certain other critical basic information storage formats that the developers never considered.

bzimport added a comment.Via ConduitNov 21 2010, 4:34 AM

matthew.britton wrote:

(In reply to comment #106)

"There is community consensus to enable StringFunctions; if the developers do
not enable it themselves, the community hereby requests that the WMF instruct
the developers to do so."

That's not really how it works. The developers *are* WMF, or at least a subset thereof. (Or were you under the impression that volunteer devs' opinions mattered in such cases? LOL)

Mr.Z-man added a comment.Via ConduitNov 21 2010, 6:23 AM

(In reply to comment #116)

It isn't very difficult to parse a Newick format string and create a basic tree
display template from it. However, all this really would need is a full set of
string functions. It's true that PHP and MediaWiki wasn't designed to be a kind
of parser or compiler, but what sort of alternative can anyone think of?
Should we put in a request for MediaWiki developers to support the Newick
format, and any number of other important display-independent representations
of data widely used in Wikipedia?

...

I don't know if this can be added as a "bug" in and of itself, but it does
point out the fundamental problem. MediaWiki has text, graphic, audio, and
video formats, but is missing the ability to parse certain other critical basic
information storage formats that the developers never considered.

This is kind of the main argument against string functions. Letting users create parsers in wikitext is pretty much exactly the kind of thing that those against it want to avoid. Wikitext is not supposed to be a programming language.[1] This is also a good example of what Aryeh was talking about in comment #110.

A well-defined language that has applications in thousands of pages is an excellent candidate for something that should be handled by an extension.

Who decides, and who does the work?

The same person who decides whether or not to enable string functions would decide to enable a Newick extension. Anyone who knows PHP can do the work.

[1] http://lists.wikimedia.org/pipermail/wikitech-l/2009-June/043609.html

bzimport added a comment.Via ConduitNov 21 2010, 8:08 AM

nykevin.norris wrote:

(In reply to comment #117)

(In reply to comment #106)
> "There is community consensus to enable StringFunctions; if the developers do
> not enable it themselves, the community hereby requests that the WMF instruct
> the developers to do so."

That's not really how it works. The developers *are* WMF, or at least a subset
thereof. (Or were you under the impression that volunteer devs' opinions
mattered in such cases? LOL)

The non-volunteer devs *work for* the WMF. If the WMF decides to listen to the community (and that's a big if), I don't think the devs can reasonably say no.

What's more, the developers are primarily responsible for making functionality the community wants available to the community. They aren't doing that here, and that's a Bad Thing.

bzimport added a comment.Via ConduitNov 21 2010, 11:44 AM

happy.melon.wiki wrote:

(In reply to comment #119)

The non-volunteer devs *work for* the WMF. If the WMF decides to listen to the
community (and that's a big if), I don't think the devs can reasonably say no.

What's more, the developers are primarily responsible for making functionality
the community wants available to the community. They aren't doing that here,
and that's a Bad Thing.

You're confusing developers (who write code for new features) with sysadmins (who manage the servers and turn features on and off). The developers are their own community around their own project: the MediaWiki software. That community is structured slightly differently to a wiki community (there is a clear hierarchy of authority and other different ways of doing things) but fundamentally it is a volunteer project like any of the WMF's others: developers code things that interest them. Most developers work on areas of MediaWiki which will be of use on Wikimedia wikis, as seeing their code in action on the world's 6th largest website is the most tangible reward for their time, but neither the paid nor unpaid devs are beholden to the other WMF communities (and please remember that enwiki is just one of 800 such groups), any more than one wiki community is beholden to another. Many developers work on parts of MediaWiki which will never be installed on Wikimedia wikis. To say that "the developers are primarily responsible for making functionality the community wants available to the community" is arrogant and false.

The *sysadmins*, most (but not all) of whom are also active developers, are the ones who decide which components of MediaWiki are installed on WMF wikis. There is a strict hierarchy amongst sysadmins, and most of them are WMF paid staff. They *are* expected to take the communities' sentiments into account when making changes, and they are indeed accountable to the Foundation. The sysadmin you're talking about here reports directly to the Foundation's CTO; the CTO reports to the CEO, and the CEO reports to the board. The sysadmin who has made this decision is 'above' 90% of the Foundation's paid staff in the organisational hierarchy. Where, exactly, are you planning to go to get this decision overturned?

bzimport added a comment.Via ConduitNov 21 2010, 12:00 PM

catlow wrote:

Where, exactly, are you planning to go to get this

decision overturned?

Rather than initiate some kind of power battle, I think we ought simply to politely draw the sysadmin's attention to this discussion and the apparently strong arguments in favour of changing this decision, and hope that he'll now be persuaded. (If it's Tim Starling, then I've already left a note on his en.wp user page, though others may know of more effective ways of giving him a friendly poke.)

MaxSem added a comment.Via ConduitNov 21 2010, 12:42 PM

(In reply to comment #121)

Rather than initiate some kind of power battle, I think we ought simply to
politely draw the sysadmin's attention to this discussion and the apparently
strong arguments in favour of changing this decision, and hope that he'll now
be persuaded. (If it's Tim Starling, then I've already left a note on his en.wp
user page, though others may know of more effective ways of giving him a
friendly poke.)

Thinking that he doesn't know about this bug or that he is not watching it is way too naive, so all your pokes do nothing but annoy.

bzimport added a comment.Via ConduitNov 22 2010, 11:22 PM

jsimlo wrote:

(In reply to comment #122)

> If it's Tim Starling, then I've already left a note on his en.wp user page,
Thinking that he doesn't know about this bug or that he is not watching it is
way too naive, so all your pokes do nothing but annoy.

Actually, based on my experience with other big projects I (used to) be part of: this bug stands at 123 comments as of right now. My humble guess is that Tim no longer bothers to read this bug, and has probably had it on his ignore list for a long time already. And I'd fully understand him. The decision has been made (I hope it was not taken lightly) and none of the above changes that (though it pollutes what should have been a technical discussion). The only reason I still read this bug is that it is getting funny, and not because I am interested in it as a dev..

jsimlo

ps. Yes, this comment also pollutes this bug. But I simply no longer see any cons of doing it.. :)

bzimport added a comment.Via ConduitNov 23 2010, 5:13 AM

catlow wrote:

none of the above changes that

It should change it really, as we now know that (a) there is continuing user demand for this functionality (b) nothing is happening or likely to happen towards providing it in any other sensible way than the one proposed (c) the use of the very inefficient workarounds without ill effect, the use of the proposed functions on Wikia, etc. prove that this functionality will not (as feared) damage performance. Presumably sysadmins don't have completely closed minds, and are capable of listening to users and arguments and taking a second look at past decisions...

MZMcBride added a comment.Via ConduitNov 23 2010, 6:16 AM

(In reply to comment #124)

It should change it really, as we now know that (a) there is continuing user
demand for this functionality (b) nothing is happening or likely to happen
towards providing it in any other sensible way than the one proposed (c) the
use of the very inefficient workarounds without ill effect, the use of the
proposed functions on Wikia, etc. prove that this functionality will not (as
feared) damage performance. Presumably sysadmins don't have completely closed
minds, and are capable of listening to users and arguments and taking a second
look at past decisions...

Hahahahaha

You're obviously not very familiar with Wikimedia's software development processes. Right now, some of this ParserFunctions mess (and its use in high-use templates like "Template:Cite") causes page renderings to take upward of 30 seconds on a large article. And still nobody cares.™ If you think a bit of whining (or is it whinging?) in bug comments or attempting to rally some folks on a village pump is going to push anything forward, you're insane. You'd be better off trying to raise some money for a grant, to be honest. (Though not really; Wikimedia is apparently trying to stop accepting money with strings attached.)

If you wrote an extension that implemented JavaScript into MediaWiki templates that also doubled as donation-related software, you might be able to attract some attention to this bug before the 12th of Never. ;-) Otherwise, it's probably best to save your energy for battles you can possibly win.

bzimport added a comment.Via ConduitNov 23 2010, 7:29 AM

catlow wrote:

Don't see any reason for the negativity and sarcasm concerning this bug (as in comment above) - it's just a perfectly normal and well-reasoned feature request, which will actually *reduce* these page-rendering times you mention, and will hopefully be considered on its technical merits.

bzimport added a comment.Via ConduitNov 23 2010, 10:42 PM

jsimlo wrote:

(In reply to comment #126)

Don't see any reason for the negativity and sarcasm concerning this bug
(as in comment above) - it's just a perfectly normal and well-reasoned
[...] and will hopefully be considered on its technical merits.

Simply put: I do see one. No, it is not. I thought it was back then.

Long story short: I've developed these StringFunctions (not all by myself of course, there were subsequently three of us:) because I needed them back then in my own wikis. Then someone started this bug and Tim said no. Then we, out of interest, tried to optimize the extension to be more "suitable" for the wikimedia cluster. And again, Tim said no. Then someone managed to merge StringFunctions into ParserFunctions, which were/are installed on the wikimedia cluster. And guess what happened: Tim said no. If it ain't clear already, Tim had his chance to reconsider.

The only thing left now is: Let it go. The more comments are posted into this bug, the more it becomes an unusable kid chat wall. No developer is probably going to invest in reading through a hundred comments, even if a nugget of gold was lost somewhere within. ...Ahh, who am I kiddin? This attempt at explanation is pointless anyway..

Dinoguy1000 added a comment.Via ConduitNov 24 2010, 2:45 AM

I've filed bug 26092 for *some* form of string parsing functionality to be enabled on WMF wikis, could we please maybe try to keep from turning it into the same mess this bug is (i.e. if you have something *useful* to contribute, by all means do, but if not, no comments saying "we need this soooo bad, the devs aren't being [fair/reasonable/humane/etc]")?

Not sure if this bug should be marked as blocking it, but it probably doesn't matter anyways since this one is closed.

bzimport added a comment.Via ConduitNov 24 2010, 7:56 PM

jsimlo wrote:

(In reply to comment #128)

I've filed bug 26092 for

Unbelievable! :)) Yesterday, I was kinda wondering if there was any way of luring someone into creating a brand new bug as a copy of this one. Despicable me, sorry about that.. :) Of course this solves nothing, but right now I am $50 richer! And all it took was mentioning the devs' reluctance to read some hundred comments.. :)))))

ps. Perhaps I should be banned for disrupting, but it was worth it.

Dinoguy1000 added a comment.Via ConduitNov 24 2010, 11:48 PM

(In reply to comment #129)

(In reply to comment #128)
> I've filed bug 26092 for

Unbelievable! :)) Yesterday, I was kinda wondering if there was any way of
luring someone into creating a brand new bug as a copy of this one. Despicable
me, sorry about that.. :) Of course this solves nothing, but right now I am $50
richer! And all it took was mentioning the devs' reluctance to read some
hundred comments.. :)))))

ps. Perhaps I should be banned for disrupting, but it was worth it.

Actually, I'd been thinking about it for a while, I just finally decided to stop being lazy and do it already. =)

bzimport added a comment.Via ConduitDec 29 2010, 8:09 AM

cnit wrote:

(In reply to comment #99)

(In reply to comment #98)
> MZ, are you seriously suggesting that the developers will completely
> re-implement an extension, when the concerns about the original are *not*
> implementation-specific? I seriously doubt that.

I'm suggesting that the sysadmins in charge of running Wikimedia wikis have
said rather unequivocally that this extension is not going to be installed. The
StringFunctions extension is a means to an end. There are plenty of other ways
to implement string manipulation. For years, there has been discussion of
implementing a proper programming language into MediaWiki. The current
preferred favorite is not Lua, but JavaScript, actually.

If JavaScript is the language of choice, there is a PHP SpiderMonkey extension. It is still not absolutely stable (only a beta), however I know that some WMF programmers are good at C, so it is probably possible to make a few fixes. The question is how to make these scripts run at "ordinary" hosters, where there will be no such PHP extension. In that case, one might try client-side JavaScript (in the browser), however passing function / template parameters from the server side to the client side might become too inefficient. Perhaps one might limit the JS language features to a basic subset, then run it through the PHP mod when available and slowly interpret it in PHP otherwise. Co-location (where you can compile and install the PHP mod yourself) has become more affordable in recent years, anyway.

bzimport added a comment.Via ConduitDec 29 2010, 9:05 AM

cnit wrote:

The mod can also register PHP classes in JS:
http://devzone.zend.com/article/4704

There is also interesting JavaScript-based server Jaxer:
http://jaxer.org/

It allows sharing a lot of server-side and client-side code. For example, it allows running jQuery on the server side. Things like parsers could be written in JavaScript and then used on both sides, thus minimizing code duplication.

MZMcBride added a comment.Via ConduitDec 29 2010, 9:10 AM

(In reply to comment #132)

These are interesting, yes. However, these comments are really outside the scope of this bug. File a separate bug (if there isn't one already) or start a thread on the wikitech-l@lists.wikimedia.org mailing list if you're interested in further discussion about this.

MarkAHershberger added a comment.Via ConduitSep 24 2011, 6:02 PM

*** Bug 31136 has been marked as a duplicate of this bug. ***

bzimport added a comment.Via ConduitNov 17 2011, 1:44 PM

daniel.a.r.werner wrote:

One argument brought up a few times against string functions, that people would always go to the limits of what's possible in template programming and just write more complicated templates with string functions enabled, might be true. So why not simply scale down the limits after installing these functions?
Existing string templates can be re-written as wrappers around string functions, so functionality wouldn't even be broken. We would have lower limits for what's possible using templates and functions, but we would have more powerful and sane functions provided. They could be used in a sane way, as they are being used right now, with less load on the servers.

bzimport added a comment.Via ConduitNov 17 2011, 4:21 PM

happy.melon.wiki wrote:

(In reply to comment #135)

So why not simply scale down the limits after installing these functions?
Existing string templates can be re-written as wrappers for using string
functions, functionality wouldn't even be broken, we would have lower limits
for what's possible using templates and functions but we would have more
powerful and sane functions provided. They could be used in a sane way as they
are being used right now with less load on the servers.

Template limits are not just hit using string functions, indeed they're not even the major cause. The citation templates used on a large article consume much more of the template resources than string functions, as well as stupid things like the innumerable {{SubatomicParticle}} calls (and their endless subtemplates) on [[List of baryons]] etc. Reducing the template limits would break all these cases, and they're not scenarios which could be 'fixed' with proper string functions.

Skalman added a comment.Via ConduitNov 17 2011, 4:28 PM

A solution would be to define how expensive a parser function is, and set the string functions as "expensive" while not changing anything else. That way, other parser functions would work as they currently do, while we get the power of string functions, just that you can't use so many.

(I think there is already something like this in place for some parser functions - not sure though)
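
A rough sketch of that idea, in Python for illustration only: each "expensive" call is charged against a per-page budget, and the render fails once the budget is spent. The names `PageRenderBudget`, `ExpensiveLimitExceeded`, and `str_len` are hypothetical, and this is not how MediaWiki itself implements its expensive parser function count.

```python
class ExpensiveLimitExceeded(Exception):
    """Raised when a page uses more expensive calls than its budget allows."""


class PageRenderBudget:
    """Hypothetical per-page counter for calls marked as expensive."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def charge(self):
        # Refuse the call once the page has spent its whole budget.
        if self.used >= self.limit:
            raise ExpensiveLimitExceeded(
                f"more than {self.limit} expensive calls on this page"
            )
        self.used += 1


def str_len(budget, text):
    """A string function marked as expensive: it charges the budget first."""
    budget.charge()
    return len(text)
```

With `PageRenderBudget(limit=2)`, two `str_len` calls succeed and the third raises `ExpensiveLimitExceeded`, while functions that never call `charge()` are unaffected.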

Rich_Farmbrough added a comment.Via ConduitDec 30 2011, 5:07 PM

There is, but we have seen no evaluation of "expensive" - result is that stuff that is essential is split over several pages...

Cite templates would benefit enormously from parser functions, instead of jumping through hoops, simple tests can be made about whether something has a full stop at the end already or not.
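
The kind of test meant here is tiny once string access exists. A sketch in Python, purely for illustration (the name `with_full_stop` is made up, and real citation templates would have more cases to handle):

```python
def with_full_stop(text):
    """Return text ending in exactly one added full stop if it had none."""
    stripped = text.rstrip()
    if not stripped:
        return ""  # nothing to punctuate
    if stripped.endswith("."):
        return stripped  # already terminated, do not double it
    return stripped + "."
```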

The bug should be changed from WONTFIX; keeping it at that status because of a stray comment on a mailing list years ago makes no sense when, at Wikimania 2011, Tim was undecided as to which solution (parser functions, scripting language or Victor's extension) was best.

Really I have looked at all three, ANY ONE WILL DO. And if you change your mind from parser functions to one of the others, I WILL PERSONALLY MIGRATE ALL TEMPLATES TO THE NEW SOLUTION.

I am re-opening this bug. Please do not casually re-close it.

bzimport added a comment.Via ConduitDec 31 2011, 3:40 AM

ted_kandell wrote:

Finally, some common sense here.

There are a huge number of templates that now do pretty much everything. My personal interest is in displaying trees and phylogenies. These are incredibly hard to edit now, not even worth it. I've tried to edit genealogical trees, and have given up, because the "presentation" is mixed up with the data. My browser would crash before I could even get part of it right by repeated experimentation.

"Expensive"? All of these "hoops" that everyone has to go through to validate templates without any sort of parser functions really have a collective impact on MediaWiki and Wikipedia. "NO solution" is much much worse than an attempt at a "bad solution".

I don't think anyone even realizes the *lack* of editing by knowledgeable people that is taking place, because of the sheer difficulty in editing data that is not text or inline images. There's a price here, and it isn't whether "this or that implementation of trim()" regular expressions is more or less efficient.

It's been 5 1/2 years since this bug was first opened.
Maybe someone can get moving on it before a decade has passed?

DanielFriesen added a comment.Via ConduitDec 31 2011, 4:11 AM

Are string functions "really" the solution to the difficulty of editing specialized data?

To me that sounds like a really horrible solution that won't actually solve the issue. If data really is complex then string functions sound like something that will only allow a change to 'another' string based data format that will still be too complex for the knowledgeable people to edit.

I'd like to see some of those complex data formats. I'm pretty sure that for most of them the real optimal thing they need is specialized code in a proper programming language to create a format that knowledgeable people can actually understand. And perhaps even add in a UI to make that possible.

bzimport added a comment.Via ConduitDec 31 2011, 4:34 AM

jim wrote:

String functions certainly are a solution to the problem that brought me here - attempting to construct a template to create the slightly unusual URLs used by an external site, which requires replacing each instance of a non-alphanumeric character by an underbar. Easy to do with {{#replace:}}
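
For illustration, this is the transformation being described, sketched in Python rather than wikitext. The function name to_url_fragment is made up, and "alphanumeric" is assumed here to mean ASCII letters and digits only.

```python
import re


def to_url_fragment(title):
    """Replace every character that is not an ASCII letter or digit with '_'."""
    return re.sub(r"[^A-Za-z0-9]", "_", title)
```

For example, to_url_fragment("Foo bar (1999)") gives "Foo_bar__1999_": the space and both parentheses each become one underscore.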

I was horrified to discover that a perfectly good solution has been implemented but its activation is being blocked for reasons I still cannot understand.

Bawolff added a comment.Via ConduitDec 31 2011, 4:55 AM

This is pointless. Can we stop beating the dead horse already?

bzimport added a comment.Via ConduitDec 31 2011, 5:31 AM

ted_kandell wrote:

Yes, string functions are *a* solution, one that can work right now.
Why? How would you implement parsing of say a Newick file, or any specialized data format that you didn't know about yourself, beforehand?

There are hundreds of such data formats. Some may be very useful for common sorts of representations in Wikipedia. Will we have to open a bug for each and every one, then hardcode a parser for it, then have someone update that parser whenever a slight change in the format comes out? Or would you rather just implement AJAX and Java instead?

BTW, how complex is it to parse a phylogenetic tree format which merely uses nested parentheses, and then display it, when these can be copied from anywhere?
http://en.wikipedia.org/wiki/Newick_format

The point is that often data in these specialized formats *already exists* out there, somewhere, and just needs to be displayed.

If you mean "stop beating a dead horse and just release these functions" I say yes. But if you mean "stop asking for them, you'll never ever get them, forget it ... "

bzimport added a comment.Via ConduitDec 31 2011, 5:38 AM

ted_kandell wrote:

Examples?
Here is the complete grammar for the "complex specialized" Newick format:

The grammar rules

Note, "|" separates alternatives.

Tree --> Subtree ";" | Branch ";"
Subtree --> Leaf | Internal
Leaf --> Name
Internal --> "(" BranchSet ")" Name
BranchSet --> Branch | BranchSet "," Branch
Branch --> Subtree Length
Name --> empty | string
Length --> empty | ":" number

Examples:

(,,(,)); no nodes are named
(A,B,(C,D)); leaf nodes are named
(A,B,(C,D)E)F; all nodes are named
(:0.1,:0.2,(:0.3,:0.4):0.5); all but root node have a distance to parent
(:0.1,:0.2,(:0.3,:0.4):0.5):0.0; all have a distance to parent
(A:0.1,B:0.2,(C:0.3,D:0.4):0.5); distances and leaf names (popular)
(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F; distances and all names
((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A; a tree rooted on a leaf node (rare)
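
To give a sense of how much code the grammar above needs in an ordinary programming language, here is a sketch of a recursive-descent parser in Python. It follows the rules as listed and nothing more: quoted names, comments and other extensions are not handled, and Node, NewickParser and parse_newick are names chosen for this example only.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DELIMITERS = "(),:;"


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


class NewickParser:
    def __init__(self, text: str) -> None:
        self.text = text.strip()
        self.pos = 0

    def peek(self) -> str:
        # Next character, or "" at end of input.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ch: str) -> None:
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def token(self) -> str:
        # Name or number: everything up to the next delimiter (may be empty).
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos] not in DELIMITERS:
            self.pos += 1
        return self.text[start:self.pos].strip()

    def tree(self) -> Node:
        # Tree --> Subtree ";" | Branch ";" (a Branch with empty Length is a Subtree)
        root = self.branch()
        self.expect(";")
        if self.pos != len(self.text):
            raise ValueError("unexpected input after ';'")
        return root

    def branch(self) -> Node:
        # Branch --> Subtree Length ; Length --> empty | ":" number
        node = self.subtree()
        if self.peek() == ":":
            self.pos += 1
            node.length = float(self.token())
        return node

    def subtree(self) -> Node:
        # Subtree --> Leaf | Internal ; Internal --> "(" BranchSet ")" Name
        children: List[Node] = []
        if self.peek() == "(":
            self.pos += 1
            children.append(self.branch())
            while self.peek() == ",":
                self.pos += 1
                children.append(self.branch())
            self.expect(")")
        return Node(name=self.token(), children=children)


def parse_newick(text: str) -> Node:
    return NewickParser(text).tree()
```

Calling parse_newick("(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;") returns a root node named F with three children, the third of which is E with children C and D. Malformed input such as "(A,B" raises ValueError.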

bzimport added a comment.Via ConduitDec 31 2011, 5:40 AM

ted_kandell wrote:

Here is an example of a current genealogical tree, using templates:

http://fr.wikipedia.org/wiki/Rachi#G.C3.A9n.C3.A9alogie

Généalogie

<center>
{{Arbre généalogique/début|style=font-size:75%;}}
{{Arbre généalogique | SAM | | | | | | | | | | | | | | |RSH| | | |SAM=Samuel|RSH='''Rachi ([[1040]]-[[1104]])'''}}
{{Arbre généalogique | |!| | | | |,|-|-|-|-|-|-|-|v|-|-|-|^|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|.}}
{{Arbre généalogique |SMH| | |RHL|-|AZR| |KVD|v|RMB| | | | | | | | |SMA| | |MRM|v|YBN||SMH=Simha ben Samuel de Vitry| RHL=Rachel ''Bellassez''| AZR=Eliézer ''Jocelyn'' | KVD=Yokheved | RMB=Meïr ben Samuel | MRM=Myriam | YBN=Judah ben Nathan | SMA=Shémaiah}}
{{Arbre généalogique | |!| | | |,|-|-|-|v|-|-|-|v|-|-|^|-|-|-|-|v|-|-|-|.| | | |!| | | | |,|-|^|-|.}}
{{Arbre généalogique |SAM|v|HAN| |SLM| |RTM|v|MRM| |RVM| |SBM|v|INC| | |YTV| |AZR|SAM=Samuel de Vitry|HAN=Hanna| SLM=Salomon |RTM=[[Rabbénou Tam]] (~[[1100]]-[[1171]])|MRM=Myriam |RVM=Isaac Rivam|SBM=Samuel [[Rashbam]] (~[[1085]]-[[1158]])|INC=?|YTV=Yom Tov de Falaise|AZR=Eléazar }}
{{Arbre généalogique | | | |!| | | | | |,|-|-|-|v|-|^|-|v|-|-|-|.| | | | | |!| | | |,|-|-|^|-|-|-|.}}
{{Arbre généalogique | | |RI| | | |ITS| |SLM| |MSH| |ISF| | | |ITS | |YHD| | | | |ISF|RI=[[Isaac ben Samuel de Dampierre|Isaac de Dampierre]] dit le Ri (~[[1120]]-[[1195]])|ITS=Isaac|SLM=Salomon|MSH=Moïse|ISF=Joseph| YHD=Judah | ITS=Isaac}}
{{Arbre généalogique | | | |!| | | | | | | | | | | | | | | | | | | | | | | | |,|-|-|^|.| | | |,|-|^|.}}
{{Arbre généalogique | | |HNN| | | | | | | | | | | | | | | | | | | | | | |ITS| |AZR|v|BLA| |LAH|HNN=Elhanan (mort [[1184]])|ITS=Isaac|AZR=Eléazar | BLA=Bila | LAH=Léah}}
{{Arbre généalogique | | | |!| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |!| | | | | }}
{{Arbre généalogique | | |SML| | | | | | | | | | | | | | | | | | | | | | | | | | | | | |YHD| SML=Samuel|YHD=Judah de Paris Sir Léon ([[1166]]-[[1224]])}}
{{Arbre généalogique/fin}}
</center>

bzimport added a comment.Via ConduitDec 31 2011, 5:48 AM

ted_kandell wrote:

http://fr.wikipedia.org/wiki/Rachi#G.C3.A9n.C3.A9alogie

In the above tree, I need to add a father for Shémaiah and make Shémaiah the father of Eliézer Jocelyn.

That should be a simple change using the above templates, right?

No.

"Lack of string functions"

In the Newick format it would take 1 second, and there are tools to create and edit such files.

Now think of the hundreds of other easy-to-parse useful standard data formats ...

DanielFriesen added a comment.Via ConduitDec 31 2011, 6:10 AM

This Newick format and that genealogy stuff look like a perfect example of what string functions will NOT solve.

String functions are for simple text replacements and tests. What are you going to do, write a whole Newick parser in string functions? Even if you can manage to implement Newick parsing inside a template (and that's a big if, given that we don't have variables in wikitext), that template is going to be insanely complex; minor tweaks that would be sane in a normal programming language will become so hard as to be nearly impossible. And to top it off, that template is going to be so heavy that it slows down parsing for every page you use it on (multiplied by how much you use it and how much data you input).

If we actually have a use for it, then what it sounds like we need is a real Newick parser, and the same goes for whatever other formats are in fact useful to Wikipedia. Yes, there are hundreds of formats, but when we talk about Wikipedia and implementation we only care about the ones that will output things we want on Wikipedia, and within that only the few formats we actually need. We don't have to implement parsing for dozens of formats that do the same thing when there's one format most people can use that'll work.
But I would also like to make the point that I can't consider the output of those genealogy templates acceptable. It's horrible, absolutely disgusting: a complete abuse of HTML tables for presentational purposes. I don't want to see a new template that outputs the same garbage. Not only do those need a better system for inputting the information, they need better output, something you can't do in templates because it likely involves building an .svg or something.

From what I see of Newick and your example your argument also falls short. Newick seems to describe trees that only branch outwards. But that genealogy tree appears to re-connect at various points. In other words, it looks like your example tree actually CAN'T be expressed in Newick.
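
The "re-connect" objection can be checked mechanically. In a tree every node has at most one parent, so any node with two or more parents rules out a plain Newick encoding. A small sketch in Python follows; the edge list is placeholder data, not taken from the tree above, and the function name is made up.

```python
from collections import defaultdict


def children_with_several_parents(edges):
    """Return the nodes that appear as the child of more than one parent."""
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return sorted(child for child, ps in parents.items() if len(ps) > 1)


# Placeholder data: a couple with one child, who in turn has one child.
edges = [
    ("Father", "Child"),
    ("Mother", "Child"),
    ("Child", "Grandchild"),
]
```

Here children_with_several_parents(edges) returns ["Child"], so this small graph is already not a tree in the Newick sense.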

Frankly, it looks like you could use DOT. Wonder what happened to graphviz in all this.

bzimport added a comment.Via ConduitDec 31 2011, 6:11 AM

john wrote:

I'm reclosing as WONTFIX.

It's very clear that we're going to have a new solution in the next year to handle these situations, whether it be Lua, built-in JavaScript or an extension to handle cite templates. Whatever the fix is, I think the developers have made the point that string functions simply won't be enabled.

Therefore, the bug's original request of setting $wgPFEnableStringFunctions = true on Wikimedia wikis will not happen. Hence, WONTFIX.

Please don't change this unless you are a developer.

(In reply to comment #142)

This is pointless. Can we stop beating the dead horse already?

Agreed.

Rich_Farmbrough added a comment.Via ConduitFeb 24 2012, 3:35 PM

It's not about cite templates alone. And I'm glad you say "we're going to have a new solution in the next year" - Lua has been committed to but it was also the proposed solution back in 2009, and I think it's rather "I will believe it when I see it".

I don't think injunctions like "Please don't change this unless you are a developer." are very cool. It is quite possible that Lua will be decided against (as it was before) and then this should be re-opened.

And the previous "WONTFIX - please don't change" was predicated on a stray remark by Tim Starling in a mail list. At WikiMania 2011 Tim changed his mind several times on the best solution including Lua, parser functions, Victor's scripting extension.

Or maybe we should change this bug to "provide some form of string handling, and soon" because otherwise we might have Lua kicking around for another 6 years and still be no further forward.

bzimport added a comment.Via ConduitFeb 24 2012, 3:57 PM

happy.melon.wiki wrote:

(In reply to comment #149)

I don't think injunctions like "Please don't change this unless you are a
developer." are very cool. It is quite possible that Lua will be decided
against (as it was before) and then this should be re-opened.

Those two comments are in no way exclusive: a developer would be best placed to know if any change occurs to the commitment to Lua. Although as has been said before, the existence of an alternative is not a prerequisite for WONTFIXing this.

Or maybe we should change this bug to "provide some form of string handling,
and soon" because otherwise we might have Lua kicking around for another 6
years and still be no further forward.

That would be bug 26092, of which this bug is a dependency. Everyone knows that this is an open, important and complicated issue; if changing the title of a bug were all it took to magically untie the gordian knot, we would have done it by now.

vvv added a comment.Via ConduitFeb 25 2012, 11:09 AM

(In reply to comment #149)

I don't think injunctions like "Please don't change this unless you are a
developer." are very cool. It is quite possible that Lua will be decided
against (as it was before) and then this should be re-opened.

No, a proper scripting language is certainly preferred to string functions in wikitext, and I cannot imagine what would have to happen for us to reconsider this.

And the previous "WONTFIX - please don't change" was predicated on a stray
remark by Tim Starling in a mail list. At WikiMania 2011 Tim changed his mind
several times on the best solution including Lua, parser functions, Victor's
scripting extension.

Lua is an almost-final choice, made by consensus of WMF developers. Even if we change the language, the current plan is to develop infrastructure which is language-independent (so we can just plug in a different language backend without rewriting anything else).

Or maybe we should change this bug to "provide some form of string handling,
and soon" because otherwise we might have Lua kicking around for another 6
years and still be no further forward.

It's a WMF engineering project now, and as far as I am aware active work on it should begin shortly after the 1.19 deployment and the git migration.

bzimport added a comment.Via ConduitFeb 27 2012, 7:55 AM

questpc wrote:

Victor, I am off from my extension's developing due to various problems, however may I ask you to give an address of the page where the scripting project status will be updated, please? One of my extensions already needs strong scripting language and I wonder whether it is already possible to hook / bind Lua calls in separate MW extension. Basically, I need to have few custom Lua functions bound to PHP methods in extension's code and the possibility to execute Lua scripts which use these function calls.

He7d3r added a comment.Via ConduitFeb 27 2012, 9:56 AM

(In reply to comment #152)

Victor, I am off from my extension's developing due to various problems,
however may I ask you to give an address of the page where the scripting
project status will be updated, please?

I think that would be
https://www.mediawiki.org/wiki/Lua_scripting/status

MZMcBride added a comment.Via ConduitMar 25 2013, 7:01 AM

Just noting here in a comment that bug 26092 ("Enable or install string parsing wikimarkup functionality on WMF wikis") has now been marked resolved/fixed. Wikimedia wikis now all have proper string functions (but not StringFunctions) via [[mw:Extension:Scribunto]] and [[mw:Lua]]. :-)

Thanks to Brad Jorsch, Tim Starling, Victor Vasiliev, and many others (including the template writers who are now embarking on a massive upgrade) for all of their past, present, and future work on this. It seems we're now on the cusp of greatly reducing parse times of pages, which is really awesome.

Related bugs:

  • bug 26786 – Add functionality (in an extension or MediaWiki) and implement to make English Wikipedia's [[Template:Cite]] work faster
  • bug 19262 – Pages with a high number of templates suffer extremely slow rendering or read timeout for logged in users
