Form inputs need to reject problem characters
OpenPublic

Description

Described here:

http://old.nabble.com/Semantic-Forms%3A-Need-better-input-checking.-tt32128157.html

These characters cause problems on form inputs, and need to be rejected:

#<>[]|{}


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=34486
https://bugzilla.wikimedia.org/show_bug.cgi?id=34613

bzimport set Reference to bz30042.
Badon created this task.Via LegacyJul 24 2011, 11:27 PM
Yaron_Koren added a comment.Via ConduitJul 25 2011, 7:34 PM

Actually, I believe it's only the "|" that's problematic. Still, there should be Javascript validation for it.

Foxtrott added a comment.Via ConduitJul 25 2011, 8:18 PM

And probably "}}" without (or with unbalanced) opening braces.
Are there other cases?

Oh, and probably be careful not to reject pipes inside correctly balanced braces. There will be people who want to be able to insert wikitext into fields.

Yaron_Koren added a comment.Via ConduitJul 25 2011, 8:35 PM

Oh yeah, "{{" and "}}". And it's true that pipes within parser functions and the like can be fine, in theory, although I think SF handles these incorrectly, the next time it goes to edit the page. Or maybe that bug has been fixed? I should really test this stuff out...

Foxtrott added a comment.Via ConduitJul 25 2011, 8:39 PM

Just tried it. Seems to be fixed.

Yaron_Koren added a comment.Via ConduitJul 25 2011, 9:07 PM

Oh, that's great to hear. Although it seems to make this feature request a good deal more difficult to implement, since the Javascript would need to parse through the input string, to determine that pipes were always within double braces, and that double braces always got closed...

Badon added a comment.Via ConduitJul 25 2011, 9:43 PM

Do you have to reinvent the wheel on that, or is there a nice off the shelf parser that can do such a common task for you? It seems like I ran across one that was very sophisticated while still be very simple to use, but it was for PHP, not JavaScript.

Yaron_Koren added a comment.Via ConduitJul 25 2011, 9:52 PM

If you find a Javascript one, let me know.

Badon added a comment.Via ConduitJul 25 2011, 11:55 PM

CakePHP had the best parser functions I found for PHP:

http://cakephp.org/

Maybe they could work for you? I think it can do some trickery with forms and JavaScript too, so it might be worth a look. This might be a good place to start:

http://book.cakephp.org/view/1143/Data-Validation

I have never used CakePHP proper, since what I do is much too simple to need the whole thing. But, I did once find the a very good text truncation algorithm that I needed, and I was able to use that as a pattern for my own project.

It has some fancy AJAX functions too, so maybe it'll work that way, instead of with JavaScript.

I may be off-base with this suggestion, but 9 times out of 10 I find the perfect solution on the 10th try :)

Foxtrott added a comment.Via ConduitJul 26 2011, 11:03 PM

Just had a quick look. None of their pre-made rules checks wikitext. Custom-made rules can either be given as regexps or as functions. But for that we do not need a framework.

I think if you want to do this right, you will end up halfway into building a wikitext parser. I wonder if the MW parser could be employed somehow.

Badon added a comment.Via ConduitJul 27 2011, 3:58 AM

That sounds like a promising avenue of investigation. But, the Mediawiki parser does not parse nested tags correctly, so it's kind of buggy on its own:

https://bugzilla.wikimedia.org/show_bug.cgi?id=29889

Still, it might be the basis for a solution, even if it has to be improved also...unless another approach turns out to be more expedient.

bzimport added a comment.Via ConduitJul 27 2011, 6:28 PM

van.de.bugger wrote:

Hi guys,

I see you start thinking about quite complicated stuff like using parser functions and templates within an input field...

Please wait for a moment. I think usage of these complicated constructs is very limited, it will be needed in very rare cases. In 99% of cases input field serves for entering simple strings like "John, Dow" or "1100 Fox Drive", or "yellow".

Let us implement very strict rules: characters {|} (and, probably, []) are rejected. This strict rule will be a "right thing" in 99% of cases. Then, if someone requests allowing complicated stuff (like using parser functions or templates), it can be implemented by a parameter, e. g.

{{{field|Name|input type=text|validate=off}}}

Foxtrott added a comment.Via ConduitJul 27 2011, 6:41 PM

You can do that already using input type regexp, I think: http://www.mediawiki.org/wiki/Extension:Semantic_Forms_Inputs#Regular_expression_filter.

Needs the SFI extension, though.

Badon added a comment.Via ConduitJul 27 2011, 6:45 PM

Validation is something that should be built-in if the consequences are severe enough.

bzimport added a comment.Via ConduitOct 1 2011, 9:24 AM

dan.bolser wrote:

about the above... can you just make an api query to pass the value to 'expand templates'? ... Hmm... can't find that just now, but in general, the JS can call the MW api to check the values for 'valid' wiki text?

I think using the existing parser (if possible) via a JS api call is much better than building a JS MW parser from scratch (or even using an existing framework, because the MW syntax may evolve over time).

I agree that, in the first instance, just banning pipes and braces is enough.

bzimport added a comment.Via ConduitOct 3 2011, 7:33 PM

van.de.bugger wrote:

Patch for rejecting characters: {|}

This is a simple patch for rejecting characters by javascript code.

attachment SemanticForms-better-input-checking.patch ignored as obsolete

bzimport added a comment.Via ConduitNov 14 2011, 5:00 PM

sumanah wrote:

Hope the SMW folks don't mind -- marking this with the "patch" & "need-review" keywords to signal that Van contributed a patch here. Thanks, Van.

JeroenDeDauw added a comment.Via ConduitNov 15 2011, 9:03 PM

Patch looks valid. One remark though: it seems to disallow any occurrence of { and }, while I think it only needs to disallow {{ and }}.

Foxtrott added a comment.Via ConduitNov 15 2011, 9:48 PM

As Van already stated, this is a simple patch. IMHO it is too simple.

It rejects any input that has a { or | or }, although this might be perfectly valid. I agree that for normal text inputs usage of these characters is unusual. The situation is probably different for textareas.

What makes it unacceptable for me is, that it changes the behavior of SF so that old forms may not work as expected with new versions of SF. If you are really intent on implementing this, introduce a parameter in input definitions that needs to be set to turn this validation ON and is OFF by default.

For me this whole thing is an enhancement, not a bug.

Foxtrott added a comment.Via ConduitNov 15 2011, 10:19 PM

I apologize if this came over harsh. I do appreciate improvements of SF and I do agree that there is a problem here. But if possible we should avoid implementing solutions that leave people out in the rain, even if these people are only 1% of the users and it would help the other 99%.

bzimport added a comment.Via ConduitNov 16 2011, 12:11 AM

van.de.bugger wrote:

@Sumana Harihareswara: Thanks for attracting attention to the problem.

it seems to disallow any occurrence of { and }, while I think it only needs to
disallow {{ and }}.

Hmm… May be. Is it the only issue? Will attach new patch soon.

It rejects any input that has a { or | or }, although this might be perfectly
valid. I agree that for normal text inputs usage of these characters is
unusual. The situation is probably different for textareas.

Proposed patch does not affect textareas.

Perfectly valid? What do you mean? Balanced braces so Mediawiki parser will interpret them as template/magic word/parser function invocation? We already talked about it.

Currently editing a page trough form is "fragile" -- a user editing a page through a form can easily break code. The page will be displayed incorrectly, and even editing the page through the form will not work as expected -- some values will be lost. An experienced user can easily fix the problem by editing the page source, but an inexperienced user will be very frustrated. To me this is unacceptable. Forms must not accept invalid input.

I implemented a simple patch, so user will be warned about invalid characters. Yes, users can not use templates, magic words and other stuff in form fields. But do you really think all this stuff is actually required in forms? If somebody wants to use templates, it means this person is aware about template syntax, and, therefore, can relatively easily edit pages directly.

Having a script which can recognize perfectly valid wikitext code means somebody should reimplement Mediawiki parser (~11000 lines of PHP code) in JavaScript. Will you? I won't.

bzimport added a comment.Via ConduitNov 16 2011, 12:12 AM

van.de.bugger wrote:

Patch for rejecting {{, |, }}; v2.

Attached: SemanticForms-braces-and-bar.patch

Badon added a comment.Via ConduitNov 16 2011, 6:17 AM

I agree with Van that this issue is important, and not just an enhancement. Forms ideally should make things simple for anyone and everyone, by eliminating the kinds of manual code editing that will be required in some cases without fixing the issue.

However, if possible, let's not leave anyone out in the rain :) I think it is possible, so let's make sure nothing too unusual occurs once the patch is applied that might disrupt current forms that are in use.

Foxtrott added a comment.Via ConduitNov 16 2011, 9:32 AM

(In reply to comment #20)

Proposed patch does not affect textareas.

Right. Sorry, did not read the code correctly.

Perfectly valid? What do you mean? Balanced braces so Mediawiki parser will
interpret them as template/magic word/parser function invocation? We already
talked about it.

Yes, that's what I mean. And yes, we already talked about it. But no, there was no agreed way to go forward.

Forms must not accept invalid input.

But they must accept valid input. Especially if it was accepted before.

I implemented a simple patch, so user will be warned about invalid characters.

The user will not only be warned, they will be denied the usage of these characters.

Yes, users can not use templates, magic words and other stuff in form fields.
But do you really think all this stuff is actually required in forms?

Yes.

If somebody wants to use templates, it means this person is aware about template
syntax, and, therefore, can relatively easily edit pages directly.

That's exactly what I am saying: This patch implements something for the sake of the majority and in it's course reduces functionality for a minority. And there would even be a solution that does not involve rewriting the MW parser in JS: Introduce a parameter in input definitions
that needs to be set to turn this validation ON and is OFF by default. You even proposed something similar yourself, it could look like this:

{{{field|Name|input type=text|validate=on}}}

It would be even better if you implemented it in a way that other validators could be plugged in, e.g.

{{{field|Name|input type=text|validate=nobraces}}}

Somebody else could then contribute other validators, e.g. "telephone", "email"

Or you could implement a new input type that only accepts simple text, e.g.

{{{field|Name|input type=simple text}}}

Or you could parse the input text and ensure that braces are balanced, no need to completely recreate the MW parser.

Or you could actually send the values to the wiki for validation as Dan proposed.

Badon added a comment.Via ConduitNov 16 2011, 11:42 PM

Here's an example of a situation where it might be desirable to optionally enable these problem characters to be input (Login with Demo/test):

http://www.coincompendium.com/wiki/index.php?title=File:IMG_0629.JPG&action=formedit

Granted, it's being done in the textarea that's already excluded from this patch, but it could just as easily appear in other areas of the form, so is still a handy example. Notice two things:

  1. Lupo's ImageAnnotator uses template formatting. A future version of the ImageAnnotator might be made compatible with Semantic Forms (Note 0). That could raise the possibility that annotation data with curly braces and possibly pipes would appear in form field, perhaps automatically using ImageAnnotator's JavaScript.
  1. The form provides very simple bolded instructions that suggests users try doing a template substitution. Template substitutions are very easy ways to ask a form's users to do something unusual in a form field, without any effort. Almost anybody is capable of following such simple instructions in order to access sophisticated features that would otherwise be far too complicated for an ordinary user.

I like f.trott's idea for a validate parameter on form fields. That would allow form creators to tailor the form for their needs. Indeed, it opens the possibility of generalizing the validation code such that ANY type of validation can be done.

For example, a form field that inputs monetary amounts may require that the currency be selected in another field. By allowing only numbers, it will serve to prevent symbols from being entered. Here's an example of that exact use case, where the best I've been able to do is tell users to enter numbers only, and not to enter symbols:

http://www.coincompendium.com/wiki/index.php/Special:FormEdit/Sighting/CCS2

Note 0. If you're impressed by Lupo's ImageAnnotator, have a look here for the suggestion to incorporate it into an extension and then SMW:

https://www.mediawiki.org/wiki/User:Badon/Extension:Semantic_MediaWiki/Manual#Future_TODO

bzimport added a comment.Via ConduitNov 23 2011, 6:34 PM

van.de.bugger wrote:

I believe that Forms *must* *not* generate invalid code. The simplest method to achieve this -- deny {{, |, and }}, which is now implemented and available as a patch. The patch fulfils my current needs. I do not object if somebody implements any of advanced validation techniques mentioned above.

Add Comment