Page MenuHomePhabricator

Form inputs need to reject problem characters
Closed, DeclinedPublic

Description

Described here:

http://old.nabble.com/Semantic-Forms%3A-Need-better-input-checking.-tt32128157.html

These characters cause problems on form inputs, and need to be rejected:

#<>[]|{}


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=34486
https://bugzilla.wikimedia.org/show_bug.cgi?id=34613

Details

Reference
bz30042

Related Objects

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 21 2014, 11:50 PM
bzimport set Reference to bz30042.

Actually, I believe it's only the "|" that's problematic. Still, there should be Javascript validation for it.

And probably "}}" without (or with unbalanced) opening braces.
Are there other cases?

Oh, and probably be careful not to reject pipes inside correctly balanced braces. There will be people who want to be able to insert wikitext into fields.

Oh yeah, "{{" and "}}". And it's true that pipes within parser functions and the like can be fine, in theory, although I think SF handles these incorrectly, the next time it goes to edit the page. Or maybe that bug has been fixed? I should really test this stuff out...

Just tried it. Seems to be fixed.

Oh, that's great to hear. Although it seems to make this feature request a good deal more difficult to implement, since the Javascript would need to parse through the input string, to determine that pipes were always within double braces, and that double braces always got closed...

Do you have to reinvent the wheel on that, or is there a nice off the shelf parser that can do such a common task for you? It seems like I ran across one that was very sophisticated while still be very simple to use, but it was for PHP, not JavaScript.

If you find a Javascript one, let me know.

CakePHP had the best parser functions I found for PHP:

http://cakephp.org/

Maybe they could work for you? I think it can do some trickery with forms and JavaScript too, so it might be worth a look. This might be a good place to start:

http://book.cakephp.org/view/1143/Data-Validation

I have never used CakePHP proper, since what I do is much too simple to need the whole thing. But, I did once find the a very good text truncation algorithm that I needed, and I was able to use that as a pattern for my own project.

It has some fancy AJAX functions too, so maybe it'll work that way, instead of with JavaScript.

I may be off-base with this suggestion, but 9 times out of 10 I find the perfect solution on the 10th try :)

Just had a quick look. None of their pre-made rules checks wikitext. Custom-made rules can either be given as regexps or as functions. But for that we do not need a framework.

I think if you want to do this right, you will end up halfway into building a wikitext parser. I wonder if the MW parser could be employed somehow.

That sounds like a promising avenue of investigation. But, the Mediawiki parser does not parse nested tags correctly, so it's kind of buggy on its own:

https://bugzilla.wikimedia.org/show_bug.cgi?id=29889

Still, it might be the basis for a solution, even if it has to be improved also...unless another approach turns out to be more expedient.

van.de.bugger wrote:

Hi guys,

I see you start thinking about quite complicated stuff like using parser functions and templates within an input field...

Please wait for a moment. I think usage of these complicated constructs is very limited, it will be needed in very rare cases. In 99% of cases input field serves for entering simple strings like "John, Dow" or "1100 Fox Drive", or "yellow".

Let us implement very strict rules: characters {|} (and, probably, []) are rejected. This strict rule will be a "right thing" in 99% of cases. Then, if someone requests allowing complicated stuff (like using parser functions or templates), it can be implemented by a parameter, e. g.

{{{field|Name|input type=text|validate=off}}}

You can do that already using input type regexp, I think: http://www.mediawiki.org/wiki/Extension:Semantic_Forms_Inputs#Regular_expression_filter.

Needs the SFI extension, though.

Validation is something that should be built-in if the consequences are severe enough.

dan.bolser wrote:

about the above... can you just make an api query to pass the value to 'expand templates'? ... Hmm... can't find that just now, but in general, the JS can call the MW api to check the values for 'valid' wiki text?

I think using the existing parser (if possible) via a JS api call is much better than building a JS MW parser from scratch (or even using an existing framework, because the MW syntax may evolve over time).

I agree that, in the first instance, just banning pipes and braces is enough.

van.de.bugger wrote:

Patch for rejecting characters: {|}

This is a simple patch for rejecting characters by javascript code.

attachment SemanticForms-better-input-checking.patch ignored as obsolete

sumanah wrote:

Hope the SMW folks don't mind -- marking this with the "patch" & "need-review" keywords to signal that Van contributed a patch here. Thanks, Van.

Patch looks valid. One remark though: it seems to disallow any occurrence of { and }, while I think it only needs to disallow {{ and }}.

As Van already stated, this is a simple patch. IMHO it is too simple.

It rejects any input that has a { or | or }, although this might be perfectly valid. I agree that for normal text inputs usage of these characters is unusual. The situation is probably different for textareas.

What makes it unacceptable for me is, that it changes the behavior of SF so that old forms may not work as expected with new versions of SF. If you are really intent on implementing this, introduce a parameter in input definitions that needs to be set to turn this validation ON and is OFF by default.

For me this whole thing is an enhancement, not a bug.

I apologize if this came over harsh. I do appreciate improvements of SF and I do agree that there is a problem here. But if possible we should avoid implementing solutions that leave people out in the rain, even if these people are only 1% of the users and it would help the other 99%.

van.de.bugger wrote:

@Sumana Harihareswara: Thanks for attracting attention to the problem.

it seems to disallow any occurrence of { and }, while I think it only needs to
disallow {{ and }}.

Hmm… May be. Is it the only issue? Will attach new patch soon.

It rejects any input that has a { or | or }, although this might be perfectly
valid. I agree that for normal text inputs usage of these characters is
unusual. The situation is probably different for textareas.

Proposed patch does not affect textareas.

Perfectly valid? What do you mean? Balanced braces so Mediawiki parser will interpret them as template/magic word/parser function invocation? We already talked about it.

Currently editing a page trough form is "fragile" -- a user editing a page through a form can easily break code. The page will be displayed incorrectly, and even editing the page through the form will not work as expected -- some values will be lost. An experienced user can easily fix the problem by editing the page source, but an inexperienced user will be very frustrated. To me this is unacceptable. Forms must not accept invalid input.

I implemented a simple patch, so user will be warned about invalid characters. Yes, users can not use templates, magic words and other stuff in form fields. But do you really think all this stuff is actually required in forms? If somebody wants to use templates, it means this person is aware about template syntax, and, therefore, can relatively easily edit pages directly.

Having a script which can recognize perfectly valid wikitext code means somebody should reimplement Mediawiki parser (~11000 lines of PHP code) in JavaScript. Will you? I won't.

van.de.bugger wrote:

Patch for rejecting {{, |, }}; v2.

Attached:

I agree with Van that this issue is important, and not just an enhancement. Forms ideally should make things simple for anyone and everyone, by eliminating the kinds of manual code editing that will be required in some cases without fixing the issue.

However, if possible, let's not leave anyone out in the rain :) I think it is possible, so let's make sure nothing too unusual occurs once the patch is applied that might disrupt current forms that are in use.

(In reply to comment #20)

Proposed patch does not affect textareas.

Right. Sorry, did not read the code correctly.

Perfectly valid? What do you mean? Balanced braces so Mediawiki parser will
interpret them as template/magic word/parser function invocation? We already
talked about it.

Yes, that's what I mean. And yes, we already talked about it. But no, there was no agreed way to go forward.

Forms must not accept invalid input.

But they must accept valid input. Especially if it was accepted before.

I implemented a simple patch, so user will be warned about invalid characters.

The user will not only be warned, they will be denied the usage of these characters.

Yes, users can not use templates, magic words and other stuff in form fields.
But do you really think all this stuff is actually required in forms?

Yes.

If somebody wants to use templates, it means this person is aware about template
syntax, and, therefore, can relatively easily edit pages directly.

That's exactly what I am saying: This patch implements something for the sake of the majority and in it's course reduces functionality for a minority. And there would even be a solution that does not involve rewriting the MW parser in JS: Introduce a parameter in input definitions
that needs to be set to turn this validation ON and is OFF by default. You even proposed something similar yourself, it could look like this:

{{{field|Name|input type=text|validate=on}}}

It would be even better if you implemented it in a way that other validators could be plugged in, e.g.

{{{field|Name|input type=text|validate=nobraces}}}

Somebody else could then contribute other validators, e.g. "telephone", "email"

Or you could implement a new input type that only accepts simple text, e.g.

{{{field|Name|input type=simple text}}}

Or you could parse the input text and ensure that braces are balanced, no need to completely recreate the MW parser.

Or you could actually send the values to the wiki for validation as Dan proposed.

Here's an example of a situation where it might be desirable to optionally enable these problem characters to be input (Login with Demo/test):

http://www.coincompendium.com/wiki/index.php?title=File:IMG_0629.JPG&action=formedit

Granted, it's being done in the textarea that's already excluded from this patch, but it could just as easily appear in other areas of the form, so is still a handy example. Notice two things:

  1. Lupo's ImageAnnotator uses template formatting. A future version of the ImageAnnotator might be made compatible with Semantic Forms (Note 0). That could raise the possibility that annotation data with curly braces and possibly pipes would appear in form field, perhaps automatically using ImageAnnotator's JavaScript.
  1. The form provides very simple bolded instructions that suggests users try doing a template substitution. Template substitutions are very easy ways to ask a form's users to do something unusual in a form field, without any effort. Almost anybody is capable of following such simple instructions in order to access sophisticated features that would otherwise be far too complicated for an ordinary user.

I like f.trott's idea for a validate parameter on form fields. That would allow form creators to tailor the form for their needs. Indeed, it opens the possibility of generalizing the validation code such that ANY type of validation can be done.

For example, a form field that inputs monetary amounts may require that the currency be selected in another field. By allowing only numbers, it will serve to prevent symbols from being entered. Here's an example of that exact use case, where the best I've been able to do is tell users to enter numbers only, and not to enter symbols:

http://www.coincompendium.com/wiki/index.php/Special:FormEdit/Sighting/CCS2

Note 0. If you're impressed by Lupo's ImageAnnotator, have a look here for the suggestion to incorporate it into an extension and then SMW:

https://www.mediawiki.org/wiki/User:Badon/Extension:Semantic_MediaWiki/Manual#Future_TODO

van.de.bugger wrote:

I believe that Forms *must* *not* generate invalid code. The simplest method to achieve this -- deny {{, |, and }}, which is now implemented and available as a patch. The patch fulfils my current needs. I do not object if somebody implements any of advanced validation techniques mentioned above.

Yaron_Koren claimed this task.
Yaron_Koren subscribed.

Changing to "Declined" - now that (as of a month ago) SF rejects pipes not within curly or square brackets, I think the overall validation might be good enough - certainly wholesale rejection like this does not seem to be required.