Format constraint UX
Closed, Resolved, Public

Description

Current screenshot of a format constraint violation:

image.png (275×924 px, 27 KB)


Quoting @Jan_Dittrich’s comment:

Note: In the current version, if the check is not satisfied, the user gets shown a regex.

Basic Problems:

  • We can't expect users to know regex (of our 5 example users/personas, only 1 or 2 know what it is).
  • Even for people who know regex, the expressions are hard to read, even with experience.

So, usability heuristics to apply here:

  • "Match between system and the real world" (we should use concepts familiar to the user)
  • "Consistency and standards" – our other constraint infos are fairly easy to understand, this one is not
  • "Help users recognize, diagnose, and recover from errors: Error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution."

For the latter, we don't satisfy any of the user needs. We should:

  • Say that it did not match [whateveritchecksfor], for example "The check for a URL failed."
  • Say what the problem is: "It seems that your URL does not have an "https://" at the beginning."
  • Suggest a fix, like "Try to add http:// or https:// at the beginning, if the URLs are otherwise correct."

So the error message would be: "Your input was checked and was not recognized as a URL"


Quoting @Esc3300’s comments (1, 2):

Maybe the qualifier "syntax clarification" could be displayed.

Sample from local dialing code (P473):

"string combining digits, spaces, - (All else excluded, such as: ,/;()+ )"

Alternatively, "string doesn't match expected form, see property constraint."

Event Timeline

@Jan_Dittrich your suggestions are very difficult to implement – figuring out how to “fix” a string so that it matches a regular expression is a hard problem. And we don’t have a short description of the constraint (“a URL”).

@Esc3300 I wasn’t aware of that property, sounds nice… but currently it’s only used on four “Wikidata property example” statements and 302 “format as a regular expression” statements, never on constraint statements (query). But I think it could make sense to display this qualifier in general, for any constraint type.

And we don’t have a short description of the constraint (“a URL”).

So, just to get it right, we would have a Regex, but it carries no context what it would do?

figuring out how to “fix” a string so that it matches a regular expression is a hard problem

Could we use capture groups and see which does (not) capture? E.g. see if the http: is captured?

we would have a Regex, but it carries no context what it would do?

Yup.

Could we use capture groups and see which does (not) capture?

I don’t see how – all we get back from the query service is “matches” or “doesn’t match”, and doing anything not on the query service opens us up to DoS attacks via expensive regexes.

we would have a Regex, but it carries no context what it would do?

Yup.

That sounds problematic to me: regexes are hard enough to read even with comments and context. For a hack one runs for oneself it is OK, but as a more central element of the infrastructure I think it will cause us problems.

Could we use capture groups and see which does (not) capture?

I don’t see how – all we get back from the query service is “matches” or “doesn’t match”, and doing anything not on the query service opens us up to DoS attacks via expensive regexes.

Is that a constraint-by-principle or something that the query service could do but currently does not?

Is that a constraint-by-principle or something that the query service could do but currently does not?

A constraint by principle. We only have a REGEX() function that returns true or false.

To make sure we don't miss the context of use here:

  • What are examples where the regex-based constraint would be used? One example seems to be the definition of formatter URLs (https://www.wikidata.org/wiki/Property:P1630). But where else would these be used?
  • where are the regexes defined?

Basically, a spec that shows what the feature does for the user.

What are examples where the regex-based constraint would be used? One example seems to be the definition of formatter URLs

I’m not sure if formatter URL is a typical example, actually – that’s just what I used for my test wiki. A typical use seems to be identifiers of various kinds (n digits, or one of these five letters followed by a slash and n digits, etc.).

where are the regexes defined?

In “format as a regular expression” qualifiers of “property constraint” statements.

By the way, there’s documentation for each constraint type under Help:Property constraint portal, including for Format.

Some might have seen https://www.wikidata.org/wiki/Property_talk:P2302#Conversion_of_existing_properties : there are a couple of properties that duplicate format constraint templates.

Could we use capture groups and see which does (not) capture?

I don’t see how – all we get back from the query service is “matches” or “doesn’t match”, and doing anything not on the query service opens us up to DoS attacks via expensive regexes.

We could have multiple format constraints in cases where it would be useful to check multiple things separately (e.g. one constraint saying it should start with http(s)://, another saying it should contain the domain "example.com"). Just because we can put it all into one regex doesn't mean we have to.
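A minimal sketch of that idea, under stated assumptions: Python's `re` module stands in for the query service's boolean REGEX() check, and the names `FormatConstraint`, `failed_constraints`, and `URL_CONSTRAINTS` are hypothetical, not part of the extension. Each constraint still only answers "matches" or "doesn't match", but because the checks are separate, the failing ones identify what is wrong.

```python
import re
from typing import List, NamedTuple


class FormatConstraint(NamedTuple):
    """One format constraint: a regex plus a human-readable description."""
    pattern: str        # regex that the whole value must match
    clarification: str  # plain-language description of the pattern


def failed_constraints(value: str,
                       constraints: List[FormatConstraint]) -> List[FormatConstraint]:
    """Return the constraints that the value does not satisfy.

    Each check is a plain yes/no match, like a boolean REGEX() call.
    """
    return [c for c in constraints
            if re.fullmatch(c.pattern, value) is None]


# Hypothetical example: two small constraints instead of one large URL regex.
URL_CONSTRAINTS = [
    FormatConstraint(r"https?://.*", "starts with http:// or https://"),
    FormatConstraint(r".*example\.com.*", 'contains the domain "example.com"'),
]

for failed in failed_constraints("example.com/page", URL_CONSTRAINTS):
    print("Value should match:", failed.clarification)
```

With the value "example.com/page", only the first constraint fails, so the user would be told about the missing scheme and nothing else.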

@Lucas_Werkmeister_WMDE: I looked at the issue together with @Lydia_Pintscher

  • If we assume that people make copy-paste mistakes or similar, just giving a boolean-ish "OK"/"Look again" is fine.
  • We discovered that some regexes have the qualifier (?) "syntax clarification", which looks like a user-friendly way of showing what is needed.

Yeah, let’s use messages like this:

The value for local dialing code (123/456) should match “string combining digits, spaces, - (All else excluded, such as: ,/;()+ )” (regex: [\d\- ]+).

or, if there’s no syntax clarification:

The value for local dialing code (123/456) should match the regex [\d\- ]+.

(This includes a change from “pattern” to “regex”.)
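The two message variants above could be assembled roughly as follows. This is only an illustrative sketch: the function name `format_violation_message` and its parameters are assumptions made here, and the actual extension builds its messages through its own localisation system rather than string formatting like this.

```python
from typing import Optional


def format_violation_message(property_label: str,
                             value: str,
                             regex: str,
                             clarification: Optional[str] = None) -> str:
    """Build the Format violation message proposed in this task.

    If a syntax clarification is available, show it first and keep the
    regex in parentheses; otherwise fall back to showing only the regex.
    """
    if clarification:
        return (f"The value for {property_label} ({value}) should match "
                f"“{clarification}” (regex: {regex}).")
    return (f"The value for {property_label} ({value}) "
            f"should match the regex {regex}.")


# Usage with the local dialing code (P473) sample from this task.
print(format_violation_message(
    "local dialing code",
    "123/456",
    r"[\d\- ]+",
    "string combining digits, spaces, - (All else excluded, such as: ,/;()+ )",
))
print(format_violation_message("local dialing code", "123/456", r"[\d\- ]+"))
```

Keeping the regex in the clarified variant means experienced editors can still see the exact pattern, while everyone else reads the plain-language description first.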

As a qualifier would make more sense, since it's clarifying a specific statement.

Yes, there can be multiple format constraints (see e. g. ISBN-10).

Change 370475 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Add support for parsing syntax clarification param

https://gerrit.wikimedia.org/r/370475

Change 370476 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Include syntax clarification in Format violation message

https://gerrit.wikimedia.org/r/370476

Change 370475 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Add support for parsing syntax clarification param

https://gerrit.wikimedia.org/r/370475

Change 370476 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Include syntax clarification in Format violation message

https://gerrit.wikimedia.org/r/370476

Lucas_Werkmeister_WMDE reopened this task as Open.
Lucas_Werkmeister_WMDE claimed this task.

Wait, UX should probably approve before closing this. I’ll update wikidata-constraints so @Jan_Dittrich can try it out.

Okay, wikidata-constraints now has a shim SPARQL service that just supports REGEX (we don’t have enough resources to run Blazegraph 😄), and there are two test statements on John F. Kennedy (see explanation in the fourth paragraph of the Main Page). @Jan_Dittrich can you take a look and see if the error message is more understandable now?

@Lucas_Werkmeister_WMDE: Yes, this is better. I assume that the message is useful for most of our users now.