Page MenuHomePhabricator

Introduce a new "strict-boolean" MediaWiki API parameter type for better validation and semantics?
Open, Needs TriagePublic

Description

I'm writing a new API that takes a boolean parameter "damaging" to flag whether or not a revision is judged to be damaging, and learned something important about the MediaWiki API:
https://www.mediawiki.org/wiki/API:Data_formats#Boolean_values

Boolean parameters work like HTML checkboxes: if the parameter is specified in the HTTP request, regardless of value, it is considered true. For a false value, omit the parameter entirely. The best way to specify a true parameter in an HTTP request is to use someParam=; the trailing = ensures the browser or HTTP library does not discard the "empty" someParam.

This might make sense for APIs which will be driven directly by an HTML form, but I don't think it's a good idea in any other context. I see the checkbox-style parameter passing as problematic for these reasons:

  • No ability to require the parameter, since passing "false" can only be done by omitting the parameter.
  • Passing an empty value for true has bad semantics, and is unclear to the uninitiated.
  • Omitting the parameter has bad semantics, because it's unclear that we're actually setting something to false.

Obviously we can't do anything to change such a deep assumption in the existing API, but my suggestion is that we introduce a new data type "strict-boolean", which behaves like the other data types, so for a required parameter, we have:

  • "damaging=true" is parsed as true.
  • "damaging=false" is parsed as false.
  • "damaging=", "damaging=0", or any other value results in an error.
  • Omitting the "damaging" parameter results in an error.

Event Timeline

awight created this task.Aug 29 2018, 11:25 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 29 2018, 11:25 PM
Harej added a subscriber: Harej.Aug 29 2018, 11:46 PM

Alternatively we could call the datatype strict-boolean.

Alternatively we could call the datatype strict-boolean.

I like that much better, changing my proposal accordingly.

awight renamed this task from Introduce a new "boolean2" MediaWiki API parameter type for better validation and semantics? to Introduce a new "strict-boolean" MediaWiki API parameter type for better validation and semantics?.Aug 30 2018, 3:05 AM
awight updated the task description. (Show Details)
Anomie added a subscriber: Anomie.Aug 30 2018, 2:41 PM

From the API client perspective, you could do this now by simply declaring it as an enumerated property with values 'true' and 'false'

'damaging' => [
    self::PARAM_TYPE => [ 'true', 'false' ],
    self::PARAM_REQUIRED => true,
],

The drawback is that it'll come into your module from extractRequestParams() or getParameter() as the PHP strings 'true' or 'false' rather than as PHP booleans, so your code would have to convert it for use. Also ApiSandbox would display it as a dropdown with the two options rather than as a checkbox.

"strict-boolean" doesn't strike me as a particularly good name either. "Strict" implies there's something "non-strict" about the other kind of booleans when in fact it's actually something completely different. Ideally we might rename the existing "boolean" to something like "checkbox-boolean" and call the new type "value-boolean", but we'd have to consider the resulting breaking change to the output of action=paraminfo.

It's pretty common to use 0 and 1 for boolean values, perhaps this new type should accept those too.

Change 456417 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/core@master] [WIP] Introduce a new API parameter type for explicit booleans

https://gerrit.wikimedia.org/r/456417

From the API client perspective, you could do this now by simply declaring it as an enumerated property with values 'true' and 'false'

Thanks, this is a helpful workaround!

"strict-boolean" doesn't strike me as a particularly good name either. "Strict" implies there's something "non-strict" about the other kind of booleans when in fact it's actually something completely different. Ideally we might rename the existing "boolean" to something like "checkbox-boolean" and call the new type "value-boolean", but we'd have to consider the resulting breaking change to the output of action=paraminfo.

Maybe "explicit-boolean"?

It's pretty common to use 0 and 1 for boolean values, perhaps this new type should accept those too.

Good thought! The robustness principle suggests that we should allow this, but there are benefits to restricting to explicit "true" and "false", for example it's self-evidently a boolean from inspecting an URL, and we catch certain types of lazy programming mistakes. I think there's a tradeoff here, in short.

"strict-boolean" doesn't strike me as a particularly good name either. "Strict" implies there's something "non-strict" about the other kind of booleans when in fact it's actually something completely different. Ideally we might rename the existing "boolean" to something like "checkbox-boolean" and call the new type "value-boolean", but we'd have to consider the resulting breaking change to the output of action=paraminfo.

Maybe "explicit-boolean"?

What's non-explicit about the other boolean?

"strict-boolean" doesn't strike me as a particularly good name either. "Strict" implies there's something "non-strict" about the other kind of booleans when in fact it's actually something completely different. Ideally we might rename the existing "boolean" to something like "checkbox-boolean" and call the new type "value-boolean", but we'd have to consider the resulting breaking change to the output of action=paraminfo.

Maybe "explicit-boolean"?

What's non-explicit about the other boolean?

It's more about the new type being explicit, "true" and "false" being about as explicit as it gets, rather than being a comparative thing with one parameter type pitted against the other. But I guess the original boolean type is not explicit because you have to know that "name=" with no value means true, rather than having it spelled out. This is also the reason I'm slightly biased against allowing zero or one, because they aren't intrinsically boolean values.

I'm +1 to the general concept of this. I do think it will be confusing in the long run to have two boolean types though, so if this is new boolean type works out in a small set of initial modules, I think we should have some kind of migration plan for moving the existing boolean types over to the newer one.

And I agree with @Anomie about supporting 1/0 as true/false. I've seen plenty of client code that uses 1.

It's more about the new type being explicit, "true" and "false" being about as explicit as it gets, rather than being a comparative thing with one parameter type pitted against the other. But I guess the original boolean type is not explicit because you have to know that "name=" with no value means true, rather than having it spelled out.

The original boolean type is explicit enough, it works on presence or absence of the parameter exactly like HTML <input type="checkbox"/>.

I think we should have some kind of migration plan for moving the existing boolean types over to the newer one.

If we plan to migrate everything over to value-based booleans, perhaps we should skip the step of introducing a new, different type entirely. A more direct migration might look like:

  • Collect the names of all existing boolean parameters.
  • Run some queries against ApiAction in Analytics's data lake to see what values clients are actually supplying.
  • Decide which subset should continue to be supported, and corresponding values for "false".
  • (MW 1.X) Add warnings for any request that uses unapproved values. Possibly two, one for to-be-unrecognized values and one for to-be-false values.
  • Monitor the rate of these warnings.
  • (MW 1.X+1 or later) Convert the warnings to errors.
  • (MW 1.X+2 or later) When usage of approved "false" values becomes near-zero, enable them to actually mean false.

This sounds like a good migration plan, and I personally like the idea of keeping a single boolean type. I defer to others' opinion on how much disruption it would cause.

Another migration track should address the API code, where we might want to make some boolean parameters required once this becomes a possibility.

We should also consider sibling APIs which were designed for consistency with the MW API, for example ORES which takes a "features=" parameter using the HTML "checkbox" style parsed as true if present at all. These should probably be migrated on a similar schedule.

awight removed a subscriber: awight.Mar 21 2019, 4:04 PM

Change 456417 abandoned by Awight:
[WIP] Introduce a new API parameter type for explicit booleans

https://gerrit.wikimedia.org/r/456417