
No way to escape | (pipe) character in multivalued action API arguments
Closed, Resolved (Public)

Description

For string arguments with PARAM_ISMULTI set, | is used as a separator, but there is no way to escape it when one of the values needs to contain the | character.
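To make the problem concrete, here is a minimal Python sketch of pipe-separated parsing. It is not MediaWiki's code; `split_multi` is a hypothetical helper that only assumes "split on every |", which is the behavior the description complains about.

```python
def split_multi(raw: str) -> list[str]:
    """Naive multi-value parsing: split on every pipe, with no escape mechanism."""
    return raw.split("|") if raw else []


# The caller wants to send two values, the second of which contains a pipe.
intended = ["foo", "bar|baz"]

# On the wire the values are simply joined with "|" ...
wire = "|".join(intended)

# ... so the receiver cannot tell the separator from the literal pipe.
parsed = split_multi(wire)
print(parsed)  # ['foo', 'bar', 'baz']
```

The two intended values come back as three, and nothing the caller can put in the string prevents that.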

Event Timeline

It would be useful to record an example of when a parameter needs to include |, as I am sure you have a practical example in mind right now and may forget it in a few weeks.

Using the allmessages API to parse registration error messages. TitleBlacklist includes the matching line from the blacklist as a message parameter, and the blacklist tends to include pipes (partly because it contains regexps, partly because of the attribute syntax).

This is also hacked around in action=options by having separate optionname/optionvalue parameters for values that need to contain pipe characters.

Mostly escaping isn't needed, but in special cases like this it can be problematic. The hard part is retrofitting any sort of escaping into the existing system without interfering with anything that's not aware of the escaping mechanism; perhaps we could use U+001B, and/or only apply it where likely to be needed with a new PARAM_* key.

WebRequest::getVal() (via UtfNormal\Validator::cleanUp()) replaces all C0 control characters with U+FFFD (the replacement character, �), which means that we can't easily use U+001B for escaping. On the other hand, if we work around WebRequest's replacement, we can use something like U+001F (Unit Separator) in place of '|' as the separator, without having to worry about escaping the escape character, since no valid value can possibly contain that character.
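A rough Python sketch of the Unit Separator idea follows. The thread does not say how the server would know which separator is in use; this sketch assumes a convention where a value that starts with U+001F is split on U+001F instead of '|'. That convention, and the helper names `split_multi` and `join_multi`, are assumptions for illustration only. The WebRequest workaround is not modelled here.

```python
UNIT_SEPARATOR = "\x1f"  # U+001F, a C0 control character


def split_multi(raw: str) -> list[str]:
    """Split a multi-value string.

    Assumed convention (not confirmed by the thread): a leading U+001F
    switches the separator from '|' to U+001F.
    """
    if raw == "":
        return []
    if raw.startswith(UNIT_SEPARATOR):
        return raw[1:].split(UNIT_SEPARATOR)
    return raw.split("|")


def join_multi(values: list[str]) -> str:
    """Join values, using U+001F only when some value contains a pipe."""
    if any(UNIT_SEPARATOR in v for v in values):
        raise ValueError("values must not contain U+001F")
    if any("|" in v for v in values):
        return UNIT_SEPARATOR + UNIT_SEPARATOR.join(values)
    return "|".join(values)


wire = join_multi(["foo", "bar|baz"])
print(split_multi(wire))    # ['foo', 'bar|baz']
print(split_multi("a|b"))   # ['a', 'b']
```

Because U+001F cannot appear in a valid value, no second level of escaping is needed, and clients that keep sending plain pipe-separated strings are unaffected.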

Change 305126 had a related patch set uploaded (by Anomie):
API: Use U+001F (Unit Separator) for separating multi-valued parameters

https://gerrit.wikimedia.org/r/305126

A maybe less hacky solution but a more aggressive change would be to introduce an alternative input format. Just like format=json and format=xml we could allow inputformat=query (the default) and inputformat=json which would be like api.php?inputformat=json&json={"action":"query","meta":"allmessages","amargs":["foo","bar|baz"]}. (Or maybe an alternative endpoint like api-json.php?{"action":"query","meta":"allmessages","amargs":["foo","bar|baz"]} to avoid problems from mixing the two formats.)
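As a sketch of what the proposed JSON input format might look like from a client, the snippet below builds the query string from the example above and decodes it again. The `inputformat` and `json` parameters are the hypothetical ones from this comment, not an existing API feature, and the "server side" is only a stand-in using Python's standard library.

```python
import json
from urllib.parse import parse_qs, urlencode

# The request from the comment above, with a pipe inside one argument.
request = {
    "action": "query",
    "meta": "allmessages",
    "amargs": ["foo", "bar|baz"],
}

# Client side: serialize the whole request into a single 'json' parameter.
# 'inputformat' and 'json' are hypothetical parameter names from the proposal.
query = urlencode({"inputformat": "json", "json": json.dumps(request)})
url = "api.php?" + query

# Stand-in for the server side: decode the query string, then the JSON blob.
received = parse_qs(query)
decoded = json.loads(received["json"][0])

print(decoded["amargs"])  # ['foo', 'bar|baz']
```

Since the list structure is carried by JSON itself, the pipe is just an ordinary character inside a string and needs no separator convention at all.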

That also sounds a lot saner than the current patch.

That sounds a lot more insane to me, as much as some people love to sling json blobs around everywhere. You'd either need to be using a very limited json format or rethink how auto-documentation and parameter handling work.

(I also proposed allowing &foo[]=1&foo[]=2 as an alternative to &foo=1|2 on the Gerrit changeset, which was apparently previously requested as T12262 and which apparently no one liked, but I'm mentioning it here for completeness.)
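For completeness, a small Python sketch of the repeated-key form. It uses `amargs[]` as an illustrative parameter name and Python's `urllib.parse` as a stand-in for whatever the server would do; neither is taken from the Gerrit discussion.

```python
from urllib.parse import parse_qs, urlencode

values = ["foo", "bar|baz"]

# Client side: repeat the key once per value instead of joining with "|".
query = urlencode([("amargs[]", v) for v in values])

# Stand-in for the server side: repeated keys are collected into a list,
# and each value is percent-decoded independently.
parsed = parse_qs(query)["amargs[]"]

print(parsed)  # ['foo', 'bar|baz']
```

Each value travels as its own key/value pair, so a pipe inside a value is never interpreted as a separator.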

Change 305126 merged by jenkins-bot:
API: Use U+001F (Unit Separator) for separating multi-valued parameters

https://gerrit.wikimedia.org/r/305126