Page MenuHomePhabricator

Use appropriate escaping level in Language::truncateForVisual
Open, Needs TriagePublic

Description

The method will return (part of) its argument, hence preserving the taintedness, but can also append an escaped message for ellipsis. Thus, in order to get escaping right (without double-escaping), callers are supposed to pass an escaped string to truncateForVisual. But HTML might get truncated, thus leaving unwanted characters in the output.

Conversely, we cannot just make truncateForVisual use text() and escape the result after the call, as that could increase the length. This is probably acceptable if we assume that "visual" means HTML, and the given length is just a made up number to make the content fit some part of the page, and not a hard constraint. In that case, escaping the string after the truncateForVisual call will result in the string being displayed with exactly the number of characters counted by strlen, which is supposedly what the caller wants.

Event Timeline

I think other languages functions are also effected. Language::commaList also returns the same taint of the input and added the separator escaped. Which gives unescaped or escaped content. There are some more list functions.

I think other languages functions are also effected. Language::commaList also returns the same taint of the input and added the separator escaped. Which gives unescaped or escaped content. There are some more list functions.

Yes, but for those it's easier to solve, because there are no length issues.

Is there a way to tell taint it should get the taint for the return unchanged from the input parameter $string?

This is also about Language::truncateForDatabase

There are many usages with htmlspecialchars, which now all failing with DoubleEscaped issues - https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Newsletter/+/719766

Is there a way to tell taint it should get the taint for the return unchanged from the input parameter $string?

taint-check knows this already, but it also knows that an escaped value is potentially part of the returned expression; hence the return value can be tainted and escaped at the same time. This reflects what the code does, and I'd say it's not a good idea to force taint-check into thinking that it's not true.

Instead, if many (most?) callers are already escaping the return value, the escaped() call in truncateInternal should be changed to text().