Page MenuHomePhabricator

Security review of TemplateStyles
Closed, ResolvedPublic

Description

Project Information

Description of the tool/project

Extension to allow per-template styling in Mediawiki. Result of the discussion at T483.

Description of how the tool will be used at WMF

The intent is to deploy the extension to production wikis.

Dependencies

No dependencies beyond core >= 1.25.0

Has this project been reviewed before?

Not beyond the code review at project creation.

Working test environment

http://ts.wmflabs.org/w/index.php/Main_Page has a test wiki where the extension is deployed and tests have been conducted. A test environment requires nothing but a 1.25+ MW install and TemplateStyles loaded with wfLoadExtension().

Post-deployment

I'll remain around to maintain the extension for the foreseeable future, but I expect there might be a desire to eventually fold this functionality into core.

Notes

In addition to the usual security consideration, this extension extends the right to affect style sheets to all editors, by design, so there are a number of points we want to make certain of:

  • UI elements should not be style-able by this method;
  • no injection of non-CSS elements can be made to the rendered page;
  • there exists no way for user-provided styles to run javascript (some CSS properties and values are known to be able to do this in several older browsers); and
  • there exists no way for user-provided styles to cause the browser to fetch from an external resource.

Additionally, we will want to create a policy in terms of a blacklist and whitelist combination which will implement the security requirements we have.

Related Objects

StatusAssignedTask
OpenNone
OpenNone
ResolvedNone
DuplicateNone
OpenNone
OpenNone
OpenNone
DuplicateNone
OpenNone
OpenNone
Resolved Jdlrobson
DuplicateNone
DuplicateNone
OpenNone
OpenNone
StalledNone
InvalidNone
OpenNone
OpenNone
ResolvedTheDJ
ResolvedTheDJ
InvalidNone
OpenNone
ResolvedTheDJ
OpenNone
Resolved Jdlrobson
Open Jdlrobson
Open Jdlrobson
ResolvedTgr
ResolvedAnomie
Resolvedtstarling
Resolvedcoren

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 22 2016, 5:43 PM
coren added a subscriber: coren.Apr 23 2016, 4:53 PM

I think the extension is at a point where that review makes sense.

In addition to the usual security consideration, this extension extends the right to affect style sheets to all editors, by design, so there are a number of points we want to make certain of:

  • UI elements should not be stylable by this method;
  • no injection of non-CSS elements can be made to the rendered page;
  • there exists no way for user-provided styles to run javascript (some CSS properties and values are known to be able to do this in several older browsers); and
  • there exists no way for user-provided styles to cause the browser to fetch from an external resource.
Jdforrester-WMF changed the task status from Stalled to Open.Apr 25 2016, 5:44 PM
Jdforrester-WMF triaged this task as Normal priority.
Jdforrester-WMF updated the task description. (Show Details)
coren added a comment.May 6 2016, 12:06 AM

@Jdforrester-WMF: What is next for this? Anything I can do to help things along?

csteipp added a subscriber: csteipp.May 9 2016, 5:49 PM

@Jdforrester-WMF: What is next for this? Anything I can do to help things along?

@coren, can you update this Task with all the information from https://www.mediawiki.org/wiki/Wikimedia_Security_Team/Security_reviews#Requesting_a_review? Our schedule is pretty full, but if I can get an idea of the the effort, then I can figure out when we can work it in.

coren updated the task description. (Show Details)May 10 2016, 3:54 PM
coren added a comment.May 10 2016, 3:56 PM

@csteipp, template description updated - I think that's most/all of what you need? I don't expect the work involved to be extreme, the extension is pretty well self-contained and has few interactions with core.

coren updated the task description. (Show Details)May 10 2016, 3:58 PM

Cool. At .5 kloc of php, should be a quick review.

I've got it on my calendar for the week of May 30th.

dpatrick added a subscriber: JKatzWMF.
Tgr added a subscriber: Tgr.Jun 4 2016, 9:33 AM

So Security review.

First off, I'm glad to see that there are unit tests for the css parser.

Major issues

  • T133147 must be fixed before this extension is deployed.
  • Character escape sequences mess up tokenization:
    • Use newlines to break out of strings <templatestyles>#foo {color:"red\0a}\0a\0a#foo{background-image:url( http://example.com );color:red}\0ad{</templatestyles>
    • Getting around tokenization borders with escape sequences: #foo { background-image: url\28http://example.com\29 }
    • escape sequence expansion not recursive. Can get around blacklist: #foo {-o-\5c 6cink: 'javascript:alert(1)'}
  • Does not properly normalize newlines correctly (i.e. \f and \r -> \n). Say you have a lua module makeCharacter that simply returns "\r" (A single carriage return character). {{#tag:templatestyles|#foo {color: "red{{#invoke:makeCharacter|CR}};background-image:url(http://evil.com/d);}"} }} will tokenize differently in the extension then what browsers do.
  • expression, filter, accelerator, -o-link-source, -o-replace also should be on property blacklist (if we'll also adding bad funcs there, image, image-set as well)
  • Would also need to check that this isn't affected by T119158

Minor issues

  • It doesn't really matter, but personally I'd prefer to see new code use JSON for serialization, not php's native serialization, since JSON doesn't have weird features in it that can trigger code to run.
  • Not sure if we care about IE6, but IE6 has some bugs (I'm not sure at what point they were fixed) like full-width U being treated the same as normal ascii U, and comments being tokenized weirdly. Not sure if we still need to work around that given this is 2016
  • I suggest that non-ascii characters be banned in function names (Seems like a sane precaution)

Other (non-security) comments

  • Is the whitelist/blacklist really something that should be a config option? Or should it be hardcoded in the extension?
  • I'm concerned about the potential for race conditions in this extension. See T136054
  • mw-content-ltr/rtl might be a better class to use instead of mw-content-text, particularly on previews, but I'm not sure how consistently that class is used.
  • In terms of not confusing users, maybe its better to show the final post-processed style sheet on the page instead of the original text of the sheet.
  • It would be cool if it gave warnings about banned CSS .e.g. via $parser->getOutput()->addWarning()
  • Doesn't handle multiple templatestyles on a page properly (All but last one ignored. I would expect them to be merged).
  • Behaviour in regards to certain escape sequences for control characters and quote marks seem wrong (e.g. \22 becomes &#x22; where i would expect it to stay \22. Entities are ignored inside style blocks AFAIK, so this becomes invalid syntax)
  • Unicode escape sequences using uppercase hex numbers are interpreted incorrectly. \6C becomes <ACK> followed by C. instead of l.
  • Well probably not a requirement, I would certainly feel much more safe about this if it used whitelists instead of blacklists.
coren claimed this task.Jun 8 2016, 12:49 AM
coren added a subscriber: Bawolff.

I'll be working on changesets for the points above shortly.

Some notes:

  • It's probably impractical to whitelist properties; there are very many of them an they are highly mutable as browsers bring in new features. At the very least, it's a serious maintenance burden. That said: I'm pretty sure that the browser devs now realize that things like javascript in stylesheets is Bad - none of the security nightmare properties are in recent browser that I can see.
  • Functions, otoh, are usually bad - this is why there is only rgb() on the default whitelist.

While I haven't actually looked into the extension, I assume that it works by simply prefixing all given selectors with #mw-content-text. This conflicts with the point "UI elements should not be style-able by this method" because #mw-content-text can indeed contain UI elements. This is T37247.

Example 1:

The following code might (if my assumptions are right) be used to hide the diff view:

<templatestyles>
table.diff {
    display: none;
}
</templatestyles>

Example 2:

The following code might (if my assumptions are right) be used to break the preview mode (showing something different in preview mode than in the saved page):

<templatestyles>
#wikiPreview .hiddenInPreview {
    display: none;
}
</templatestyles>

T137900#2384210 suggests that the ability to break the preview mode is unwanted.

@entlink You are correct in your assesment (other possibilities might be patrol links). Well this is clearly unideal, I think it may be acceptable as its a very limitted amount of interface.

Wonder if we could do something like :not([data-mw] *) at the end of each selector. No idea if there are gotchas with that or performancy concerns (or if it even works, didnt test, and im not sure the definition of "simple selector").

Wonder if we could do something like :not([data-mw] *) at the end of each selector. No idea if there are gotchas with that or performancy concerns (or if it even works, didnt test, and im not sure the definition of "simple selector").

Answer appears to be no. You could prevent selected the element with [data-mw=interface] on it, but you can't stop someone from selecting its children.

dr0ptp4kt moved this task from Backlog to Next Up on the Reading-Admin board.
Tgr added a subscriber: Anomie.Nov 30 2016, 10:10 PM

Note: remediation slated for Q3 FY 2016-2017 (January - March 2017) by Reading Infrastructure. Heads up @dpatrick, in case re-reviews need slotting the same way new products do.

Tgr added a comment.Dec 13 2016, 7:41 AM

**/ is not recognized as the end of a comment; the only reason that's not exploitable is that the double star triggers catastrophic backtracking and preg_match errors out.
In general, I would sleep better if the code used the CSS spec's tokenizing rules instead of coming up with its own.

Selector / media query parsing is way way way too lax. @media { x } @font-face {}} will parse to @media {#mw-content-text x } @font-face {} }, a valid font-face rule (not exploitable for anything ugly since urls are banned, but still).

Sibling selectors can break out of the content area:

~ #catlinks {
    display: none;
}

Closing braces are double-counted - @media screen { .foo{} .bar{} } will parse to #mw-content-text .bar{} @media screen {#mw-content-text .foo{} }. Not a security issue (in fact it prevents worse exploits of the selector validation problem) but completely breaks @media functionality.

Tgr added a comment.Dec 13 2016, 7:54 AM

Also non-security: styles which are not wrapped into @media get hoisted above those which are. While it's hard to imagine someone overriding an @media style with a non-@media one, it would still be nicer to keep the original order.

Nirmos added a subscriber: Nirmos.Mar 14 2017, 7:26 PM

I think it may be acceptable

No, non-admins being able to hide diffs is not acceptable. That would be a nightmare for patrollers. T138420 needs to be solved before this is even considered.

Anomie added a comment.Apr 4 2017, 7:53 PM

We're getting reasonably close to being ready for a new security review, of both the new css-sanitizer library and a TemplateStyles rewritten to use it. We're not quite there yet, but Gergő and I seem to be at an impasse. @Bawolff, we could use your input.

In https://gerrit.wikimedia.org/r/#/c/339303/ and https://gerrit.wikimedia.org/r/#/c/339304/, I've already written the css-sanitizer code to use the grammar definitions from the CSS specs to validate the values of basically everything. IMO this will give us the best chance to not allow anything through that shouldn't be let through, since everything is validated. It also means that if we change code or configuration so a formerly-allowed item is banned, it'll invalidate just the item in question rather than rejecting the whole sheet.

Gergő is concerned that the pattern-matching may need to backtrack and thinks actually validating everything is overkill. His proposal is to instead do the following:

  • Write custom code to parse selectors.
    • Whether we'd do the same for @media queries, @supports conditions, and so on versus just allowing any vaguely-matching value hasn't been specified.
  • Divide at-rules into "banned", "allowed no matter what the contents", and "validate the contents as a list of rules".
    • @page would be in that middle category, because it doesn't now contain anything potentially dangerous as long as functions and url-like strings are handled globally. If something eventually does add something dangerous, we'd have to revisit it.
  • Somehow detect "url-like strings" and apply a whitelist to them in a context-free manner at the token level.
    • Thus the entire stylesheet would have to be rejected if it contains any "url-like string" that isn't whitelisted.
    • How exactly to tell if a string us "url-like" hasn't been specified.
    • Presumably this would also apply to URL tokens.
  • Whitelist CSS functions in a context-free manner at the token level.
    • Thus the entire stylesheet would have to be rejected if it contains any non-whitelisted function.
    • I note this would mean we could never implement attr() since whether a string attr() is safe or not depends on the context.
    • We'd have to hope the specs don't give us more situations where a function is safe in one context and unsafe in another.
  • Presumably we'd also need to reject <bad-string> and <bad-url> tokens at the token level.
    • Thus the entire stylesheet would have to be rejected if it contains either of these.
  • Whitelist CSS property declarations by name but allow them to have any content whatsoever, and rely on the token-level whitelisting to have caught all problematic values.

Change 346808 had a related patch set uploaded (by Anomie):
[mediawiki/extensions/TemplateStyles@master] Use wikimedia/css-sanitizer, and rewrite the hooking

https://gerrit.wikimedia.org/r/346808

Anomie removed coren as the assignee of this task.Apr 13 2017, 4:26 PM

We're getting reasonably close to being ready for a new security review, of both the new css-sanitizer library and a TemplateStyles rewritten to use it. We're not quite there yet, but Gergő and I seem to be at an impasse.

For the record, Tim resolved the impasse. Thanks!

And in case it's not already obvious to anyone watching that the new review has been scheduled, we're ready for a new security review based on Gerrit change 346808 and wikimedia/css-sanitizer.

I'm still working through all this, but some initial things:

  • As a precaution, I think there should be a post processing step to kill all U+007F characters, in case someone manages to insert a strip marker. I don't think its possible to insert a strip marker in a valid way into a stylesheet, but just in case, it doesn't hurt to replace U+007F with a unicode replacement character or css \7f escape or something.
  • Handling of \C and \D escape codes in serialization of strings is unsafe due to interaction with pre-processing step: For example, a user could evade validation by doing: div#bar3 { background-image: url( "https://upload.wikimedia.org/wikipedia/commons/foo\D d)}\D* {background:red url(http://example.com) !important}\D bg.jpg" ); } (And ditto with \C instead of \D)
  • Some webservers will resolve paths with %2F as if they were unescaped /'s. Upload.wikimedia.org seems to not do that (Although for example, https://en.wikipedia.org/wiki/foo%2f..%2f..%2fw/api.php does exhibit this behaviour). As a precaution, %2F's should get converted to a / before doing the check to see if the url matches the whitelist regex.

Change 351892 had a related patch set uploaded (by Anomie; owner: Anomie):
[css-sanitizer@master] Fix escaping of various characters

https://gerrit.wikimedia.org/r/351892

Anomie added a comment.May 4 2017, 5:41 PM

I'm still working through all this, but some initial things:

  • As a precaution, I think there should be a post processing step to kill all U+007F characters, in case someone manages to insert a strip marker. I don't think its possible to insert a strip marker in a valid way into a stylesheet, but just in case, it doesn't hurt to replace U+007F with a unicode replacement character or css \7f escape or something.

You could embed it inside a CSS string. Elsewhere it would probably parse as a <delim-token>, but the sanitizer should filter those out (which is good because there's no way to legally escape them). Or a comment, but the sanitizer will always filter those out.

I think I'll have css-sanitizer use hex escapes for all non-graphic characters except U+0020 in strings and names, and I'll have TemplateStyles replace U+007F with U+FFFD if any sneak through the sanitizer.

  • Handling of \C and \D escape codes in serialization of strings is unsafe due to interaction with pre-processing step: For example, a user could evade validation by doing: div#bar3 { background-image: url( "https://upload.wikimedia.org/wikipedia/commons/foo\D d)}\D* {background:red url(http://example.com) !important}\D bg.jpg" ); } (And ditto with \C instead of \D)

Good catch, when the file is read back in by a CSS parser it'll convert the \r or \f into a \n.

  • Some webservers will resolve paths with %2F as if they were unescaped /'s. Upload.wikimedia.org seems to not do that (Although for example, https://en.wikipedia.org/wiki/foo%2f..%2f..%2fw/api.php does exhibit this behaviour). As a precaution, %2F's should get converted to a / before doing the check to see if the url matches the whitelist regex.

Not for the whitelist, we don't want to allow "/foo%2Fbar" when it's supposed to be restricted to "/foo/bar" on servers that don't unescape. But yes for the ".." check.

I'm still working through all this, but some initial things:

  • As a precaution, I think there should be a post processing step to kill all U+007F characters, in case someone manages to insert a strip marker. I don't think its possible to insert a strip marker in a valid way into a stylesheet, but just in case, it doesn't hurt to replace U+007F with a unicode replacement character or css \7f escape or something.

You could embed it inside a CSS string. Elsewhere it would probably parse as a <delim-token>, but the sanitizer should filter those out (which is good because there's no way to legally escape them). Or a comment, but the sanitizer will always filter those out.

I think I'll have css-sanitizer use hex escapes for all non-graphic characters except U+0020 in strings and names, and I'll have TemplateStyles replace U+007F with U+FFFD if any sneak through the sanitizer.

CSS strings wont let the quote character be unescaped. Maybe in @supports if one disables strict mode checking

Just as an update, this is almost done and will be finished by the end of the week

Nirmos added a comment.Jun 2 2017, 5:36 AM

Has TemplateStyles been updated to use .mw-parser-output instead of #mw-content-text?

Nirmos added a comment.Jun 2 2017, 5:32 PM

Wonderful! Thank you! Then I have nothing to worry about.

Sorry this took so long. I think this is good to go.

My biggest worry is lack of registering image links (This is not a security issue, so this particular paragraph should be taken as a "volunteer hat" comment). This might cause friction between the commons and enwiki communities, which already have a somewhat strained relationship. Not recording what pages use what images could also potentially help vandals move more stealthily although that's a bit of a stretch. I would really like to see images used by this extension be recorded in the globalimagelinks table.

This allows users to include svgs directly on pages (including non-scripted animated svgs). This is a rather major change to how we previously have handled svgs. I fully believe this is safe (Browsers disable scripts on svgs included via css, and we also filter svgs), but thought I should point that out specifically just because its new to our sites that many people might find interesting.

I know this is kind of a silly thing to mention, and I only do because it is hard to change after the fact, but I wonder if its been considered to use something like {{INCLUDESTYLE:template:Foo/bar.css}} instead of the html style tag. Other things in MediaWiki that modify page rendering (instead of just embedding a non-wikitext code) such as {{DISPLAYTITLE:...}} seem to prefer the parser function form, so it might be more consistent. This is a total bikeshed thing though, so feel free to ignore :)

Potential privacy/side channel leaks (relatively minor):

These are fairly minor. I think its important to document the risk, but in my opinion they are probably acceptable risks.

The basic idea would be to have an external resource that's only loaded on some condition. The attacker can determine if the victim meets that condition by seeing if the external resource is fetched. Even if all external resources are Wikimedia controlled, the attacker can still potentially extract data based on if the resource is in varnish cache or not. Doing this in practise using template styles is probably pretty hard because the css files can't be dynamically generated (unlike wikitext), attr() is disabled outside of content:, selectors are limitted to the parser output and not the general page content. The potential leaks that I see are as follows:

  • Signalling if something is present in the page - If you have some template {{foo}} that outputs some value that's sensitive (perhaps {{REVISIONUSER}} on CSRF based preview), and the domain of the value is small, you could make an element with the id of the value you want to leak, and then have background-image rules for all the potential values you want to leak.
  • @media queries - Could potentially leak properties of the user-agent such as screen resolution, etc
  • @font-face - You could make an @font-face rule that specifies a very small unicode-range (say 1 letter) and a non-local src. You then can then determine if the letter in question is used in the selected element by if the font-file is downloaded or not (Note: In firefox, When using a pseudo-element like ::first-line, firefox checks the entire element, not just the pseudo-element for the character in order to download the font, which helps reduce the power of this attack). Note, this attack is not just limitted to the parser output part of the page. Since @font-face rules are global, you can use this to leak if a specific character appears anywhere in the page that is styled to use a non-default/non-generic-css-keyword font (Which part of the page that actually is probably depends on skins and things). For example, if the user had a username with an obscure character in it, this could maybe leak if a specific user viewed a specific page, albeit in default vector the p-personal portlet doesn't have a font applied so that wouldn't work in that skin.

Minor notes:

  • Perhaps the svg whitelist should ensure filenames end in ".svg" as a paranoid precaution against files that are multiple valid types slipping through our svg filter as a non-svg type.
  • It would be kind of cool if we could calculate the effective position of absolutely positioned elements, and ban them from being outside the parser output area (e.g. not allowed to have negative absolute positions relative to the parser output container div). Probably too hard to do that though.
    • On a similar note, i wonder if we should ban position: fixed; since that seems to have limited legit uses and could easily be used to trick users. OTOH, we've already allowed it since forever on inline styles.
  • "TemplateStylesHooks.php" line 242 - Would checking for debug=true here potentially cause parser cache pollution if a debug=true parse got saved in the parser cache?
  • It probably doesn't matter since the value of tokens seems to be very rarely used, but I believe the value of a numeric token will be incorrect if the token has a leading 0 (e.g. 010 - php would call that 8 but CSS would say that it has a "value" of 10.).
  • Does parseComponentValue in step 5 consume a token when it should only consume whitespace? (Not called currently in TemplateStyles, but I can imagine we'd make use of more of this library at some future date where this might matter).
  • The css sanitizer doesn't explicitly filter any uri schemes. I wonder if its worth filtering javascript: directly in the css-sanitizer library. OTOH, its very hard to blacklist a uri scheme, and all modern browsers don't support it in css, and template styles will check the whitelist anyways.
  • Its possible to mess up stuff outside of display area via @font-face. Someone for example could make a font with all blank characters. Even if no font urls are whitelisted, user could replace it with a local font like windings, etc. In the worst case, someone might be able to change the text to say something rather arbitrary using customer ligatures (I don't know enough about fonts to know how practical that is). I think this is an acceptable risk, but it should probably be documented.
    • Doesn't really make sense to whitelist commons in @font-face rules. woff files aren't currently allowed, and svg fonts are supported by very few browsers (And support is going down - chrome removed support). Maybe commons should allow woff files.
  • For the property mask-image: - I found this part of the spec confusing (CSS Masks - 7.1), but from what I understand, it could either reference a generic image, or an svg mask element. In the sanitizer, $maskRef is defined as an alternative between 'none', $matcherFactory->image() or $matcherFactory->url( 'svg' ). As far as I can tell, the last alternative would never be reached and the urls would always be treated as normal images, since image() also looks for $matcherFactory->url( 'image' ). Thus the property is never validated as an svg, but only as anormal image. I don't think that matters too much for our usecase, but this should be documented.
Anomie added a comment.Jun 5 2017, 4:23 PM

My biggest worry is lack of registering image links (This is not a security issue, so this particular paragraph should be taken as a "volunteer hat" comment). This might cause friction between the commons and enwiki communities, which already have a somewhat strained relationship. Not recording what pages use what images could also potentially help vandals move more stealthily although that's a bit of a stretch. I would really like to see images used by this extension be recorded in the globalimagelinks table.

FWIW, this is already the status quo. For example, I see no records for en:MediaWiki:Common.css in the global usage table for Checker-16x16.png, Treeview-grey-line.png, and so on.

It's not that simple since CSS links the images by URL rather than using wikitext syntax. To do that we'd have to somehow backconvert "//upload.wikimedia.org/wikipedia/commons/thumb/7/70/Example.png/16px-Example.png" to determine it comes from commons:File:Example.png, and I'd rather not build such logic into TemplateStyles (especially considering T66214 and related tasks want to change the format of that link).

But feel free to file a separate, non-security task requesting that TemplateStyles support some syntax that turns wiki title references into appropriate URLs. TemplateStyles could define a "-mw-file()" function and preprocess the token stream to convert that into the standard CSS url() function, at the same time registering the links in the ParserOutput.

I know this is kind of a silly thing to mention, and I only do because it is hard to change after the fact, but I wonder if its been considered to use something like {{INCLUDESTYLE:template:Foo/bar.css}} instead of the html style tag. Other things in MediaWiki that modify page rendering (instead of just embedding a non-wikitext code) such as {{DISPLAYTITLE:...}} seem to prefer the parser function form, so it might be more consistent. This is a total bikeshed thing though, so feel free to ignore :)

I'd rather go for a hash-named parser function like {{#includecss:...}}, if we were going to do that. I don't have a strong opinion either way.

  • Signalling if something is present in the page - If you have some template {{foo}} that outputs some value that's sensitive (perhaps {{REVISIONUSER}} on CSRF based preview), and the domain of the value is small, you could make an element with the id of the value you want to leak, and then have background-image rules for all the potential values you want to leak.

OTOH, you could probably do the same thing with wikitext and parser functions or Scribunto by conditionally including an image with an unusual thumbnail size, or a video with a particular start frame, or something like that.

  • @media queries - Could potentially leak properties of the user-agent such as screen resolution, etc

We'll have to live with that one, since being able to use @media queries is one of the major drivers here.

  • @font-face - You could make an @font-face rule that specifies a very small unicode-range (say 1 letter) and a non-local src.

In the configuration used for WMF sites, we'll probably have nothing in $wgTemplateStylesAllowedUrls['font'] anyway. Someday we might provide a limited set of fonts as a static resource somewhere, I suppose.

Actually, I should probably just make that the default. I don't think font files can be uploaded to Commons anyway, except maybe for SVG fonts that most browsers don't support anymore.

  • Perhaps the svg whitelist should ensure filenames end in ".svg" as a paranoid precaution against files that are multiple valid types slipping through our svg filter as a non-svg type.

Easy enough, I think: <^https://upload\\.wikimedia\\.org/wikipedia/commons/[^?#]*\.svg(?:[?#]|$)>

  • It would be kind of cool if we could calculate the effective position of absolutely positioned elements, and ban them from being outside the parser output area (e.g. not allowed to have negative absolute positions relative to the parser output container div). Probably too hard to do that though.

Ha, not a chance. We'd do better to just finally take care of T40848.

  • On a similar note, i wonder if we should ban position: fixed; since that seems to have limited legit uses and could easily be used to trick users. OTOH, we've already allowed it since forever on inline styles.

It would be easy enough to redefine the 'position' property in TemplateStylesHooks::getSanitizer(), but as you note we'd want to do it for inline styles too or it's mostly pointless.

  • "TemplateStylesHooks.php" line 242 - Would checking for debug=true here potentially cause parser cache pollution if a debug=true parse got saved in the parser cache?

There shouldn't be any functional difference in the output, so it would be limited to including or not including some whitespace and semicolons. Even non-minified, css-sanitizer still strips all comments and collapses all whitespace into a single space (although I might fix the latter someday).

  • It probably doesn't matter since the value of tokens seems to be very rarely used, but I believe the value of a numeric token will be incorrect if the token has a leading 0 (e.g. 010 - php would call that 8 but CSS would say that it has a "value" of 10.).

While PHP's numeric literals recognize a leading zero as indicating octal, its conversion of strings to integers doesn't: (int)"010" outputs 10, not 8.

  • Does parseComponentValue in step 5 consume a token when it should only consume whitespace? (Not called currently in TemplateStyles, but I can imagine we'd make use of more of this library at some future date where this might matter).

consumeComponentValue leaves the "current position" pointer pointing at the last token of the component value. So in step 5, we need to consume that last token as well as any following whitespace.

  • The css sanitizer doesn't explicitly filter any uri schemes. I wonder if its worth filtering javascript: directly in the css-sanitizer library. OTOH, its very hard to blacklist a uri scheme, and all modern browsers don't support it in css, and template styles will check the whitelist anyways.

Yes, the CSS sanitizer doesn't try to filter URIs in any way. But it supports plugging in any validation you want, as TemplateStyles does for its whitelist.

  • Its possible to mess up stuff outside of display area via @font-face. Someone for example could make a font with all blank characters. Even if no font urls are whitelisted, user could replace it with a local font like windings, etc. In the worst case, someone might be able to change the text to say something rather arbitrary using customer ligatures (I don't know enough about fonts to know how practical that is). I think this is an acceptable risk, but it should probably be documented.

It's easy enough to avoid in TemplateStyles, just insist that the font-family begin with some particular string (e.g. "TemplateStyles"). Done.

  • Doesn't really make sense to whitelist commons in @font-face rules. woff files aren't currently allowed, and svg fonts are supported by very few browsers (And support is going down - chrome removed support). Maybe commons should allow woff files.

I realized that above. ;) Done.

  • For the property mask-image: - I found this part of the spec confusing (CSS Masks - 7.1), but from what I understand, it could either reference a generic image, or an svg mask element. In the sanitizer, $maskRef is defined as an alternative between 'none', $matcherFactory->image() or $matcherFactory->url( 'svg' ). As far as I can tell, the last alternative would never be reached and the urls would always be treated as normal images, since image() also looks for $matcherFactory->url( 'image' ). Thus the property is never validated as an svg, but only as anormal image. I don't think that matters too much for our usecase, but this should be documented.

The last alternative could be reached if the given url doesn't pass ->url( 'image' ) but does pass ->url( 'svg' ). I can't think of any really sensible way to have ->url( 'image' ) not allow something that should only be a ->url( 'svg' ), since SVGs are images after all. It may well be that browsers can be given an SVG to be used like any other image rather than "A URL reference to a mask element".

Nirmos added a comment.Jun 5 2017, 7:04 PM

T40848

Is this referring to a Phabricator task?

T40848

Is this referring to a Phabricator task?

Yes, its currently a restricted task (although at this point the contents aren't very secret anymore). Its the task about people being able to overlay content on top of other content using absolute positioning and z-index rules.

tstarling closed this task as Resolved.Jun 6 2017, 1:36 AM
tstarling claimed this task.

Appears complete to me.

Change 351892 merged by jenkins-bot:
[css-sanitizer@master] Fix escaping of various characters

https://gerrit.wikimedia.org/r/351892

Change 346808 merged by jenkins-bot:
[mediawiki/extensions/TemplateStyles@master] Use wikimedia/css-sanitizer, and rewrite the hooking

https://gerrit.wikimedia.org/r/346808

ggellerman moved this task from Up next to Done on the TemplateStyles board.Nov 21 2017, 7:16 PM