Page MenuHomePhabricator

CSS validator survey and design work
Closed, ResolvedPublic

Description

A CSS validator and preprocessor is needed for this RFC:

https://www.mediawiki.org/wiki/Requests_for_comment/Allow_styling_in_templates

I volunteered to work on it. I am currently evaluating existing libraries in PHP and otherwise. Whether the job is to integrate, port or develop from scratch will depend on what I find.

Details

Commits
Unknown Object (Diffusion Commit)

Related Objects

StatusAssignedTask
OpenNone
OpenNone
ResolvedNone
DuplicateNone
OpenNone
OpenNone
OpenNone
DuplicateNone
OpenNone
OpenNone
ResolvedJdlrobson
DuplicateNone
DuplicateNone
OpenNone
OpenNone
StalledNone
InvalidNone
OpenNone
OpenNone
ResolvedTheDJ
ResolvedTheDJ
InvalidNone
OpenNone
ResolvedTheDJ
OpenNone
ResolvedJdlrobson
OpenJdlrobson
OpenJdlrobson
ResolvedTgr
Resolvedcoren
ResolvedAnomie

Event Timeline

tstarling updated the task description. (Show Details)
tstarling raised the priority of this task from to Normal.
tstarling claimed this task.
tstarling added a project: MediaWiki-Core-Team.
tstarling moved this task to In Dev/Progress on the MediaWiki-Core-Team board.
tstarling changed Security from none to None.
tstarling added a subscriber: tstarling.

Sabberworm's CSS parser was code reviewed.

  • The main problem with it is laxity, for example the use of consumeUntil(). This is not really a problem for security since you can validate the document tree before you reserialize it, but it means that text in lots of invalid places will just go missing instead of being reported to the user as an error.
  • It looks fairly slow -- character set independence is implemented by iterating through the input string character by character, with help from mb_substr(), see Parser::consume().
  • Error messages are mostly unhelpful -- no positions for highlighting etc. UnexpectedTokenException is potentially localisable but some exceptions are just plain \Exception thrown with an English message.

W3C CSS Validator

  • Java
  • Enormous (100 kloc), knows everything about CSS
  • Localised in ~24 languages
  • Has web API: http://jigsaw.w3.org/css-validator/api.html
  • Security validation could probably be done as a profile, along the lines of src/main/resources/org/w3c/css/properties/CSS21Properties.properties
Joe added a subscriber: Joe.Nov 3 2014, 10:58 PM
MaxSem added a subscriber: MaxSem.Jan 7 2015, 9:48 PM
epriestley closed this task as Resolved by committing Unknown Object (Diffusion Commit).Mar 4 2015, 8:17 AM
epriestley added a commit: Unknown Object (Diffusion Commit).
Aklapper reopened this task as Open.Mar 4 2015, 10:19 PM
Aklapper added a subscriber: Aklapper.

I reviewed HTML Purifier, which uses CSSTidy for its CSS tokenizer.

  • CSSTidy's style is quite dated, it seems to be written for PHP 4. It has hardly been touched since 2008. Braces for string offsets, error suppression (@) everywhere, single $csstidy global variable for configuration, assumes allow_url_fopen is enabled, underscore prefixes instead of private/protected.
  • Calls setlocale() unconditionally, which IIRC breaks all multithreaded programs including HHVM.
  • CSSTidy's "parser" is actually a tokenizer rather than a parser. It constructs a token stream rather than an object model.
  • HTMLPurifier does the security validation work, mostly in the source directory AttrDef/CSS.
  • validate() methods return a filtered version of their input stream, with no interface for error reporting. For HTML attributes, a vague error can be reported when the validate() output does not match the input, but this does not seem to be implemented for <style> elements. Insecure or invalid input is silently stripped.
  • The CSSTidy tokenizer is very lax: it will either ignore or pass through various kinds of invalid input. For example, an unclosed brace will apparently not result in an error. In fact, the tokenizer never throws an exception or returns early, it always runs to the end of the string. It reports a handful of error types via csstidy::log(), although these messages are not passed on by HTMLPurifier.
  • CSSTidy appears to suffer from comment parsing issues that led to security vulnerabilities in MediaWiki.

Note that, at least in the prototype for T483: RfC: Allow styling in templates in https://gerrit.wikimedia.org/r/#/c/282155/4/CSSParser.php, @coren wrote a manual RegEx instead of using a library validator/processor.

coren added a comment.Apr 7 2016, 10:35 PM

Note that, at least in the prototype for T483: RfC: Allow styling in templates in https://gerrit.wikimedia.org/r/#/c/282155/4/CSSParser.php, @coren wrote a manual RegEx instead of using a library validator/processor.

To be more precise, there is a regex tokenizer to provide a lexing pass, and the CSS syntax tree is then built by consuming the resulting tokens.

The syntax supported is actually a strict subset of that which is supported by the standard - by design, since the actual forward-compatible syntax is extremely permissive.

The resulting tree can be inspected and, if desired, filtered.

tstarling closed this task as Resolved.Jun 5 2017, 6:02 AM
tstarling reassigned this task from tstarling to Anomie.

Brad wrote a new library for this.

ggellerman moved this task from Up next to Done on the TemplateStyles board.Nov 21 2017, 7:16 PM