Page MenuHomePhabricator

Checkup on cssjanus PHP 7 compat
Closed, ResolvedPublic

Description

https://github.com/cssjanus/php-cssjanus/issues/14 "Transformation breaks on PHP 7.x"

Yeah, the workaround that initially seemed to work didn't do the trick in the end. We fixed our problem by forking cssjanus and applying the patch in #13.

https://github.com/cssjanus/php-cssjanus/pull/13

Event Timeline

Using the test case from https://github.com/cssjanus/php-cssjanus/pull/13#issuecomment-355574605 and poking at it a bit, I confirm https://github.com/cssjanus/php-cssjanus/issues/14#issue-278434814 using php 7.0.13 and 7.2.11 with PCRE 8.39 on Debian (sid): it's going wrong trying to do

$css = preg_replace(self::$patterns['left'], self::$patterns['tmpToken'], $css);

where self::$patterns['left'] is

/(?<![a-zA-Z])(left)(?![a-zA-Z])(?!(?:[!#$%&*-~]|[\200-\377]|(?:(?:(?:\[0-9a-f]{1,6})(?:\r\n|\s)?)|\[^

0-9a-f]))*?['\"]?\s*\))(?!((?:[_a-z0-9-]|[\200-\377]|(?:(?:(?:\[0-9a-f]{1,6})(?:\r\n|\s)?)|\[^

?-9a-f]))|
|\s|#|\:|\.|\,|\+|>|\(|\)|\[|\]|=|\*=|~=|\^=|'[^']*'])*?{)/i

It'd be slightly more readable if the \r, \n, and \f were consistently expressed as escapes rather than literals. Doing that on a few lines (95 and 103) gives us

/(?<![a-zA-Z])(left)(?![a-zA-Z])(?!(?:[!#$%&*-~]|[\200-\377]|(?:(?:(?:\[0-9a-f]{1,6})(?:\r\n|\s)?)|\[^\r\n\f0-9a-f]))*?['\"]?\s*\))(?!((?:[_a-z0-9-]|[\200-\377]|(?:(?:(?:\[0-9a-f]{1,6})(?:\r\n|\s)?)|\[^\r\n\f0-9a-f]))|\r?\n|\s|#|\:|\.|\,|\+|>|\(|\)|\[|\]|=|\*=|~=|\^=|'[^']*'])*?{)/i

Next I see weird bits (?:\[0-9a-f]{1,6}) and \[^\r\n\f0-9a-f], looks like the backslash wasn't escaped enough so the pattern will incorrectly be trying to match a literal [ followed by some garbage rather than the intended character classes. Turn the \\ on lines 38 and 95 to \\\\ or \\x5c.

/(?<![a-zA-Z])(left)(?![a-zA-Z])(?!(?:[!#$%&*-~]|[\200-\377]|(?:(?:(?:\\[0-9a-f]{1,6})(?:\r\n|\s)?)|\\[^\r\n\f0-9a-f]))*?['\"]?\s*\))(?!((?:[_a-z0-9-]|[\200-\377]|(?:(?:(?:\\[0-9a-f]{1,6})(?:\r\n|\s)?)|\\[^\r\n\f0-9a-f]))|\r?\n|\s|#|\:|\.|\,|\+|>|\(|\)|\[|\]|=|\*=|~=|\^=|'[^']*'])*?{)/i

Unfortunately, even after doing all that, the test case still fails.


Digging a little deeper, it turns out the bit that's making it fail is (?:[!#$%&*-~]|[\200-\377]|(?:(?:(?:\\[0-9a-f]{1,6})(?:\r\n|\s)?)|\\[^\r\n\f0-9a-f]))*?. That 300k line at the end of the test CSS file is apparently too much for it to deal with, with that pattern fragment as written.

This fragment comes from {$patterns['url_chars']}? on line 104, BTW, inside the definition of $patterns['lookahead_not_closing_paren'].

Looking at that fragment, it divides into several parts:

  • [!#$%&*-~]: Any ASCII character other than C0 controls, ', ", (, and ).
  • [\200-\377]: Any non-ASCII byte.
  • (?:(?:(?:\\[0-9a-f]{1,6})(?:\r\n|\s)?): Allows whitespace, but only after an escape.
  • \\[^\r\n\f0-9a-f]: Allows certain escaped whitespace.

The upshot: it matches some ASCII whitespace, and any non-whitespace bytes except C0 controls, ', ", (, and ).

Immediately after that problem fragment in $patterns['lookahead_not_closing_paren'] we have ['\"]?\s*\), which must match a ) (and allows ' or " and whitespace before it). There's no way that our problem fragment can prevent that from matching when matched greedily but allow it to match after backtracking, as our problem fragment can't match the ) and the rest is optional and so is not relevant. Thus having the problem fragment use *? as a quantifier is pointless extra work, and in fact we can as well go to the opposite extreme and make it *+.

To be clear, that's just changing the ? to a + on line 104:

-               $patterns['lookahead_not_closing_paren'] = "(?!{$patterns['url_chars']}?{$patterns['valid_after_uri_chars']}\))";
+               $patterns['lookahead_not_closing_paren'] = "(?!{$patterns['url_chars']}+{$patterns['valid_after_uri_chars']}\))";

and doing that does indeed make the test case work!

We might want to do the same thing on lines 105 (same analysis) and 103 (similar analysis but with {, NB the '[^']*' bit could eat a { but backtracking will never expose it because nothing else can match an unpaired ' either).


The full set of changes I discussed here, including unnecessary fixes I tried at first, the actual fix to line 104, and the similar changes to lines 103 and 105, would be

diff --git a/src/CSSJanus.php b/src/CSSJanus.php
index 8c494eb..8a49603 100644
--- a/src/CSSJanus.php
+++ b/src/CSSJanus.php
@@ -35,7 +35,7 @@ class CSSJanus {
        private static $patterns = array(
                'tmpToken' => '`TMP`',
                'nonAscii' => '[\200-\377]',
-               'unicode' => '(?:(?:\\[0-9a-f]{1,6})(?:\r\n|\s)?)',
+               'unicode' => '(?:(?:\\\\[0-9a-f]{1,6})(?:\r\n|\s)?)',
                'num' => '(?:[0-9]*\.[0-9]+|[0-9]+)',
                'unit' => '(?:em|ex|px|cm|mm|in|pt|pc|deg|rad|grad|ms|s|hz|khz|%)',
                'body_selector' => 'body\s*{\s*',
@@ -92,7 +92,7 @@ class CSSJanus {
 
                // @codingStandardsIgnoreStart Generic.Files.LineLength.TooLong
                $patterns =& self::$patterns;
-               $patterns['escape'] = "(?:{$patterns['unicode']}|\\[^\r\n\f0-9a-f])";
+               $patterns['escape'] = "(?:{$patterns['unicode']}|\\\\[^\\r\\n\\f0-9a-f])";
                $patterns['nmstart'] = "(?:[_a-z]|{$patterns['nonAscii']}|{$patterns['escape']})";
                $patterns['nmchar'] = "(?:[_a-z0-9-]|{$patterns['nonAscii']}|{$patterns['escape']})";
                $patterns['ident'] = "-?{$patterns['nmstart']}{$patterns['nmchar']}*";
@@ -100,9 +100,9 @@ class CSSJanus {
                $patterns['possibly_negative_quantity'] = "((?:-?{$patterns['quantity']})|(?:inherit|auto))";
                $patterns['color'] = "(#?{$patterns['nmchar']}+|(?:rgba?|hsla?)\([ \d.,%-]+\))";
                $patterns['url_chars'] = "(?:{$patterns['url_special_chars']}|{$patterns['nonAscii']}|{$patterns['escape']})*";
-               $patterns['lookahead_not_open_brace'] = "(?!({$patterns['nmchar']}|\r?\n|\s|#|\:|\.|\,|\+|>|\(|\)|\[|\]|=|\*=|~=|\^=|'[^']*'])*?{)";
-               $patterns['lookahead_not_closing_paren'] = "(?!{$patterns['url_chars']}?{$patterns['valid_after_uri_chars']}\))";
-               $patterns['lookahead_for_closing_paren'] = "(?={$patterns['url_chars']}?{$patterns['valid_after_uri_chars']}\))";
+               $patterns['lookahead_not_open_brace'] = "(?!({$patterns['nmchar']}|\\r?\\n|\s|#|\:|\.|\,|\+|>|\(|\)|\[|\]|=|\*=|~=|\^=|'[^']*'])*+{)";
+               $patterns['lookahead_not_closing_paren'] = "(?!{$patterns['url_chars']}+{$patterns['valid_after_uri_chars']}\))";
+               $patterns['lookahead_for_closing_paren'] = "(?={$patterns['url_chars']}+{$patterns['valid_after_uri_chars']}\))";
                $patterns['noflip_single'] = "/({$patterns['noflip_annotation']}{$patterns['lookahead_not_open_brace']}[^;}]+;?)/i";
                $patterns['noflip_class'] = "/({$patterns['noflip_annotation']}{$patterns['chars_within_selector']}})/i";
                $patterns['direction_ltr'] = "/({$patterns['direction']})ltr/i";

I don't much like github, and that's where development of cssjanus seems to happen, so I'll hope someone else sends that upstream.

I don't much like github, and that's where development of cssjanus seems to happen, so I'll hope someone else sends that upstream.

We have a few people with write access, so hopefully it shouldn't be too much of a problem

Thanks for looking into it!

Krinkle triaged this task as High priority.May 7 2019, 3:50 PM

Change 509547 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/vendor@master] Update cssjanus/cssjanus from 1.2.1 to 1.3.0

https://gerrit.wikimedia.org/r/509547

Change 509548 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/core@master] Update cssjanus/cssjanus from 1.2.1 to 1.3.0

https://gerrit.wikimedia.org/r/509548

Change 509547 merged by jenkins-bot:
[mediawiki/vendor@master] Update cssjanus/cssjanus from 1.2.1 to 1.3.0

https://gerrit.wikimedia.org/r/509547

Change 509548 merged by jenkins-bot:
[mediawiki/core@master] Update cssjanus/cssjanus from 1.2.1 to 1.3.0

https://gerrit.wikimedia.org/r/509548

Reedy assigned this task to Krinkle.
Reedy removed a project: Patch-For-Review.

Should we backport this to wmf.4? This is blocking further deployment of PHP7.

Should we backport this to wmf.4? This is blocking further deployment of PHP7.

Upto you, fairly trivial to do so if you don't want to wait till .5 is out everywhere by the end of the week. I can do it this afternoon

Change 509792 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/vendor@wmf/1.34.0-wmf.4] Update cssjanus/cssjanus from 1.2.1 to 1.3.0

https://gerrit.wikimedia.org/r/509792

Change 509793 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/core@wmf/1.34.0-wmf.4] Update cssjanus/cssjanus from 1.2.1 to 1.3.0

https://gerrit.wikimedia.org/r/509793

Change 509792 merged by jenkins-bot:
[mediawiki/vendor@wmf/1.34.0-wmf.4] Update cssjanus/cssjanus from 1.2.1 to 1.3.0

https://gerrit.wikimedia.org/r/509792

Change 509793 merged by jenkins-bot:
[mediawiki/core@wmf/1.34.0-wmf.4] Update cssjanus/cssjanus from 1.2.1 to 1.3.0

https://gerrit.wikimedia.org/r/509793

Mentioned in SAL (#wikimedia-operations) [2019-05-13T11:58:22Z] <reedy@deploy1001> Synchronized php-1.34.0-wmf.4/vendor/: T215746 (duration: 01m 30s)

Mentioned in SAL (#wikimedia-operations) [2019-05-13T11:59:33Z] <reedy@deploy1001> Synchronized php-1.34.0-wmf.4/composer.json: T215746 (duration: 00m 49s)