https://github.com/cssjanus/php-cssjanus/issues/14 "Transformation breaks on PHP 7.x"
Yeah, the workaround that initially seemed to work didn't do the trick in the end. We fixed our problem by forking cssjanus and applying the patch in #13.
https://github.com/cssjanus/php-cssjanus/issues/14 "Transformation breaks on PHP 7.x"
Yeah, the workaround that initially seemed to work didn't do the trick in the end. We fixed our problem by forking cssjanus and applying the patch in #13.
Status | Subtype | Assigned | Task | ||
---|---|---|---|---|---|
Resolved | jijiki | T219127 SRE FY19-20 Q1 goal: complete the transition to PHP7 | |||
Resolved | None | T176370 Migrate to PHP 7 in WMF production | |||
Resolved | Krinkle | T215746 Checkup on cssjanus PHP 7 compat |
Using the test case from https://github.com/cssjanus/php-cssjanus/pull/13#issuecomment-355574605 and poking at it a bit, I confirm https://github.com/cssjanus/php-cssjanus/issues/14#issue-278434814 using php 7.0.13 and 7.2.11 with PCRE 8.39 on Debian (sid): it's going wrong trying to do
$css = preg_replace(self::$patterns['left'], self::$patterns['tmpToken'], $css);
where self::$patterns['left'] is
/(?<![a-zA-Z])(left)(?![a-zA-Z])(?!(?:[!#$%&*-~]|[\200-\377]|(?:(?:(?:\[0-9a-f]{1,6})(?:\r\n|\s)?)|\[^ 0-9a-f]))*?['\"]?\s*\))(?!((?:[_a-z0-9-]|[\200-\377]|(?:(?:(?:\[0-9a-f]{1,6})(?:\r\n|\s)?)|\[^ ?-9a-f]))| |\s|#|\:|\.|\,|\+|>|\(|\)|\[|\]|=|\*=|~=|\^=|'[^']*'])*?{)/i
It'd be slightly more readable if the \r, \n, and \f were consistently expressed as escapes rather than literals. Doing that on a few lines (95 and 103) gives us
/(?<![a-zA-Z])(left)(?![a-zA-Z])(?!(?:[!#$%&*-~]|[\200-\377]|(?:(?:(?:\[0-9a-f]{1,6})(?:\r\n|\s)?)|\[^\r\n\f0-9a-f]))*?['\"]?\s*\))(?!((?:[_a-z0-9-]|[\200-\377]|(?:(?:(?:\[0-9a-f]{1,6})(?:\r\n|\s)?)|\[^\r\n\f0-9a-f]))|\r?\n|\s|#|\:|\.|\,|\+|>|\(|\)|\[|\]|=|\*=|~=|\^=|'[^']*'])*?{)/i
Next I see weird bits (?:\[0-9a-f]{1,6}) and \[^\r\n\f0-9a-f], looks like the backslash wasn't escaped enough so the pattern will incorrectly be trying to match a literal [ followed by some garbage rather than the intended character classes. Turn the \\ on lines 38 and 95 to \\\\ or \\x5c.
/(?<![a-zA-Z])(left)(?![a-zA-Z])(?!(?:[!#$%&*-~]|[\200-\377]|(?:(?:(?:\\[0-9a-f]{1,6})(?:\r\n|\s)?)|\\[^\r\n\f0-9a-f]))*?['\"]?\s*\))(?!((?:[_a-z0-9-]|[\200-\377]|(?:(?:(?:\\[0-9a-f]{1,6})(?:\r\n|\s)?)|\\[^\r\n\f0-9a-f]))|\r?\n|\s|#|\:|\.|\,|\+|>|\(|\)|\[|\]|=|\*=|~=|\^=|'[^']*'])*?{)/i
Unfortunately, even after doing all that, the test case still fails.
Digging a little deeper, it turns out the bit that's making it fail is (?:[!#$%&*-~]|[\200-\377]|(?:(?:(?:\\[0-9a-f]{1,6})(?:\r\n|\s)?)|\\[^\r\n\f0-9a-f]))*?. That 300k line at the end of the test CSS file is apparently too much for it to deal with, with that pattern fragment as written.
This fragment comes from {$patterns['url_chars']}? on line 104, BTW, inside the definition of $patterns['lookahead_not_closing_paren'].
Looking at that fragment, it divides into several parts:
The upshot: it matches some ASCII whitespace, and any non-whitespace bytes except C0 controls, ', ", (, and ).
Immediately after that problem fragment in $patterns['lookahead_not_closing_paren'] we have ['\"]?\s*\), which must match a ) (and allows ' or " and whitespace before it). There's no way that our problem fragment can prevent that from matching when matched greedily but allow it to match after backtracking, as our problem fragment can't match the ) and the rest is optional and so is not relevant. Thus having the problem fragment use *? as a quantifier is pointless extra work, and in fact we can as well go to the opposite extreme and make it *+.
To be clear, that's just changing the ? to a + on line 104:
- $patterns['lookahead_not_closing_paren'] = "(?!{$patterns['url_chars']}?{$patterns['valid_after_uri_chars']}\))"; + $patterns['lookahead_not_closing_paren'] = "(?!{$patterns['url_chars']}+{$patterns['valid_after_uri_chars']}\))";
and doing that does indeed make the test case work!
We might want to do the same thing on lines 105 (same analysis) and 103 (similar analysis but with {, NB the '[^']*' bit could eat a { but backtracking will never expose it because nothing else can match an unpaired ' either).
The full set of changes I discussed here, including unnecessary fixes I tried at first, the actual fix to line 104, and the similar changes to lines 103 and 105, would be
diff --git a/src/CSSJanus.php b/src/CSSJanus.php index 8c494eb..8a49603 100644 --- a/src/CSSJanus.php +++ b/src/CSSJanus.php @@ -35,7 +35,7 @@ class CSSJanus { private static $patterns = array( 'tmpToken' => '`TMP`', 'nonAscii' => '[\200-\377]', - 'unicode' => '(?:(?:\\[0-9a-f]{1,6})(?:\r\n|\s)?)', + 'unicode' => '(?:(?:\\\\[0-9a-f]{1,6})(?:\r\n|\s)?)', 'num' => '(?:[0-9]*\.[0-9]+|[0-9]+)', 'unit' => '(?:em|ex|px|cm|mm|in|pt|pc|deg|rad|grad|ms|s|hz|khz|%)', 'body_selector' => 'body\s*{\s*', @@ -92,7 +92,7 @@ class CSSJanus { // @codingStandardsIgnoreStart Generic.Files.LineLength.TooLong $patterns =& self::$patterns; - $patterns['escape'] = "(?:{$patterns['unicode']}|\\[^\r\n\f0-9a-f])"; + $patterns['escape'] = "(?:{$patterns['unicode']}|\\\\[^\\r\\n\\f0-9a-f])"; $patterns['nmstart'] = "(?:[_a-z]|{$patterns['nonAscii']}|{$patterns['escape']})"; $patterns['nmchar'] = "(?:[_a-z0-9-]|{$patterns['nonAscii']}|{$patterns['escape']})"; $patterns['ident'] = "-?{$patterns['nmstart']}{$patterns['nmchar']}*"; @@ -100,9 +100,9 @@ class CSSJanus { $patterns['possibly_negative_quantity'] = "((?:-?{$patterns['quantity']})|(?:inherit|auto))"; $patterns['color'] = "(#?{$patterns['nmchar']}+|(?:rgba?|hsla?)\([ \d.,%-]+\))"; $patterns['url_chars'] = "(?:{$patterns['url_special_chars']}|{$patterns['nonAscii']}|{$patterns['escape']})*"; - $patterns['lookahead_not_open_brace'] = "(?!({$patterns['nmchar']}|\r?\n|\s|#|\:|\.|\,|\+|>|\(|\)|\[|\]|=|\*=|~=|\^=|'[^']*'])*?{)"; - $patterns['lookahead_not_closing_paren'] = "(?!{$patterns['url_chars']}?{$patterns['valid_after_uri_chars']}\))"; - $patterns['lookahead_for_closing_paren'] = "(?={$patterns['url_chars']}?{$patterns['valid_after_uri_chars']}\))"; + $patterns['lookahead_not_open_brace'] = "(?!({$patterns['nmchar']}|\\r?\\n|\s|#|\:|\.|\,|\+|>|\(|\)|\[|\]|=|\*=|~=|\^=|'[^']*'])*+{)"; + $patterns['lookahead_not_closing_paren'] = "(?!{$patterns['url_chars']}+{$patterns['valid_after_uri_chars']}\))"; + $patterns['lookahead_for_closing_paren'] = "(?={$patterns['url_chars']}+{$patterns['valid_after_uri_chars']}\))"; $patterns['noflip_single'] = "/({$patterns['noflip_annotation']}{$patterns['lookahead_not_open_brace']}[^;}]+;?)/i"; $patterns['noflip_class'] = "/({$patterns['noflip_annotation']}{$patterns['chars_within_selector']}})/i"; $patterns['direction_ltr'] = "/({$patterns['direction']})ltr/i";
I don't much like github, and that's where development of cssjanus seems to happen, so I'll hope someone else sends that upstream.
We have a few people with write access, so hopefully it shouldn't be too much of a problem
Thanks for looking into it!
Spec updated in upstream canonical cssjanus, https://github.com/cssjanus/cssjanus/blob/v1.3.2/History.md.
Anomie's patch applied to the PHP port and released, https://github.com/cssjanus/php-cssjanus/releases/tag/v1.3.0.
Change 509547 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/vendor@master] Update cssjanus/cssjanus from 1.2.1 to 1.3.0
Change 509548 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/core@master] Update cssjanus/cssjanus from 1.2.1 to 1.3.0
Change 509547 merged by jenkins-bot:
[mediawiki/vendor@master] Update cssjanus/cssjanus from 1.2.1 to 1.3.0
Change 509548 merged by jenkins-bot:
[mediawiki/core@master] Update cssjanus/cssjanus from 1.2.1 to 1.3.0
Upto you, fairly trivial to do so if you don't want to wait till .5 is out everywhere by the end of the week. I can do it this afternoon
Change 509792 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/vendor@wmf/1.34.0-wmf.4] Update cssjanus/cssjanus from 1.2.1 to 1.3.0
Change 509793 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/core@wmf/1.34.0-wmf.4] Update cssjanus/cssjanus from 1.2.1 to 1.3.0
Change 509792 merged by jenkins-bot:
[mediawiki/vendor@wmf/1.34.0-wmf.4] Update cssjanus/cssjanus from 1.2.1 to 1.3.0
Change 509793 merged by jenkins-bot:
[mediawiki/core@wmf/1.34.0-wmf.4] Update cssjanus/cssjanus from 1.2.1 to 1.3.0
Mentioned in SAL (#wikimedia-operations) [2019-05-13T11:58:22Z] <reedy@deploy1001> Synchronized php-1.34.0-wmf.4/vendor/: T215746 (duration: 01m 30s)
Mentioned in SAL (#wikimedia-operations) [2019-05-13T11:59:33Z] <reedy@deploy1001> Synchronized php-1.34.0-wmf.4/composer.json: T215746 (duration: 00m 49s)