
Parser::doBlockLevels performs poorly under HHVM
Closed, Resolved, Public

Description

Parser::doBlockLevels() constructs regular expressions that embed unique strip markers. HHVM turns each regular expression pattern into a StaticString and uses it as the lookup key for its table of cached compiled PCRE patterns. Because patterns containing strip markers are unique by design, every lookup is a cache miss: the pattern is compiled and a new entry is added to the cache.

The results are:

  • The PCRE table fills up with garbage (patterns that are identical save for the strip markers).
  • Memory bloats with pattern StaticStrings.
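
The effect described above can be reproduced with a small model. The following Python sketch is only an illustration, not HHVM or MediaWiki code: `pattern_table`, `cached_compile`, and the marker format are hypothetical stand-ins for an unbounded compiled-pattern table keyed by pattern string. It compares patterns that embed a per-parse marker with a single marker-independent pattern.

```python
import re
import secrets

# Hypothetical stand-in for an unbounded table of compiled patterns,
# keyed by the pattern string (an assumption, not HHVM's actual structure).
pattern_table = {}


def cached_compile(pattern):
    """Return the compiled pattern, compiling and storing it on a cache miss."""
    compiled = pattern_table.get(pattern)
    if compiled is None:
        compiled = re.compile(pattern)
        pattern_table[pattern] = compiled
    return compiled


def unique_marker_pattern():
    """Build a pattern that embeds a random per-parse prefix (illustrative format)."""
    prefix = "UNIQ" + secrets.token_hex(8)
    return re.escape(prefix) + r"-(\w+)-(\d+)-QINU"


# Simulate 1000 parses, each with its own marker prefix:
# every lookup misses, so the table gains one entry per parse.
for _ in range(1000):
    cached_compile(unique_marker_pattern())
unique_entries = len(pattern_table)

# For contrast, a single pattern that does not embed the prefix
# is compiled once and reused on every later lookup.
pattern_table.clear()
FIXED_PATTERN = r"UNIQ[0-9a-f]+-(\w+)-(\d+)-QINU"
for _ in range(1000):
    cached_compile(FIXED_PATTERN)
fixed_entries = len(pattern_table)

print(unique_entries, fixed_entries)
```

In this model the table grows by one entry per simulated parse when the marker is part of the pattern, while the marker-independent pattern occupies a single entry.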

Version: unspecified
Severity: normal

Details

Reference
bz72205

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:47 AM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz72205.

Change 167411 had a related patch set uploaded by Ori.livneh:
Use a fixed regex for StripState

https://gerrit.wikimedia.org/r/167411

Change 167411 merged by jenkins-bot:
Use a fixed regex for StripState

https://gerrit.wikimedia.org/r/167411

Change 167530 had a related patch set uploaded by Ori.livneh:
Re-use marker strings across requests

https://gerrit.wikimedia.org/r/167530

(In reply to Gerrit Notification Bot from comment #2)

> Change 167411 merged by jenkins-bot:
> Use a fixed regex for StripState
>
> https://gerrit.wikimedia.org/r/167411

This patch was reverted in change Ic193abcff8c72b0c8b.

This was fixed by implementing an LRU cache for compiled PCRE patterns in HHVM.
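
For readers unfamiliar with the technique, here is a minimal sketch of an LRU cache for compiled patterns, written in Python for illustration. It is not the HHVM implementation: the class name `LRUPatternCache`, its `capacity` parameter, and its methods are assumptions made for this example. The point is that a bounded cache evicts the least recently used pattern, so one-off patterns cannot accumulate indefinitely.

```python
import re
from collections import OrderedDict


class LRUPatternCache:
    """Bounded cache of compiled patterns with least-recently-used eviction."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries = OrderedDict()  # pattern string -> compiled pattern

    def compile(self, pattern):
        compiled = self._entries.get(pattern)
        if compiled is not None:
            # Cache hit: mark this pattern as the most recently used.
            self._entries.move_to_end(pattern)
            return compiled
        # Cache miss: compile, store, and evict the oldest entry if over capacity.
        compiled = re.compile(pattern)
        self._entries[pattern] = compiled
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)
        return compiled

    def __len__(self):
        return len(self._entries)

    def __contains__(self, pattern):
        return pattern in self._entries


cache = LRUPatternCache(capacity=2)
cache.compile(r"a+")
cache.compile(r"b+")
cache.compile(r"a+")  # hit: "a+" becomes the most recently used entry
cache.compile(r"c+")  # miss: evicts "b+", the least recently used entry
```

With a capacity of 2, the final call evicts `b+` rather than `a+`, because `a+` was used more recently. Patterns that are never looked up again, such as those containing unique strip markers, eventually age out instead of occupying memory permanently.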