libpcre-related performance opportunities
Closed, Declined · Public

Description

Both HHVM and PHP7 use PCRE's JIT compiler to compile regular expressions into efficient native code. Compilation is fairly expensive, and PCRE doesn't cache anything for you: it's up to the caller to decide whether (and how) to store and reuse the compiled bytecode and JIT data for a particular pattern. So while PHP7 and HHVM use the same library to execute patterns, each runtime has its own cache implementation.

As far as I know, we've never attempted to analyze the efficiency of the pattern cache and tune it for MediaWiki. There may be some low-hanging performance optimization opportunities there, since MediaWiki spends a lot of resources on regexp execution.

PCRE also allocates a block of 32K on the heap to use as a stack for its regexp virtual machine. The size of the stack is tunable, and in the past there was some discussion on the PHP internals list about increasing it, driven partly by this bug: https://bugs.php.net/bug.php?id=70110. The discussion died out, partly due to lack of real-world data. So that's another ripe target for instrumentation and tuning, in my opinion.

AFAIK there's no way to instrument this from PHP code. We'd either have to modify the runtime to collect and export PCRE JIT metrics, or use 'perf' to trace PCRE function calls. On recent-ish Intel processors the latter can be done using LBR (https://lwn.net/Articles/680985/), and the runtime overhead is low enough that you can sample in prod.

All of the above assumes the WMF doesn't disable the PCRE JIT. If it's disabled then obviously the first thing to look into is whether turning it on could be beneficial.

Event Timeline

So, while the JIT is enabled by default on PHP 7.2 (pcre.jit is 1 by default), I don't see how perf could tell us how full the JIT VM's stack is (which is probably what we want to measure).

It should be possible to patch the PHP code to expose the internal value, or simply to make that value tunable ourselves.

HHVM caches compiled PCRE expressions, which was quite a problem for us given the number of dynamic regexes we have (to the point that Tim had to implement LRU eviction in their code).

According to https://externals.io/message/98368#98398, php-fpm has a per-process cache that does FIFO eviction, which probably performs worse for us, given the number of dynamic regexes we have to use.

It would indeed be interesting to measure the cache hit ratio in both HHVM and PHP 7 for our workload - I'm pretty sure the numbers are going to be bad, but again instrumenting this will need some perf sorcery - and time to read PHP's code.
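Until real instrumentation exists, the FIFO-versus-LRU question can at least be explored offline. The following is a minimal Python sketch, not a model of either runtime's actual cache: the function names, the cache capacity, and the synthetic workload (half static patterns, half one-off dynamic regexes) are all assumptions made for illustration. A trace of real pattern strings from production would be needed to draw any conclusion.

```python
import random
from collections import OrderedDict


def hit_ratio(accesses, capacity, policy):
    """Replay `accesses` through a bounded cache and return the hit ratio.

    `policy` is "lru" or "fifo". Both evict the oldest entry in the
    OrderedDict; only LRU refreshes an entry's position on a hit.
    """
    cache = OrderedDict()
    hits = 0
    for pattern in accesses:
        if pattern in cache:
            hits += 1
            if policy == "lru":
                cache.move_to_end(pattern)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the entry at the front
            cache[pattern] = True
    return hits / len(accesses) if accesses else 0.0


def synthetic_workload(n, static_patterns, seed=0):
    """Hypothetical mix: ~50% lookups of a few static patterns,
    ~50% dynamic regexes that are never seen again."""
    rng = random.Random(seed)
    out = []
    for i in range(n):
        if rng.random() < 0.5:
            out.append(f"static-{rng.randrange(static_patterns)}")
        else:
            out.append(f"dynamic-{i}")
    return out


if __name__ == "__main__":
    workload = synthetic_workload(100_000, static_patterns=64)
    for policy in ("fifo", "lru"):
        ratio = hit_ratio(workload, capacity=128, policy=policy)
        print(policy, round(ratio, 3))
```

Replaying a captured list of pattern strings through `hit_ratio` with each policy would give a first estimate of how much the eviction strategy matters for our workload, without touching the runtime.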

A last note: we used Sury's prebuilt packages until now, but given we're going to rebuild them ourselves per T216712, we could easily patch php-fpm to report more data we're interested in.

@Krinkle I'm going to untag us, but let me know if there is CPT specific work needed.

I ran perf record -d on a random PHP 7.4 appserver worker and then perf report -s dso. The time spent in PCRE JIT code is probably accounted for here as "perf-13876.map". It's 0.7% of CPU time, which doesn't seem like low-hanging fruit.

68.35%  php-fpm7.4            
10.39%  [kernel.kallsyms]     
 7.11%  liblua5.1.so.0.0.0    
 3.95%  opcache.so            
 3.84%  libc-2.28.so          
 0.82%  apcu.so               
 0.70%  perf-13876.map        
 0.65%  json.so               
 0.63%  libxml2.so.2.9.4      
 0.57%  libpcre2-8.so.0.7.1   
 0.56%  luasandbox.so         
 0.54%  mbstring.so           
 0.44%  memcached.so          
 0.35%  libmemcached.so.11.0.0
 0.31%  mysqlnd.so            
 0.19%  libz.so.1.2.11        
 0.12%  libicuuc.so.63.1      
 0.07%  mysqli.so             
 0.07%  dba.so                
 0.06%  libpthread-2.28.so    
 0.06%  [vdso]                
 0.04%  excimer.so            
 0.04%  libicui18n.so.63.1    
 0.04%  libgmp.so.10.3.2      
 0.03%  libcurl.so.4.5.0      
 0.03%  phar.so               
 0.02%  dom.so                
 0.01%  librt-2.28.so         
 0.01%  ld-2.28.so            
 0.01%  sockets.so            
 0.00%  wikidiff2.so
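
As a rough cross-check, the PCRE-related rows above can be summed. This Python sketch assumes, as guessed above, that the "perf-<pid>.map" row is PCRE JIT code, and it counts libpcre2 alongside it; the helper names are made up for illustration, and time spent in php-fpm's own PCRE wrapper code is not captured.

```python
def dso_shares(report):
    """Parse `perf report -s dso` style rows ("0.70%  name") into a dict."""
    shares = {}
    for line in report.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].endswith("%"):
            shares[parts[1]] = float(parts[0].rstrip("%"))
    return shares


def pcre_share(shares):
    """Sum the rows assumed to be PCRE: libpcre* plus the JIT map file."""
    total = sum(
        pct
        for dso, pct in shares.items()
        if dso.startswith("libpcre")
        or (dso.startswith("perf-") and dso.endswith(".map"))
    )
    return round(total, 2)


# Three rows copied from the report above.
SAMPLE = """\
68.35%  php-fpm7.4
 0.70%  perf-13876.map
 0.57%  libpcre2-8.so.0.7.1
"""

if __name__ == "__main__":
    print(pcre_share(dso_shares(SAMPLE)))
```

Under that assumption the two PCRE rows together come to about 1.27% of sampled CPU time.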

Given the small percentage, I'm closing this for now and removing it from the list of future goals.

I do suggest that, as part of T255502 (Goal: Save Timing median back under 1 second) and T237708, when we analyze the subset of requests that parse and save edits, we check whether PCRE stands out more there. If it does, we can revisit this task to see what we can improve.