
Investigate source of performance improvements in Parsoid/PHP over Parsoid/JS
Closed, Declined · Public

Description

Parsoid/PHP endpoints (wt2html and html2wt) seem to be about 2x faster than Parsoid/JS. The specific factor varies depending on which bucket we are looking at (mean, p50, p75, p95, p99), but 2x is a good number to work with. We saw this during benchmarking ( T232182#5592240 ), and it held up in production after Parsoid/PHP was rolled out everywhere.

However, we don't have a clear sense of what has led to these performance benefits. We have a number of theories about factors that could have contributed, listed below:

  • C DOM in PHP via libxml
  • PHP typehints
    • But, V8 has a JIT - maybe we’re really bad about invalidating inline caches / changing object shapes when in JS
  • Reduced network traffic and I/O wait times
    • And, while Parsoid/JS has a lot of buffering and async logic, maybe all that coordination adds overhead
  • Parsoid/PHP has its GC turned off because it didn’t matter wrt memory growth
    • But, this only applies to wt2html, not html2wt where GC is still enabled
  • Improved caching across the entire request in Parsoid/PHP vs. caching benefits within a single API request batch in Parsoid/JS (since there is no state across API request batches)

However, we haven't done any investigation or analysis about the speedups.

A separate issue concerns the p95 numbers for the html2wt endpoint (over a 24-hour window from December):

  • Parsoid/JS: Init (260ms) + DOMDiff (370ms) + Selser (860ms) = 1.5s. But, total html2wt time is 2.8s. So, about 1.3s is unaccounted for.
  • Parsoid/PHP: Init (140ms) + DOMDiff (450ms) + Selser (560ms) = 1.15s. But, total html2wt time is 1.3s. So, about 150ms is unaccounted for.

So, at least for html2wt, this reveals that when looking at p95 values, Parsoid/JS has an unaccounted gap of about 1.3s. This is also worth investigating separately.
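
The gap arithmetic above can be reproduced with a small sketch. The dictionary layout and the `unaccounted_ms` helper are hypothetical names chosen for illustration; only the millisecond figures come from this task. Note that percentiles are not additive, so the sum of per-phase p95 values is only an approximation of what the phases contribute to the p95 of the total.

```python
# p95 timings in milliseconds, copied from the figures in this task.
# The key names are illustrative, not Parsoid metric names.
P95_MS = {
    "Parsoid/JS": {"init": 260, "dom_diff": 370, "selser": 860, "total": 2800},
    "Parsoid/PHP": {"init": 140, "dom_diff": 450, "selser": 560, "total": 1300},
}


def unaccounted_ms(timings):
    """Return the total time minus the sum of the measured phases."""
    phases = sum(ms for name, ms in timings.items() if name != "total")
    return timings["total"] - phases


for impl, timings in P95_MS.items():
    gap = unaccounted_ms(timings)
    share = gap / timings["total"]
    print(f"{impl}: {gap} ms unaccounted ({share:.0%} of total)")
```

With these inputs the gap is 1310 ms for Parsoid/JS and 150 ms for Parsoid/PHP, which matches the rounded figures quoted above.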

Someone who has free time and the interest should dig into this.