Parsoid uses mb_strlen() on the input content for metrics and resource limits.
This is O(N) in the size of the input, since it must scan the entire string and decode the UTF-8 codepoints; by contrast, strlen() is O(1) in PHP because the byte length is stored with the string.
However, it is more "fair" to non-Latin-script wikis, which might otherwise see their effective resource limits shrink by up to 4x relative to (say) enwiki, since UTF-8 encodes non-Latin characters in 2 to 4 bytes each.
On the gripping hand, it is inconsistent with the legacy parser, which uses strlen() for its resource limits. That means the legacy parser can parse/save pages which Parsoid then can't open, or vice versa (depending on how the various limits are actually set).
At the very least we should probably compute mb_strlen() once on the expanded input and cache the result, rather than recomputing the number of Unicode codepoints multiple times. We might also investigate whether some of the legacy parser's limits could be switched to mb_strlen() to make Parsoid and the legacy parser more compatible.
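A minimal sketch of the caching idea (names here are hypothetical, not actual Parsoid classes): compute mb_strlen() lazily and at most once per input, and keep strlen() available for byte-based checks. The example string also illustrates the byte/codepoint discrepancy that motivates the 4x concern.

```php
<?php
// Hypothetical sketch: cache the codepoint count so the O(N)
// mb_strlen() scan happens at most once per input, no matter how
// many resource-limit checks consult it.
class SourceMetrics {
	private string $text;
	private ?int $codepoints = null; // lazily computed, then cached

	public function __construct( string $text ) {
		$this->text = $text;
	}

	public function getCodepointCount(): int {
		// mb_strlen() is O(N): it scans the whole string to decode
		// UTF-8, so compute it once and reuse the result.
		$this->codepoints ??= mb_strlen( $this->text, 'UTF-8' );
		return $this->codepoints;
	}

	public function getByteCount(): int {
		// strlen() is O(1) in PHP: the byte length is stored with
		// the string, so this is what the legacy parser's limits see.
		return strlen( $this->text );
	}
}

// Non-Latin text shows the discrepancy: each Devanagari character
// occupies 3 bytes in UTF-8, so a byte-based limit is ~3x stricter
// for this text than a codepoint-based one.
$m = new SourceMetrics( 'विकिपीडिया' );
echo $m->getCodepointCount(), "\n"; // 10 codepoints
echo $m->getByteCount(), "\n";      // 30 bytes
```

Under a scheme like this, both parsers could consult the same cached metrics object, which is one route toward making their limits consistent.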