Page MenuHomePhabricator

PHP 7.2 is very slow on an allocation-intensive benchmark
Open, MediumPublic

Description

When parsing a large article locally on my laptop using Parsoid/PHP integrated with MW via the new REST API, the following parse times were observed:

EnvironmentTime (s)
Ubuntu packaged PHP 7.247
Compiled PHP 7.249
Compiled PHP 7.2 patched with quadrupled GC_ROOT_BUFFER_MAX_ENTRIES28
PHP 7.325
PHP 8 (git master) with JIT enabled23

In PHP 7.2, the GC root buffer is static in size, and collection is triggered when the buffer is full. In PHP 7.3, the GC buffer was made dynamic, with the size automatically tuned at runtime in order to meet a target of 100 objects collected per collection.

I suggest either upgrading to PHP 7.3 in production, or patching PHP 7.2 to have a larger value for GC_ROOT_BUFFER_MAX_ENTRIES.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 21 2019, 6:23 AM
diff --git a/Zend/zend_gc.c b/Zend/zend_gc.c
index 47236a934e..890b3e00f7 100644
--- a/Zend/zend_gc.c
+++ b/Zend/zend_gc.c
@@ -73,7 +73,7 @@
 #include "zend_API.h"
 
 /* one (0) is reserved */
-#define GC_ROOT_BUFFER_MAX_ENTRIES 10001
+#define GC_ROOT_BUFFER_MAX_ENTRIES 40001
 
 #define GC_HAS_DESTRUCTORS  (1<<0)

We maintain custom 7.2 packages anyway (based on the 7.2.x releases), we can cherrypick the patch for our next package update.

Cherry pick is not exactly the right word, I'm just proposing a temporary hack so that it will maybe work, whereas PHP 7.3 does a proper job of it. If we definitely want to increase GC_ROOT_BUFFER_MAX_ENTRIES on 7.2, I should do some more work first to figure out what the tradeoffs are and what the ideal value is. Invoking the GC less often presumably means using more memory. How much more will depend on the test case.

Ack, let me know when you have found a suitable value for GC_ROOT_BUFFER_MAX_ENTRIES, I have the 7.2.21 update for stretch-wikimedia mostly ready, we can add the patch to bump GC_ROOT_BUFFER_MAX_ENTRIES on top.

Joe triaged this task as Medium priority.Aug 22 2019, 10:12 AM
jijiki added a subscriber: jijiki.EditedAug 22 2019, 11:27 AM

We are on our way to finishing migration to PHP7, my opinion is to try the PHP7.2 bandaid rather than upgrading production to PHP7.3 right now, if we all agree of course.

Fun fact: the "gc address" of an object is packed into 14 bits, so increasing GC_ROOT_BUFFER_MAX_ENTRIES beyond 16383 causes corruption of the status bits, breaking the whole GC. In my previous tests, the GC probably didn't run at all. So I'm glad I investigated this further and we didn't just apply this patch.

PHP 7.3 rearranges the bitfield so that the address gets 20 bits. It's a binary compatibility break, and possibly a source compatibility break, and beyond the scope of what I wanted to do with a simple patch.

I think the other possible workaround is to just disable the GC. It makes no difference to memory usage in either Parsoid or the old parser, with this test case:


Of course it depends on the workload. For PHPUnit tests it makes a difference:

We could just wrap parsoid in gc_disable()/gc_enable().

The memory usage time series data was gathered using Excimer:

$excimer = new ExcimerTimer;
$excimer->setPeriod( 0.1 );
$startTime = microtime(true);
$excimer->setCallback( function () use ( $startTime ) {
	$usage = sprintf( "%.2f", memory_get_usage() / 1048576 );
	$interval = microtime( true ) - $startTime;
	$date = sprintf( "%.2f", $interval );
	file_put_contents(
		'/tmp/php-memory-usage',
		"$date $usage\n",
		FILE_APPEND );
} );
$excimer->start();

PHP 7.3 rearranges the bitfield so that the address gets 20 bits. It's a binary compatibility break, and possibly a source compatibility break, and beyond the scope of what I wanted to do with a simple patch.

I think the other possible workaround is to just disable the GC. It makes no difference to memory usage in either Parsoid or the old parser, with this test case:

That is interesting ... At least in Parsoid, with chunked parsing enabled, I would have thought memory for all the allocated tokens and PEG cache would have been freed up after each chunk. So, maybe some code is holding a ref to the chunk beyond what it should.

In any case, I expect that the DOM would be the biggest memory hog in Parsoid and beyond some temporary nodes and fragments, it is likely that nothing there would ever get freed. So, one GC run after the DOM is completely built would be sufficient to free up all token and PEG cache usage if it is possible to manually invoke the GC at a specific point in time. This manual invocation would be in the HTML5TreeBuilder.php code in the Parsoid codebase.

Tracing of TagTk::__destruct() shows that tokens are freed as it goes, they're not the problem. Disabling the GC causes the root buffer to no longer be maintained, so after you re-enable it, it'll be empty and doing a manual GC won't do anything. But we don't need to do a manual GC run anyway, since as I've documented, the memory usage is the same with it disabled. The high memory usage is due to live, uncollectable objects.

Tracing of TagTk::__destruct() shows that tokens are freed as it goes, they're not the problem. Disabling the GC causes the root buffer to no longer be maintained, so after you re-enable it, it'll be empty and doing a manual GC won't do anything.

Ok .. makes sense.

But we don't need to do a manual GC run anyway, since as I've documented, the memory usage is the same with it disabled. The high memory usage is due to live, uncollectable objects.

I was / am just surprised to see memory usage same as when disabled.

There should be a token for every node in the dom and will have data-parsoid info, and partial data-mw info, and all the other tokens that were temporarily allocated before being transformed to other ones. So, I would have expected *something* to show up in the usage graph when GC is disabled.

That implies that memory usage from tokens is substantially smaller compared to the DOM not to even show up in the usage graphs.

I suspect we are maintaining live references to the all the DOM fragments we create during the entire lifetime of the parse which is probably one explanation for why the DOM memory usage is overwhelmingly higher than token usage in this fashion. My hunch is the env object with all its live references to all doms ever created because of T215000: Fill gaps in PHP DOM's functionality. We added that live ref to every document (and fragment) we create and so we probably need a detach / discard method in env to release the fragments once they are merged into the parent document. @Arlolra thoughts?

Change 534555 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/services/parsoid@master] Disable the GC if running under PHP 7.2

https://gerrit.wikimedia.org/r/534555

Change 534555 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Disable the GC if running under PHP 7.2

https://gerrit.wikimedia.org/r/534555

Krinkle moved this task from Untriaged to Other on the PHP 7.2 support board.Sep 6 2019, 12:24 AM
Krinkle moved this task from Backlog to Wikimedia production on the PHP 7.3 support board.

Change 573378 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] [WIP] Release fragments once they're no longer needed

https://gerrit.wikimedia.org/r/573378

jijiki moved this task from Incoming 🐫 to Unsorted on the serviceops board.Aug 17 2020, 11:48 PM