Page MenuHomePhabricator

Performance review for Assert library
Closed, ResolvedPublic

Description

In investigating T201801: Improve TitleValue, TitleFactory, TitleParser performance to be on-par with Title, one of the performance problems identified was the use of the Assert library. For example in TitleValue:

		Assert::parameterType( 'integer', $namespace, '$namespace' );
		Assert::parameterType( 'string', $dbkey, '$dbkey' );
		Assert::parameterType( 'string', $fragment, '$fragment' );
		Assert::parameterType( 'string', $interwiki, '$interwiki' );

		// Sanity check, no full validation or normalization applied here!
		Assert::parameter( !preg_match( '/^_|[ \r\n\t]|_$/', $dbkey ), '$dbkey',
			"invalid DB key '$dbkey'" );
		Assert::parameter( $dbkey !== '', '$dbkey', 'should not be empty' );

All of the type checks would be significantly faster if they were inlined instead.

[22:41:21] <TimStarling> it's obviously faster to use is_int() than Assert::parameterType(), look at the implementation of the latter

Usage in MediaWiki core: https://codesearch.wmflabs.org/core/?q=Assert%3A%3A&i=nope&files=&repos=

Requesting a performance review, mostly for how the library is used, and whether we should change the way we're using it.

Event Timeline

From the header of Assert.php:

Note that assertions evaluate expressions and add function calls, so using assertions may have a negative impact on performance when used in performance hotspots. The idea of this class is to have a neat tool for assertions if and when they are needed. It is not recommended to place assertions all over the code indiscriminately.

I remember doing some benchmarking when working on this class, and especially carefully arranging what Assert::hasType does. I believe there is not really anything to strip (but I will happily review any patch that aims to improve the performance of this library). The class does not really do that much: Some method calls, if, and string comparisons, that's about it. There are some loops, but I hope it's obvious that calling something like parameterElementType on an array needs to run a loop, as well as when one of the methods is called with multiple types (e.g. with 'int|string').

TL;DR: In performance hotspots, please replace Assert::… calls with whatever if ( !is_…( … ) ) is appropriate.

Requesting a performance review, mostly for how the library is used, and whether we should change the way we're using it.

If I recall correctly, we opted against using native PHP assertion because they were broken by design, relying on eval() to avoid expression evaluation when assertions were disabled. In PHP 7, assert is a language construct, and no longer has this problem, see http://php.net/manual/en/function.assert.php.

While it may be possible to improve the performance of the Assert class a bit, it should probably just not be used in performance critical code. While a performance review may be helpful, I recommend to instead revisit our decision to not use PHP's native assert, since the behavior seems much improved with PHP 7.

@Legoktm @tstarling Let us know if you need anything from the Performance team

@Legoktm @tstarling Let us know if you need anything from the Performance team

I added the tag because I was requesting a performance review, given that this appears to be a problem in a pretty hot path of code. I'm not sure whether it needs an explicit review from anyone on the Perf team given that Thiemo already did analysis/review on the library (correct me if I'm wrong).

Maybe it should go back to TechCom for a RfC re-evaluation as Daniel suggested?

Gotcha - wasn't clear that was the intent of the tag, we get added to lots of tickets :-)

@Krinkle can you and @Legoktm figure out the right disposition for this? Assigning to you for a decision, but no implication that you should be doing any work related to it.

For the case at hand, perhaps the assertions should just be replaced by scalar type hints, which have become available in php7: http://php.net/manual/de/migration70.new-features.php

[…] Thiemo already did analysis/review on the library (correct me if I'm wrong).

Yes, I did back in 2016 when I worked on Assert. Brief summary of my results:

  • I introduced Assert::nonEmptyString as a shortcut thats much faster than Assert::parameterType( 'string', $x, '$x' ) && $x !== ''.
  • In a similar fashion, many more fast performing shortcuts like this can be introduced, e.g. Assert::string( $x, '$x' ). On the other hand, such shortcuts barely have an advantage over the classic if ( !is_string( $x ) ) { throw new InvalidArgumentException( '$x must be a string' ); }. That's why I never bothered adding more features to Assert.
  • As the Assert class states, it was never meant to be used in performance hotspots. I think @daniel made a mistake in https://gerrit.wikimedia.org/r/205884. See my comment in T201801#4497755 for suggestions how to improve the problematic TitleValue.
  • Many assertions (no matter how implemented) will become obsolete with PHP 7 scalar type hints anyway.

My assumption was that a constructor is not a relevant performance hot spot, since the overhead of object creation and garbage collection is far greater than the overhead of the asserts. Perhaps that assumption was wrong.

Anyway, as mentioned in my other comment, my preference would be to just use strict scalar type hints.

I think adoption of PHP7 scalar type hints is out of scope of this ticket. There maybe concerns with those, or maybe not. From a quick look into them, it seems like that would make sense indeed.

[..] many more fast performing shortcuts like this can be introduced, e.g. Assert::string( $x, '$x' ). On the other hand, such shortcuts barely have an advantage over the classic if ( !is_string( $x ) ) { throw new InvalidArgumentException( '$x must be a string' ); }.

Agreed. For simple case like these, I would actually prefer an explicit check with throw. In my opinion, these are not worth the indirection and abstraction. Although I won't oppose continued use of those methods.

On the other hand, I do think we should avoid using native assert() in the form of assert( is_int( $x ) ); because if we're doing the check inline ourselves, might as do the throw too – thus providing a more useful exception, instead of the default AssertionError.

So while a short-cut method like Assert::parameterString() could be useful and have no performance concerns, those are also the easiest ones to cleanly do inline (per the above).

The problem is more with cases that need indirection, such as Assert::parameterType( 'string|int', $x, '$x' ). These require logic to map to a type to a check (e.g. int => is_int) with either a loop, or (as currently) with multiple in_array calls and other stuff. These would be much simpler if the caller performed the check directly, eg. Assert::example( is_string($x) || is_int($x), '$x', 'int|string' ). This is basically what the Assert::parameter($condition, ..) does.

It has the benefit of still being a single line of code (which is presumably a significant motivation for the library's existence), and still having a descriptive exception class and error message, but also puts the check directly in the calling code without any overhead.

That leaves the case of checking types for array elements. The type check for that could be optimised by removing support for mixed types, with a method for primitives, and a method for instanceof. But.. given the unavoidable loop, might not be worth optimising and instead should probably just be discouraged in general in most production code. Seems like something Phan could also help with in terms of static analysis, with run-time detection moved to the consumer of the array. Presumably somewhere the array will be iterated and consumed in someway where there will be a conditional or explicit type check still.

Krinkle triaged this task as Medium priority.Aug 15 2018, 12:05 AM
Krinkle claimed this task.
Krinkle edited projects, added Performance-Team; removed Performance-Team (Radar).