Faster logic for Abuse Filter parser
Closed, Resolved (Public)

Description

The attached patch improves the execution speed of AbuseFilterParser::nextToken through a series of small changes.

The most significant gain comes from changing how the regexes are applied, so that they match only at the current offset instead of scanning further down the string unnecessarily. The patch also stops radixRegex from producing empty-string matches.
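
As a rough illustration of the empty-match problem, the sketch below uses two hypothetical number patterns (assumptions made up for this example, not the actual radixRegex from the patch). A pattern whose parts are all optional succeeds with an empty match at any position, so a tokenizer cannot tell a real number from nothing. Requiring at least one digit avoids that.

```php
<?php
// Hypothetical patterns for illustration only; NOT the real radixRegex.
$loose  = '/([0-9A-Fa-f]*)([xbo]?)/A'; // every part is optional
$strict = '/([0-9A-Fa-f]+)([xbo]?)/A'; // at least one digit is required

$code = 'user_name == 10';

// At offset 0 the next character is "u", which is not a digit.
// The loose pattern still reports success, with an empty match.
$looseResult = preg_match( $loose, $code, $looseMatch, 0, 0 );

// The strict pattern correctly reports that no number starts here.
$strictResult = preg_match( $strict, $code, $strictMatch, 0, 0 );

// At offset 13 (the start of "10") the strict pattern matches the number.
$numberResult = preg_match( $strict, $code, $numberMatch, 0, 13 );
```

With the loose pattern the caller would need an extra check for a zero-length match before treating the result as a number token. The strict form makes the return value of preg_match sufficient on its own.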

This patch preserves all current behavior and is transparent to the user.

In benchmarks run with function evaluation and variable lookup disabled, applying this patch improved rule parsing speed by roughly 20%.


Version: unspecified
Severity: enhancement

Details

Reference
bz18147

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 10:36 PM)
bzimport added a project: AbuseFilter.
bzimport set Reference to bz18147.

Faster logic for parser

Retrying to upload patch...

Attached:

The use of substr( $code, $offset ) is slow. Instead, we should be using the /A modifier (which I thought I was, maybe I didn't commit properly).

When preg_match is called with an offset, the beginning-of-string anchor (^) only matches if the offset is actually 0. This is annoying behavior, but I think the only way to force a beginning-of-string match from preg_match is to send it a truncated string.
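
The behavior described in this comment can be reproduced with a small sketch. The subject string and pattern here are made up for the example. The ^ assertion refers to the start of the whole subject, not to the offset passed to preg_match, so it fails for any non-zero offset, while the same pattern succeeds against a truncated copy of the string.

```php
<?php
$code = 'a == 12';
$offset = 5; // index of the "1" in "12"

// ^ is tied to the start of $code, so this cannot match at offset 5.
$withOffset = preg_match( '/^[0-9]+/', $code, $offsetMatch, 0, $offset );

// Truncating the string makes position 5 the new start, so ^ matches,
// at the cost of copying the remainder of the string on every call.
$truncated = preg_match( '/^[0-9]+/', substr( $code, $offset ), $substrMatch );
```

The substr call is what makes this approach slow when it runs once per token on a long filter.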

(In reply to comment #3)

> When preg_match is called with an offset, the beginning-of-string anchor (^) only matches if the offset is actually 0. This is annoying behavior, but I think the only way to force a beginning-of-string match from preg_match is to send it a truncated string.

I know that, but as I said in my previous comment, you can use the /A modifier to do what you want.

http://au2.php.net/manual/en/reference.pcre.pattern.modifiers.php
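
A minimal sketch of the /A suggestion follows, using a made-up subject string and pattern rather than anything from the actual patch. With the A (PCRE_ANCHORED) modifier, the match is constrained to begin exactly at the offset given to preg_match, so no substr copy is needed and the engine does not scan downstream for a later match.

```php
<?php
$code = 'a == 12';

// Anchored at offset 5: matches "12" without truncating the string.
$anchoredAtNumber = preg_match( '/[0-9]+/A', $code, $numberMatch, 0, 5 );

// Anchored at offset 0: "a" is not a digit, so the match fails
// immediately instead of searching further along the string.
$anchoredAtStart = preg_match( '/[0-9]+/A', $code, $startMatch, 0, 0 );

// Without /A the same call scans forward and finds "12" at index 5,
// which is the unnecessary downstream lookup a tokenizer wants to avoid.
$unanchored = preg_match( '/[0-9]+/', $code, $scanMatch, PREG_OFFSET_CAPTURE, 0 );
```

In a tokenizer loop this means each candidate pattern can be tried at the current position and rejected cheaply when it does not apply there.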

Oh, neat. I learned regex in Python, and I'm pretty confident Python doesn't have that flag. By all means, that looks even better.

Done with a similar, but independent patch in r48806.