Page MenuHomePhabricator

ECHO/includes/DiscussionParser.php: minor review of regexp usage
Open, Needs TriagePublic

Description

The current constant

const HEADER_REGEX = '^(==+)\h*([^=].*)\h*\1$';

would ignore headings such as

=== moep ==

(which should be equivalent to == = moep ==)

For this reason, i recommend to use

const HEADER_REGEX = '(?m:^(==+)\h*(.*?)\h*\1$)';

instead.

This has another advantage: The code

	// The \A means the regex must match at the beginning of the string.
	// This is slightly different than ^ which matches beginning of each
	// line in multiline mode.
	$startSection = preg_match( '/\A' . self::HEADER_REGEX . '/um', $content );

could be simply replaced by

	$startSection = preg_match( '/^' . self::HEADER_REGEX . '/u', $content );

Event Timeline

seth created this task.Jun 2 2019, 7:34 PM
Restricted Application added a project: Growth-Team. · View Herald TranscriptJun 2 2019, 7:34 PM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
SBisson added a subscriber: SBisson.Jun 4 2019, 4:43 PM

Hi @seth, thanks for your insight on those regular expressions. EchoDiscussionParser can always use some help ;)

If you're willing to create a patch based on your observations we'll gladly help get it reviewed and merged. Adding tests (or revising existing tests) is also very useful to ensure the changes don't have unfortunate consequences.

Change 515580 had a related patch set uploaded (by Seth; owner: seth):
[mediawiki/extensions/Echo@master] DiscussionParser: minor fix of HEADER_REGEX

https://gerrit.wikimedia.org/r/515580

kostajh added a subscriber: kostajh.

I've asked for tests on the patch since it proposes to handle new heading matches. When those are added we could move it back to code review.

kostajh moved this task from Inbox to Revisit on the Growth-Team board.Tue, Oct 15, 1:04 PM