Page MenuHomePhabricator

EchoDiscussionParser is slow, causes timeouts
Closed, DuplicatePublic

Description

The new PHP time limit has brought attention to the fact that EchoDiscussionParser is slow. EchoDiscussionParser stands out as a common cause of PHP timeouts. For example, for https://www.wikidata.org/w/index.php?title=User:Wikidelo/books/To_fix&oldid=743563339 , EchoDiscussionParser took 211 seconds:

[0443][tstarling@mwmaint1001:~]$ mwscript eval.php --wiki=wikidatawiki
> $rev = Revision::newFromId(743563339);
> $t = microtime(true); EchoDiscussionParser::getChangeInterpretationForRevision($rev); print microtime(true)-$t;
211.43595385551

For comparison, the MediaWiki parser takes 3.8 seconds to parse this page.

The backtrace on timeout was:

#0 /srv/mediawiki/php-1.32.0-wmf.20/extensions/Echo/includes/DiscussionParser.php(682): NO_FUNCTION_GIVEN()
#1 /srv/mediawiki/php-1.32.0-wmf.20/extensions/Echo/includes/DiscussionParser.php(598): EchoDiscussionParser::getFullSection()
#2 /srv/mediawiki/php-1.32.0-wmf.20/extensions/Echo/includes/DiscussionParser.php(467): EchoDiscussionParser::interpretDiff(array, string, Title)
#3 /srv/mediawiki/php-1.32.0-wmf.20/extensions/Echo/includes/DiscussionParser.php(37): EchoDiscussionParser::getChangeInterpretationForRevision(Revision)
#4 /srv/mediawiki/php-1.32.0-wmf.20/extensions/Echo/includes/EchoHooks.php(525): EchoDiscussionParser::generateEventsForRevision(Revision, boolean)
#5 /srv/mediawiki/php-1.32.0-wmf.20/includes/deferred/MWCallableUpdate.php(34): Closure$EchoHooks::onPageContentSaveComplete()
#6 /srv/mediawiki/php-1.32.0-wmf.20/includes/deferred/DeferredUpdates.php(268): MWCallableUpdate->doUpdate()
#7 /srv/mediawiki/php-1.32.0-wmf.20/includes/deferred/DeferredUpdates.php(214): DeferredUpdates::runUpdate(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti, string, integer)
#8 /srv/mediawiki/php-1.32.0-wmf.20/includes/deferred/DeferredUpdates.php(134): DeferredUpdates::execute(array, string, integer)
#9 /srv/mediawiki/php-1.32.0-wmf.20/includes/MediaWiki.php(914): DeferredUpdates::doUpdates(string)
#10 /srv/mediawiki/php-1.32.0-wmf.20/includes/MediaWiki.php(734): MediaWiki->restInPeace(string, boolean)
#11 [internal function]: Closure$MediaWiki::doPostOutputShutdown()
#12 {main}

This is request ID W5XA4gpAAEsAAD5nA4IAAAAF.

Event Timeline

Restricted Application added a project: Growth-Team. · View Herald TranscriptSep 10 2018, 4:57 AM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript

The diff in question has a large number of diff ops, but only one section. So for every diff op, getSectionStartIndex() and getSectionEndIndex() have to run a regex match on almost every line in the page, looking for headings. I think a better algorithm would be to have a cache giving the section start index for each line. If a line is not in the cache, then do the preg_match() on the current line, and if it is not a section header, repeat for the previous line with a new cache fetch. And do the same for getSectionEndIndex() looking forwards.

In other words, imagine getSectionStartIndex($i) as the recursive function

function getSectionStartIndex( $i ) {
   if ( $i == 0 ) {
      return 0;
   } elseif ( getSectionCount( $lines[$i] ) ) {
      return $i;
   } else {
      return getSectionStartIndex( $i - 1 );
   }
}

And then memoize that function and then implement it iteratively rather than recursively.

I appreciate that it's difficult to do caching of any kind in a parser implemented as a collection of static functions. It may be easier to convert it to non-static before you do this.

Krinkle moved this task from Inbox to Radar on the Performance-Team board.Sep 10 2018, 5:55 PM
Krinkle edited projects, added Performance-Team (Radar); removed Performance-Team.

Change 462754 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/Echo@master] Use \h instead of \s in regular expressions

https://gerrit.wikimedia.org/r/462754

Change 462759 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/Echo@master] Remove expensive regular expression that doesn't have any effect

https://gerrit.wikimedia.org/r/462759

Change 462759 merged by jenkins-bot:
[mediawiki/extensions/Echo@master] Remove expensive regular expression that doesn't have any effect

https://gerrit.wikimedia.org/r/462759

Change 465762 had a related patch set uploaded (by Krinkle; owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/Echo@wmf/1.32.0-wmf.24] Remove expensive regular expression that doesn't have any effect

https://gerrit.wikimedia.org/r/465762

Change 465762 merged by jenkins-bot:
[mediawiki/extensions/Echo@wmf/1.32.0-wmf.24] Remove expensive regular expression that doesn't have any effect

https://gerrit.wikimedia.org/r/465762

Change 466205 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/Echo@master] Replace two regular expressions with cheap string manipulations

https://gerrit.wikimedia.org/r/466205

Change 462754 merged by jenkins-bot:
[mediawiki/extensions/Echo@master] Use \h instead of \s in regular expressions

https://gerrit.wikimedia.org/r/462754

Change 466205 merged by jenkins-bot:
[mediawiki/extensions/Echo@master] Replace two regular expressions with cheap string manipulations

https://gerrit.wikimedia.org/r/466205

Krinkle moved this task from Perf issue to Watching on the Performance-Team (Radar) board.