Page MenuHomePhabricator

DiscussionTools is triggering false duplicate parse logging in each of its API calls
Closed, ResolvedPublic

Description

It seems since wmf.25 we have had duplicate parses being triggered from DT and since introduction it has caused 200,000 duplicate parses:
https://logstash.wikimedia.org/goto/a5de414eed5a1f4c7973c34598efc44a

image.png (320×971 px, 26 KB)

Here is an example: https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-2022.03.11?id=V4aseH8BW3vXBwD5ymBi

Looking at the traces of both parses, it seems ApiDiscussionToolsPreview.php(54) and ApiDiscussionToolsPreview.php(76) being triggered, meaning that class doesn't cache inside?

While duplicate parses for discussion is not a big deal but this is making a lot of logspam preventing us from raising its log level.

Event Timeline

@matmarex might want to correct me, but I think this is a case where the two parses aren't actually redundant. The first parse happens on the text as provided by the user to see whether it's signed (we can't just check for ~~~~ because of templates), then the second parse happens to actually generate the preview if we needed to add a signature ourselves. So anytime this is happening we are parsing different wikitext.

@Ladsgroup I looked where the error's triggered in ParserObserver, and it happens any time there's multiple calls in a single pageload to parse a page with the same title, revision, and options. A reasonable thing to do here might be to suppress this error if the parse-request provided its own wikitext rather than fetching the stored wikitext for that revisionid, since I think the error is much more targeted at "you are repeatedly fetching-and-parsing a stored page"? Or perhaps include a hash of the provided-wikitext in the options hash?

Thanks. I don't know the details of DT but it would be great to avoid calling parser twice if it can be avoided and if it's not, there should be a way to avoid this logspam because it's drowning the actual problematic cases.

Change 791702 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] ParserObserver: Only report duplicate parse if the content is the same

https://gerrit.wikimedia.org/r/791702

matmarex renamed this task from DiscussionTools introduced a regression triggering duplicate parses in each of its API call to DiscussionTools is triggering false duplicate parse logging in each of its API calls.May 13 2022, 11:22 PM
matmarex claimed this task.

Change 791702 merged by jenkins-bot:

[mediawiki/core@master] ParserObserver: Only report duplicate parse if the content is the same

https://gerrit.wikimedia.org/r/791702

@Ladsgroup Thanks for the review, sorry it took me so long to get to this.

Moving to blocked, we should check the logs after the change is deployed to confirm the issue is fixed.