Generating ParserOutput is expensive, so it should be deferred for as long as possible and only done on demand. When saving an edit, parsing could perhaps even be delayed until after the HTTP response has been sent (though that would mean using PoolCounter for parsing).
At present, however, the ParserOutput needs to be generated before the PageContentSave hook is called; otherwise (at least) ApiFlowEditHeaderTest::testCache breaks. This became apparent while refactoring WikiPage::doEditContent for T174038; see https://gerrit.wikimedia.org/r/c/405015/78/includes/Storage/PageUpdater.php#595
Analysis, per @Anomie:
I suspect that what's breaking is this:
The old version of WikiPage::doEditContent() called prepareContentForEdit() which generated the ParserOutput right then, so when doEditUpdates() gets called from the DeferredUpdate scheduled by WikiPage::doCreate() there's no need to parse. I note there's a comment there that says "Get the pre-save transform content and final parser output".
The new version of WikiPage::doEditContent() makes a PageUpdater and calls its createRevision(), which calls DerivedPageDataUpdater::prepareContent() and PageUpdater::doCreate() without ever having to actually generate a ParserOutput. Thus, when DerivedPageDataUpdater::doUpdates() is called from the DeferredUpdate scheduled by PageUpdater::doCreate(), it does find that it needs to parse at that point.
And the order of operations in that Flow test is presumably:
- Create a page with a call to WikiPage::doEditContent(), in a way that somehow avoids processing the DeferredUpdate.
- Set up the "no set!" mock cache in Flow\Tests\Api\ApiTestCase::expectCacheInvalidate()
- Then, during the course of doing that test, a $db->commit() results in the DeferredUpdates being run.
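
The suspected sequence above can be reproduced in a toy model. This is only a sketch in Python, not MediaWiki code: StrictCache, Page, save_edit and run_scenario are hypothetical names invented for illustration, and it assumes that the late parse is what triggers the cache write the mock rejects.

```python
from typing import Callable, List, Optional


class StrictCache:
    """Hypothetical stand-in for the test's "no set!" mock cache."""

    def __init__(self) -> None:
        self.armed = False  # becomes True once the test installs the mock

    def set(self, key: str, value: str) -> None:
        if self.armed:
            raise AssertionError("no set!")


class Page:
    """Hypothetical page object that parses on demand and memoizes the result."""

    def __init__(self, cache: StrictCache) -> None:
        self.cache = cache
        self.parser_output: Optional[str] = None

    def get_parser_output(self, text: str) -> str:
        if self.parser_output is None:
            # Assumption: parsing writes to the cache as a side effect.
            self.cache.set("parser-output", text)
            self.parser_output = f"<p>{text}</p>"
        return self.parser_output


def save_edit(page: Page, text: str,
              deferred: List[Callable[[], str]], eager: bool) -> None:
    if eager:
        # Old behaviour: the output is generated during the save itself.
        page.get_parser_output(text)
    # Both variants schedule a deferred update that needs the output.
    deferred.append(lambda: page.get_parser_output(text))


def run_scenario(eager: bool) -> str:
    cache = StrictCache()
    page = Page(cache)
    deferred: List[Callable[[], str]] = []
    save_edit(page, "hello", deferred, eager)  # 1. save; update stays queued
    cache.armed = True                         # 2. test installs the mock cache
    try:
        for update in deferred:                # 3. a commit runs the queue
            update()
    except AssertionError as exc:
        return f"failed: {exc}"
    return "ok"


print(run_scenario(eager=True))   # parse happened before the mock was armed
print(run_scenario(eager=False))  # parse happens inside the deferred update
```

In this model only the lazy variant trips the mock, because the first parse is pushed past the point where the strict cache is installed. If the real test fails for a different reason (for example, a cache write unrelated to parsing), the model does not apply.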