Page MenuHomePhabricator

Flow talk page on mediawiki.org takes 4 seconds to load
Closed, ResolvedPublic

Description

This isn't even taking front-end taking into account. The main HTTP response handled by PHP is taking 4 seconds to compose the initial response.

https://www.mediawiki.org/wiki/Manual_talk:Coding_conventions/JavaScript?forceprofile=1

100.00% 4171.235      1 - main()
 99.38% 4145.466      1 - MediaWiki::run
 99.38% 4145.279      1 - MediaWiki::main
 93.61% 3904.854      1 - MediaWiki::performRequest
 93.41% 3896.461      1 - MediaWiki::performAction
 93.37% 3894.558      1 - Flow\Actions\FlowAction::show
 93.37% 3894.547      1 - Flow\Actions\ViewAction::showForAction
 93.34% 3893.543      1 - Flow\Actions\FlowAction::showForAction
 92.95% 3877.242      1 - Flow\View::show
 86.07% 3590.069      1 - Flow\View::buildApiResponse
 77.17% 3218.802      2 - Flow\Block\TopicListBlock::renderApi
 68.63% 2862.840     55 - Flow\Formatter\RevisionFormatter::formatApi
 67.93% 2833.583      1 - Flow\Formatter\TopicListFormatter::formatApi
 67.92% 2833.219      1 - Flow\Formatter\TopicListFormatter::buildResult
 60.17% 2509.645    724 - Flow\RevisionActionPermissions::isAllowed
 45.94% 1916.432     55 - Flow\Formatter\RevisionFormatter::buildActions
 42.99% 1793.408   3021 - Flow\Data\BufferedCache::getMulti
 42.84% 1786.768   3021 - Flow\Data\BagOStuff\LocalBufferedBagOStuff::getMulti
 42.76% 1783.817   3117 - MemcachedPeclBagOStuff::getMulti
 42.57% 1775.711   3021 - Flow\Data\BagOStuff\BufferedBagOStuff::getMulti
 36.86% 1537.330   1247 - Flow\Data\ObjectLocator::findMulti
 35.04% 1461.736   1238 - Flow\Data\ObjectLocator::find
 34.12% 1423.096   1247 - Flow\Data\Index\FeatureIndex::findMulti
..

I can consistently reproduce this. Other non-Flow pages seem unaffected.

Details

Related Gerrit Patches:
mediawiki/extensions/Flow : masterKeep known results in memory

Event Timeline

Krinkle created this task.Oct 3 2015, 12:58 AM
Krinkle raised the priority of this task from to Needs Triage.
Krinkle updated the task description. (Show Details)
Krinkle added a subscriber: Krinkle.
Restricted Application added a project: Collaboration-Team-Triage. · View Herald TranscriptOct 3 2015, 12:58 AM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Paladox added a subscriber: Paladox.Oct 6 2015, 6:39 AM
Catrope triaged this task as High priority.Oct 7 2015, 11:31 PM
Catrope added a subscriber: Catrope.

I looked at the flamegraphs on xenon, but the hourly graph I looked at only had one sample for Flow\Actions\FlowAction::show, and today's daily only has 15. I think Flow traffic is probably too small compared to other read traffic to show up with a significant number. I'll see if I can reproduce locally and profile there.

Krinkle added a comment.EditedNov 25 2015, 1:23 AM

Current page load of https://www.mediawiki.org/wiki/Manual_talk:Coding_conventions/JavaScript when logged-in:

  • responseStart (First byte from the server): 5 seconds (5002ms)
  • domComplete: 5.6 seconds (5671ms)

This is from a repeat view, so after a refresh of the exact same page. That should hit ideal cache conditions and yet spends 5 to 6 seconds in PHP.

view-source:https://www.mediawiki.org/wiki/Manual_talk:Coding_conventions/JavaScript?forceprofile=1

100.00% 6131.759      1 - main()
 99.63% 6108.786      1 - MediaWiki::run
 99.62% 6108.633      1 - MediaWiki::main
 96.26% 5902.398      1 - MediaWiki::performRequest
 96.14% 5894.790      1 - MediaWiki::performAction
 96.10% 5892.843      1 - Flow\Actions\FlowAction::show
 96.10% 5892.838      1 - Flow\Actions\ViewAction::showForAction
 96.09% 5891.826      1 - Flow\Actions\FlowAction::showForAction
 95.91% 5880.688      1 - Flow\View::show
 92.68% 5682.717      1 - Flow\View::buildApiResponse
 86.55% 5307.269      2 - Flow\Block\TopicListBlock::renderApi
 70.15% 4301.131     55 - Flow\Formatter\RevisionFormatter::formatApi
 69.79% 4279.298      1 - Flow\Formatter\TopicListFormatter::formatApi
 69.76% 4277.230      1 - Flow\Formatter\TopicListFormatter::buildResult
 66.63% 4085.716    724 - Flow\RevisionActionPermissions::isAllowed
 56.67% 3474.880   1247 - Flow\Data\ObjectLocator::findMulti
 54.71% 3354.643   1247 - Flow\Data\Index\FeatureIndex::findMulti
 50.62% 3104.048   1238 - Flow\Data\ObjectLocator::find
 50.46% 3094.003     55 - Flow\Formatter\RevisionFormatter::buildActions
 42.09% 2580.964   1247 - Flow\Data\Index\FeatureIndex::backingStoreFindMulti
 38.52% 2361.709   2821 - DatabaseBase::select
 35.70% 2189.056   2821 - DatabaseBase::query
 34.33% 2104.995    625 - Flow\Collection\LocalCacheAbstractCollection::getLastRevision
 32.99% 2023.058    625 - Flow\Collection\LocalCacheAbstractCollection::getAllRevisions
 32.94% 2019.695    625 - Flow\Collection\AbstractCollection::getAllRevisions
 30.40% 1864.054    632 - Flow\Data\Storage\RevisionStorage::findMulti
 28.42% 1742.750   2833 - DatabaseMysqli::doQuery
 28.07% 1721.416    616 - Flow\RevisionActionPermissions::getRoot
 24.66% 1512.035    616 - Flow\Collection\AbstractCollection::getBoardWorkflow
 24.20% 1483.870    610 - Flow\Collection\PostCollection::getBoardWorkflowId
 22.31% 1368.210   2390 - Flow\Data\BufferedCache::getMulti
 22.24% 1363.545   2390 - Flow\Data\BagOStuff\LocalBufferedBagOStuff::getMulti
 22.13% 1356.729   2390 - Flow\Data\BagOStuff\BufferedBagOStuff::getMulti
 20.42% 1252.081   2487 - MemcachedPeclBagOStuff::getMulti
 19.08% 1170.187    632 - Flow\Data\Storage\RevisionStorage::fallbackFindMulti
 19.05% 1168.285    814 - Flow\Data\Storage\RevisionStorage::findInternal
 17.81% 1091.863   1429 - section.DatabaseBase::query-master
 14.76% 905.106   1397 - section.DatabaseBase::query
 14.34% 879.409      1 - Flow\Formatter\TopicListQuery::getResults
 12.10% 741.717   1140 - Flow\Repository\TreeRepository::findRootPaths
 11.77% 721.515   1139 - Flow\Repository\TreeRepository::findRootPath
 11.47% 703.135   1085 - Flow\Collection\PostCollection::getWorkflowId
 11.45% 701.990    615 - Flow\Data\Storage\BasicDbStorage::findMulti

That should hit ideal cache conditions and yet spends 5+ seconds in PHP.

Yeah. In order to not block multi-DC, we stopped backfilling the Memcached cache from the DB (c6b2270811e6e01ae51c55d31614c3026aab701d). You may have hit the (now much more common) scenario of using the slave for everything.

However, it's kind of a worst-case scenario, since the code was written based on using these Memcached indices (which also affects how the DB queries are written). We're working to get rid of the Memcached code entirely (gradually, see https://gerrit.wikimedia.org/r/#/c/249402/), and then/simultaneously rewrite the DB queries.

We are aware it's a serious problem, and the code is in flux.

Change 255365 had a related patch set uploaded (by Matthias Mullie):
Keep known results in memory

https://gerrit.wikimedia.org/r/255365

matthiasmullie added a comment.EditedNov 25 2015, 12:32 PM

We already knew Flow permission checks are suboptimal (euphemism) & generate a lot of queries. I had no idea these were being looped multiple times for the same revisions though (to test different actions).
Above patch keeps those query results in memory, lowering the total amount of queries by a factor of 3/4 on my machine.

Change 255365 merged by jenkins-bot:
Keep known results in memory

https://gerrit.wikimedia.org/r/255365

matthiasmullie closed this task as Resolved.Dec 16 2015, 7:04 PM
matthiasmullie claimed this task.

I'm closing this: I currently get a pretty consistent 2s for initial page request. While still far from ideal, it's much less than the 4-5s it used to be.

There already are tickets to refactor the other things currently causing slowness: