Page MenuHomePhabricator

MediaWiki needlessly queries page data twice on page views
Open, NormalPublic

Description

When debugging something unrelated, I noticed that a web request typically queryies the same fields twice for the wiki page associated with it.

WikiPage::pageData
SELECT  page_id,page_namespace,page_title, page_restrictions,page_is_redirect,page_is_new,page_random,page_touched,page_links_updated,page_latest,page_len, page_content_model,page_lang
FROM `page`
WHERE page_namespace = '#' AND page_title = '#######'
LIMIT 1  

master: false
trace:
#0 /srv/mediawiki/php-1.32.0-wmf.24/includes/libs/rdbms/database/Database.php(1181): Wikimedia\Rdbms\Database->doProfiledQuery()
#1 /srv/mediawiki/php-1.32.0-wmf.24/includes/libs/rdbms/database/Database.php(1682): Wikimedia\Rdbms\Database->query()
#2 /srv/mediawiki/php-1.32.0-wmf.24/includes/libs/rdbms/database/Database.php(1773): Wikimedia\Rdbms\Database->select()
#3 /srv/mediawiki/php-1.32.0-wmf.24/includes/page/WikiPage.php(421): Wikimedia\Rdbms\Database->selectRow()
#4 /srv/mediawiki/php-1.32.0-wmf.24/includes/page/WikiPage.php(440): WikiPage->pageData()
#5 /srv/mediawiki/php-1.32.0-wmf.24/includes/page/WikiPage.php(478): WikiPage->pageDataFromTitle()
#6 /srv/mediawiki/php-1.32.0-wmf.24/includes/page/WikiPage.php(590): WikiPage->loadPageData()
#7 /srv/mediawiki/php-1.32.0-wmf.24/includes/page/WikiPage.php(631): WikiPage->exists()
#8 /srv/mediawiki/php-1.32.0-wmf.24/includes/page/WikiPage.php(269): WikiPage->getContentModel()
#9 /srv/mediawiki/php-1.32.0-wmf.24/includes/page/WikiPage.php(256): WikiPage->getContentHandler()
#10 /srv/mediawiki/php-1.32.0-wmf.24/includes/actions/Action.php(98): WikiPage->getActionOverrides()
#11 /srv/mediawiki/php-1.32.0-wmf.24/includes/actions/Action.php(156): Action::factory()
#12 /srv/mediawiki/php-1.32.0-wmf.24/includes/MediaWiki.php(155): Action::getActionName()
#13 /srv/mediawiki/php-1.32.0-wmf.24/includes/MediaWiki.php(785): MediaWiki->getAction()
#14 /srv/mediawiki/php-1.32.0-wmf.24/includes/MediaWiki.php(525): MediaWiki->main()
#15 /srv/mediawiki/php-1.32.0-wmf.24/index.php(42): MediaWiki->run()
Title::newFromID
SELECT  page_namespace,page_title,page_id,page_len,page_is_redirect,page_latest,page_content_model,page_lang  FROM `page`    WHERE page_id = '#####'  LIMIT 1  

master: false
trace:
#1 /srv/mediawiki/php-1.32.0-wmf.24/includes/libs/rdbms/database/Database.php(1682): Wikimedia\Rdbms\Database->query()
#2 /srv/mediawiki/php-1.32.0-wmf.24/includes/libs/rdbms/database/Database.php(1773): Wikimedia\Rdbms\Database->select()
#3 /srv/mediawiki/php-1.32.0-wmf.24/includes/Title.php(434): Wikimedia\Rdbms\Database->selectRow()
#4 /srv/mediawiki/php-1.32.0-wmf.24/includes/Storage/RevisionStore.php(337): Title::newFromID()
#5 /srv/mediawiki/php-1.32.0-wmf.24/includes/Storage/RevisionStore.php(1749): MediaWiki\Storage\RevisionStore->getTitle()
#6 /srv/mediawiki/php-1.32.0-wmf.24/includes/Storage/RevisionStore.php(2107): MediaWiki\Storage\RevisionStore->newRevisionFromRow()
#7 /srv/mediawiki/php-1.32.0-wmf.24/includes/Storage/RevisionStore.php(1508): MediaWiki\Storage\RevisionStore->loadRevisionFromConds()
#8 /srv/mediawiki/php-1.32.0-wmf.24/includes/Revision.php(134): MediaWiki\Storage\RevisionStore->getRevisionByTitle()
#9 /srv/mediawiki/php-1.32.0-wmf.24/includes/actions/RawAction.php(167): Revision::newFromTitle()
#10 /srv/mediawiki/php-1.32.0-wmf.24/includes/actions/RawAction.php(134): RawAction->getRawText()
#11 /srv/mediawiki/php-1.32.0-wmf.24/includes/actions/FormlessAction.php(43): RawAction->onView()
#12 /srv/mediawiki/php-1.32.0-wmf.24/includes/MediaWiki.php(501): FormlessAction->show()
#13 /srv/mediawiki/php-1.32.0-wmf.24/includes/MediaWiki.php(294): MediaWiki->performAction()
#14 /srv/mediawiki/php-1.32.0-wmf.24/includes/MediaWiki.php(868): MediaWiki->performRequest()
#15 /srv/mediawiki/php-1.32.0-wmf.24/includes/MediaWiki.php(525): MediaWiki->main()
#16 /srv/mediawiki/php-1.32.0-wmf.24/index.php(42): MediaWiki->run()

In this case, the request was for index.php?action=raw.

Seems like this the first query from WikiPage should've populated the LinkCache used by Title, given the latter only needs a subset of the same information.

Details

Event Timeline

Krinkle created this task.Oct 9 2018, 1:01 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptOct 9 2018, 1:01 AM
Reedy updated the task description. (Show Details)Oct 9 2018, 11:55 AM
jlinehan added a subscriber: jlinehan.EditedNov 14 2018, 8:29 PM

Seems to be a problem with the RevisionStore implementation of getRevisionByTitle. A LinkTarget gets passed in, but is not getting passed on to loadRevisionFromConds/newRevisionFromRow, even though this is supported.

Passing the Title up the call stack is probably not the Right Way To Do This, but somebody seems to have intended this use case, and doing so prevents Title::newFromID being called, stopping the extra query.

diff of debug log:

26,27d25
< [DBQuery] wiki SELECT /* Title::newFromID 10.0.2.2 */  page_namespace,page_title,page_id,page_len,page_is_redirect,page_latest,page_content_model  FROM `page`    WHERE page_id = '1'  LIMIT 1  
< [DBQuery] SELECT  page_namespace,page_title,page_id,page_len,page_is_redirect,page_latest,page_content_model  FROM `page`    WHERE page_id = '1'  LIMIT 1  
33a32
>

Seems this got lost in translation when @daniel split Revision into RevisionStore and RevisionRecord back in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/374077 -- the old static methods do not accept a Title argument (perhaps do not need to), and the option was added in the new code, but not wired up.

In order to pass the object on safely, the type hinting for getRevisionByTitle will need to change from LinkTarget to Title, because newRevisionFromRow expects a Title, not just a LinkTarget, and calls methods outside the LinkTarget interface. I think the motivation for supporting LinkTarget was that the conds array could be built using its methods alone, but once it gets passed on, it actually needs to be a Title, and there is no fast branch here since all roads lead to newRevisionFromRow. Also, the method is called getRevisionByTitle.

However, I will have to comb through and see whether anybody is passing a non-Title LinkTarget to this function before finalizing a patch.

Update
Simplest/safest solution here is just to call Title::newFromLinkTarget until the LinkTarget abstraction is fully enforced -- at least it's an obvious thing to grep for later. I'm going to do this for the patch.

daniel added a comment.EditedNov 15 2018, 3:40 PM

Simplest/safest solution here is just to call Title::newFromLinkTarget until the LinkTarget abstraction is fully enforced -- at least it's an obvious thing to grep for later. I'm going to do this for the patch.

That's probably the way to go, yes.

I'l like to avoid more type hints against Title, since Title is a terrible kitchen sink, and I hope to get rid of it in the not-too-far future. Then again, narrowing a type hint from Title to LinkTarget later should not be a problem, since all Titles are LinkTargets. Changing it to Title now could in theory break callers, but in reality, we still use Title objects nearly everywhere.

Note how this ties in with T208776: Introduce PageIdentity to be used instead of Title.

Changing it to Title now could in theory break callers, but in reality, we still use Title objects nearly everywhere.

Indeed, from a quick inspection, it seemed the method is only being called with a Title, but I realized that was missing the point.

Title is a terrible kitchen sink, and I hope to get rid of it in the not-too-far future.

Excitement

Title is a terrible kitchen sink, and I hope to get rid of it in the not-too-far future.

Excitement

Indeed... for another angle on this, see T208764: Remove cyclic dependency between Title and User classes

Change 473769 had a related patch set uploaded (by Jason Linehan; owner: Jason Linehan):
[mediawiki/core@master] Fully utilize LinkTarget passed to getRevisionByTitle

https://gerrit.wikimedia.org/r/473769

jlinehan added a comment.EditedNov 16 2018, 5:19 PM

Note this patch is currently failing because there is a Title cache hit when ApiMoveTest calls Title::newFromText after moving a page. The cached Title object will have the pre-move pageId, when in fact we want it to have the pageId of the newly-created redirect page.

Before this patch, the Title object was being re-instantiated and so this was not noticed, however now it results in an exception thanks to a sanity check in RevisionStoreRecord's constructor. Note that this is a pre-existing bug in the Title cache, which assumes that Titles are immutable when in fact they can be mutated by (at least) MovePage.

There are a few different solutions to this, none of them very palatable, so this is on hold for now until I have a few conversations.

Change 474726 had a related patch set uploaded (by Jason Linehan; owner: Jason Linehan):
[mediawiki/core@master] Calling $x->resetArticleID() now removes $x from the Title cache

https://gerrit.wikimedia.org/r/474726

Krinkle moved this task from Limbo to Watching on the Performance-Team (Radar) board.

Change 473769 had a related patch set uploaded (by Krinkle; owner: Jason Linehan):
[mediawiki/core@master] Fully utilize LinkTarget passed to getRevisionByTitle

https://gerrit.wikimedia.org/r/473769

Change 473769 merged by jenkins-bot:
[mediawiki/core@master] Fully utilize LinkTarget passed to getRevisionByTitle

https://gerrit.wikimedia.org/r/473769

Krinkle added a subscriber: Simetrical.

Change 474726 had a related patch set uploaded (by Jason Linehan; owner: Jason Linehan):
[mediawiki/core@master] Calling $x->resetArticleID() now removes $x from the Title cache
https://gerrit.wikimedia.org/r/474726

@Simetrical This might be of interest to you.

Tagging for clinic duty to move this forward

WDoranWMF added a subscriber: WDoranWMF.EditedSep 4 2019, 2:53 PM

@Krinkle what's the severity level of this task?

There is currently more database roundtrip latencies involved on every page view than we need. In particular, one of the very simple queries is run twice. This isn't unheard of, but this particular one is explicitly meant to be cached by the LinkCache class service in PHP. The fact that is isn't caught there, suggests there might be a bug which could be part of a larger symptom that if fixed could also improve various other issues we may not have found.

I see it as part of the long-term work to making "simple" page views do as little work as possible (which blocks things like page composition where editor page views would become comparable to those of readers). Part of that work is identifying logic that isn't just slow (but easily cacheable), but might actually obscure a hidden dependency that bypasses cache for a reason.

Given this bypasses LinkCache, that might be such a case, in which case I'd like to know more about it. Based on previous work from @daniel and @jlinehan that already started on this last year, it looks like we haven't found any reason it can't be part of the regular title query cache. But it got stuck due to some debt in the unit tests. I don't know if thats been resolved, otherwise it might just need CR.

daniel triaged this task as Normal priority.Sep 30 2019, 2:57 PM
mobrovac added a subscriber: mobrovac.

@jlinehan your patch needs a manual rebase. Once you do that and address the comments, please move this task back to the External Review Needed column on the Clinic Duty board and we'll take another look.

mobrovac removed a subscriber: mobrovac.Wed, Nov 13, 2:48 PM