
Investigation: Parser save hook handler does master writes in GETs
Closed, Resolved · Public · 1 Story Point

Description

I noticed that the number of enwiki master connections was higher than before, according to tendril.wm.org. I see in the logs:

2015-06-22 10:28:17 mw1171 frwiki DBPerformance INFO: [GET] Expectation (masterConns <= 0) by MediaWiki::main not met:
[connect to 10.64.16.12 (frwiki)]
TransactionProfiler.php line 307 calls wfBacktrace()
TransactionProfiler.php line 146 calls TransactionProfiler->reportExpectationViolated()
LoadBalancer.php line 550 calls TransactionProfiler->recordConnection()
ConsistentReadConnectionManager.php line 92 calls LoadBalancer->getConnection()
ConsistentReadConnectionManager.php line 117 calls Wikibase\Client\Store\Sql\ConsistentReadConnectionManager->getWriteConnection()
SqlUsageTracker.php line 131 calls Wikibase\Client\Store\Sql\ConsistentReadConnectionManager->beginAtomicSection()
UsageUpdater.php line 81 calls Wikibase\Client\Usage\Sql\SqlUsageTracker->trackUsedEntities()
DataUpdateHookHandlers.php line 173 calls Wikibase\Client\Store\UsageUpdater->addUsagesForPage()
DataUpdateHookHandlers.php line 112 calls Wikibase\Client\Hooks\DataUpdateHookHandlers->doParserCacheSaveComplete()
Hooks.php line 204 calls Wikibase\Client\Hooks\DataUpdateHookHandlers::onParserCacheSaveComplete()
ParserCache.php line 280 calls Hooks::run()
PoolWorkArticleView.php line 151 calls ParserCache->save()
PoolCounterWork.php line 123 calls PoolWorkArticleView->doWork()
Article.php line 674 calls PoolCounterWork->execute()
ViewAction.php line 44 calls Article->view()
MediaWiki.php line 413 calls ViewAction->show()
MediaWiki.php line 291 calls MediaWiki->performAction()
MediaWiki.php line 634 calls MediaWiki->performRequest()
MediaWiki.php line 431 calls MediaWiki->main()
index.php line 41 calls MediaWiki->run()
index.php line 3 calls include()

2015-06-22 08:28:17 mw1185 elwiki DBPerformance INFO: [GET] Expectation (writes <= 0) by MediaWiki::main not met:
query-m: UPDATE wbc_entity_usage SET eu_touched = 'X')) [TRX#fa6dd70f8c6b]
TransactionProfiler.php line 307 calls wfBacktrace()
TransactionProfiler.php line 200 calls TransactionProfiler->reportExpectationViolated()
Database.php line 1188 calls TransactionProfiler->recordQueryCompletion()
Database.php line 2181 calls DatabaseBase->query()
EntityUsageTable.php line 97 calls DatabaseBase->update()
SqlUsageTracker.php line 144 calls Wikibase\Client\Usage\Sql\EntityUsageTable->touchUsages()
UsageUpdater.php line 81 calls Wikibase\Client\Usage\Sql\SqlUsageTracker->trackUsedEntities()
DataUpdateHookHandlers.php line 173 calls Wikibase\Client\Store\UsageUpdater->addUsagesForPage()
DataUpdateHookHandlers.php line 112 calls Wikibase\Client\Hooks\DataUpdateHookHandlers->doParserCacheSaveComplete()
Hooks.php line 204 calls Wikibase\Client\Hooks\DataUpdateHookHandlers::onParserCacheSaveComplete()
ParserCache.php line 280 calls Hooks::run()
PoolWorkArticleView.php line 151 calls ParserCache->save()
PoolCounterWork.php line 123 calls PoolWorkArticleView->doWork()
Article.php line 674 calls PoolCounterWork->execute()
ViewAction.php line 44 calls Article->view()
MediaWiki.php line 413 calls ViewAction->show()
MediaWiki.php line 291 calls MediaWiki->performAction()
MediaWiki.php line 634 calls MediaWiki->performRequest()
MediaWiki.php line 431 calls MediaWiki->main()
index.php line 41 calls MediaWiki->run()
index.php line 3 calls include()

This really should use queuing or something (which also avoids tying page views up with a SPOF).

Details

Related Gerrit Patches:
mediawiki/extensions/Wikibase : wmf/1.26wmf13 : On page views, update usage tracking via the job queue.
mediawiki/extensions/Wikibase : wmf/1.26wmf16 : On page views, update usage tracking via the job queue.
mediawiki/extensions/Wikibase : master : On page views, update usage tracking via the job queue.

Event Timeline

aaron created this task. Jun 22 2015, 9:57 PM
aaron raised the priority of this task to Needs Triage.
aaron updated the task description.
aaron added a subscriber: aaron.
Restricted Application added a subscriber: Aklapper. Jun 22 2015, 9:57 PM
aaron updated the task description. Jun 22 2015, 9:57 PM
aaron set Security to None.
Tobi_WMDE_SW renamed this task from "Parser save hook handler does master writes in GETs" to "Investigation: Parser save hook handler does master writes in GETs". Jun 30 2015, 1:19 PM
Tobi_WMDE_SW added a subscriber: Tobi_WMDE_SW.

We are going to look into that and see how bad this is.

Tobi_WMDE_SW edited a custom field. Jun 30 2015, 2:07 PM
hoo added a subscriber: hoo. Jul 1 2015, 12:22 AM

I think we should only do that if the data is not already there... that only happens once per page after the initial entity usage deploy.

To achieve that, we should compare MAX( eu_touched ) (from a slave) with the touched timestamp we get from the ParserOutput or Title given to us by the ParserCacheSaveComplete hook. If these match up, we don't need to do any updates.
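The freshness check described above can be sketched as follows. This is an illustrative Python sketch, not MediaWiki PHP; the function name and the TS_MW-style timestamp strings are assumptions for the example:

```python
def needs_usage_update(replica_max_touched, page_touched):
    """Decide whether usage rows must be (re)written on the master.

    replica_max_touched: MAX(eu_touched) read from a replica for this
    page, or None if no usage rows exist yet.
    page_touched: the touched timestamp from the ParserOutput/Title,
    as a MediaWiki-style TS_MW string, e.g. '20150622102817'.
    """
    if replica_max_touched is None:
        # No tracking rows yet: the one-time backfill after the initial
        # entity usage deploy still has to run.
        return True
    # If the replica already reflects this parse, skip the master write.
    return replica_max_touched != page_touched
```

The point of the sketch: the only master write that survives is the one where the replica's state is provably stale, so the common page-view path stays read-only.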

Tobi_WMDE_SW triaged this task as High priority. Jul 1 2015, 8:14 AM
daniel added a comment. Jul 1 2015, 9:42 AM

@hoo: we need to update usage tracking whenever a new target language is viewed after a page was touched. Relying on the modified date sadly doesn't work, because it's not set when the update is triggered via the job queue (I learned that the hard way).

hoo added a comment. Jul 1 2015, 1:59 PM

@hoo: we need to update usage tracking whenever a new target language is viewed after a page was touched. Relying on the modified date sadly doesn't work, because it's not set when the update is triggered via the job queue (I learned that the hard way).

Ok, that makes sense… in that case I guess we should first try to compare the data from the slave with what we want to write. If it matches, almost certainly no changes are needed.

Would that defuse the problem enough?
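The compare-then-write idea suggested here can be sketched like this. Illustrative Python, under assumptions: usages are modelled as (entity, aspect) tuples and the master transaction is an opaque callback, neither of which is Wikibase's actual API:

```python
def usage_diff(current, desired):
    """Compare replica state against the usages extracted from the
    new parse; return the rows to insert and the rows to delete."""
    current, desired = set(current), set(desired)
    return desired - current, current - desired

def update_usages(current, desired, write):
    """Call write() (standing in for a master transaction) only when
    something actually changed; return whether a write happened."""
    to_add, to_remove = usage_diff(current, desired)
    if to_add or to_remove:
        write(to_add, to_remove)
        return True
    return False
```

In the steady state the desired usages match what the replica already has, so the write callback — and therefore the master connection — is never reached on a plain page view.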

hoo added a comment. Jul 1 2015, 2:03 PM

Also, would my original idea work if the client setting allowDataAccessInUserLanguage is set to false? If so, that might be worth special-casing (that would at least work for the ParserCacheSaveComplete hook, which is the more important one here).

Change 222463 had a related patch set uploaded (by Daniel Kinzler):
On page views, update usage tracking via the job queue.

https://gerrit.wikimedia.org/r/222463

daniel added a comment. Jul 2 2015, 8:04 PM

The patch above makes the update always go via the job queue. Not sure whether that is the Right Thing.
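A minimal sketch of the deferral the patch describes, in Python rather than MediaWiki PHP: the GET request only enqueues a deduplicated job, and a separate runner performs the master write later. The class and its dedup behaviour are toy assumptions standing in for the real JobQueue machinery (which supports deduplication of identical jobs), not its actual API:

```python
import queue

class UsageUpdateQueue:
    """Toy stand-in for a job queue: page views enqueue work; a
    background runner does the master write out of band."""

    def __init__(self):
        self._q = queue.Queue()
        self._seen = set()  # crude dedup of identical pending jobs

    def push(self, page_id, usages):
        """Enqueue an update unless an identical one is already pending."""
        key = (page_id, frozenset(usages))
        if key in self._seen:
            return False
        self._seen.add(key)
        self._q.put(key)
        return True

    def run_one(self, do_master_write):
        """Pop one job and perform the deferred master write."""
        page_id, usages = self._q.get_nowait()
        self._seen.discard((page_id, usages))
        do_master_write(page_id, usages)
```

Besides keeping master writes out of GET requests, the dedup step absorbs the duplicate ParserCacheSaveComplete firings (e.g. mobile and desktop key-spaces) that ori describes further down.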

Change 222463 had a related patch set uploaded (by Aude):
On page views, update usage tracking via the job queue.

https://gerrit.wikimedia.org/r/222463

Change 222463 merged by Aude:
On page views, update usage tracking via the job queue.

https://gerrit.wikimedia.org/r/222463

Change 227632 had a related patch set uploaded (by Aude):
On page views, update usage tracking via the job queue.

https://gerrit.wikimedia.org/r/227632

Change 227634 had a related patch set uploaded (by Aude):
On page views, update usage tracking via the job queue.

https://gerrit.wikimedia.org/r/227634

Change 227632 merged by Aude:
On page views, update usage tracking via the job queue.

https://gerrit.wikimedia.org/r/227632

Change 227634 merged by jenkins-bot:
On page views, update usage tracking via the job queue.

https://gerrit.wikimedia.org/r/227634

hoo closed this task as Resolved. Aug 2 2015, 10:24 PM
hoo removed a project: Patch-For-Review.
ori reopened this task as Open. (Edited) Mar 16 2016, 6:36 PM
ori added a subscriber: ori.

Doing this via the job queue is not a good solution, in my opinion. There is something fundamentally broken about tying updates to parser cache events.

Subscribing to changes via a ParserCacheSaveComplete hook handler is a bit like a department store updating its inventory any time a customer touches an item: sure, it covers all the relevant transactions, but it also creates a large amount of unnecessary work, because customers may handle an item for a number of reasons (to try it on or read the label), not all of which result in a transaction.

ParserCacheSaveComplete events are the same way: they cover all the cases in which an edit is made, but they also fire in cases which do not involve a transaction. For example, the mobile and desktop web sites use different key-spaces to avoid polluting each other's parser cache entries with platform-specific artifacts, so there are at least two ParserCacheSaveComplete events for each edit.

Hooking into ParserCacheSaveComplete is the wrong thing to do, because (AIUI) it isn't really the parser cache that Wikibase cares about, and because it muddles the distinction between read-only and read/write requests. For example, RefreshLinks jobs may opportunistically refresh the parser cache. The fact that Wikibase enqueues another job on ParserCacheSaveComplete makes it harder to reason about the job queue. Finally, like the example above, it creates a large amount of unnecessary work: when I commented out the hook for a few minutes yesterday (to investigate T129517), I saw a marked drop in memcached traffic.

Doing this via the job queue is not a good solution, in my opinion. There is something fundamentally broken about tying updates to parser cache events.

Why? We are indeed trying to track which parser cache entry uses what information from Wikidata, no more and no less.

Subscribing to changes via a ParserCacheSaveComplete hook handler is a bit like a department store updating its inventory any time a customer touches an item: sure, it covers all the relevant transactions, but it also creates a large amount of unnecessary work, because customers may handle an item for a number of reasons (to try it on or read the label), not all of which result in a transaction.

We only track if the parser actually accesses the data item. That is, when the thing in the parser cache actually depends on the data item.

ParserCacheSaveComplete events are the same way: they cover all the cases in which an edit is made, but they also fire in cases which do not involve a transaction. For example, the mobile and desktop web sites use different key-spaces to avoid polluting each other's parser cache entries with platform-specific artifacts, so there are at least two ParserCacheSaveComplete events for each edit.
Hooking into ParserCacheSaveComplete is the wrong thing to do, because (AIUI) it isn't really the parser cache that Wikibase cares about, and because it muddles the distinction between read-only and read/write requests.

It's exactly the ParserCache that Wikibase cares about. We need to know which information from Wikidata has been used to construct HTML that is in the ParserCache, so we can purge the cache when the data changes. I don't see any other use case. Do you have an alternative suggestion that would achieve this? Once we have T105766: RFC: Dependency graph storage; sketch: adjacency list in DB we will no longer need this, but even then, we will need a place to store dependencies between a generated resource and whatever it was generated from.

Where do you think this generates unnecessary work? Our tracking is fairly fine-grained, provided people use the parser function and Lua module in a sane way. We won't purge a page that uses a label of Q123 when a sitelink on Q123 changes, etc.
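The fine-grained matching described here can be illustrated as follows. Python sketch only; the single-letter aspect codes 'L' (label) and 'S' (sitelink) mirror the distinction in the comment above, but the real matching rules in Wikibase are more involved:

```python
def pages_to_purge(usage_table, entity_id, changed_aspect):
    """Given tracked usages as (page_id, entity_id, aspect) rows,
    return only the pages whose tracked aspect matches the change.
    A sitelink ('S') change on Q123 must not purge a page that only
    uses its label ('L')."""
    return {page for (page, entity, aspect) in usage_table
            if entity == entity_id and aspect == changed_aspect}
```

This is the sense in which the tracking avoids unnecessary purges: the granularity of the stored aspect bounds the fan-out of each change.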

One alternative I can think of is to store the tracking information in the parser cache itself, instead of the database. But there are two problems with that:

  • the tracking info must never expire before the actual cache entry
  • we need to be able to query usages by item. The parser cache is keyed per local page.

So, until we have T105766: RFC: Dependency graph storage; sketch: adjacency list in DB, I don't see an alternative.
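The second problem listed above is the crux: a store keyed per local page (like the parser cache) cannot answer "which pages use Q123?" without a full scan, whereas a usage table is precisely that inverse index. A small Python illustration, with made-up page IDs and item IDs:

```python
from collections import defaultdict

def build_inverse_index(per_page):
    """Invert a page -> used-items mapping into item -> pages, which
    is the query shape a usage table such as wbc_entity_usage serves."""
    by_item = defaultdict(set)
    for page, items in per_page.items():
        for item in items:
            by_item[item].add(page)
    return by_item
```

Answering by-item queries from the per-page store means touching every entry on every change; the inverse index turns the same query into a single lookup, which is why the tracking lives in the database rather than the cache.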

hoo added a comment. Mar 21 2016, 8:08 PM

@ori Could you please answer Daniel's questions? This task is high priority, yet it's not actionable.

Slightly related: T124737.

hoo closed this task as Resolved. Oct 19 2016, 3:18 PM
hoo claimed this task.

The problem this task was originally about has been addressed. If there are further issues, please file new tasks for them.