From dbperformance.log (aggregating by offender) for a ~day:
20721 TranslatablePage.php line 640 calls wfGetDB()
25520 TranslatablePage.php line 677 calls wfGetDB()
From dbperformance.log (aggregating by offender) for a ~day:
20721 TranslatablePage.php line 640 calls wfGetDB()
25520 TranslatablePage.php line 677 calls wfGetDB()
Status | Subtype | Assigned | Task | ||
---|---|---|---|---|---|
Resolved | aaron | T88445 MediaWiki active/active datacenter investigation and work (tracking) | |||
Open | None | T92357 Fix database master queries from HTTP GET/HEAD before active-active multi-dc | |||
Resolved | aaron | T95753 Translate extensions uses the master a lot on page views | |||
Duplicate | None | T70530 Many INSERT queries on some web requests |
These are intentional, according to code comments: bd885e6f7044aed223caaa521a866e6a23037308. In particular, it would be dangerous to load stale data when marking pages for translation (T39647).
If you have more information on why using master is incorrect and in what cases, please reopen.
Note that that log is only for GET request, so if all that happened on POST none of it would be logged. How hard is that is to change, I don't know, but the general pattern to follow is have update heavy requests use POST. In a multi-DC setup, the performance of doing updates on GET will regress for people using the non-primary DC.
Look at https://phabricator.wikimedia.org/T39647, it seems like a problem with the code was not distinguishing read and read-for-write code paths. If you don't do that, then you are stuck using DB_MASTER, but if the logic is tunable (like various things that reference IDBAccessObject and more) then it works. It's a bit of a chore doing it retro-actively and getting all the $flags write, but not impossible.
Look at https://phabricator.wikimedia.org/T39647, it seems like a problem with the code was not distinguishing read and read-for-write code paths.
How so? There's no mention of a single "read-not-for-write" use case there.
I meant "looking" at the task as well as the code. The task mentions the index updates not seeing prior and related translate_sections updates due to using DB_SLAVE. I also recall talking to nike about the class and DB_MASTER usage on IRC and he mentioned some unease about knowing where to use DB_MASTER/DB_SLAVE. I think the simplest rule is the POST/GET distinction. If an update is *really* needed on GET, it can move to a job and use lazyPush().
If there is a case where that is not good enough (the update has to happen on GET *right now* for some reason, I'd want to see the explanation for that). The time people spend looking at pages before taking an action can easily surpass slag lag delay, so that kind of conflict detection/resolution has to exist anyway. If the problem is people constantly not seeing the results of other people work then maybe jobs can given higher priority or be optimized and/or "update in progress" type notices can be shown.
The code is fairly complex, so it would be good to have some high level view on what the page view triggered reads and updates need to do and what needs to be seen immediately, ect.
[GET] Expectation (masterConns <= 0) by MediaWiki::main not met:
[connect to 10.64.16.22 (metawiki)]
TransactionProfiler.php line 307 calls wfBacktrace()
TransactionProfiler.php line 146 calls TransactionProfiler->reportExpectationViolated()
LoadBalancer.php line 545 calls TransactionProfiler->recordConnection()
GlobalFunctions.php line 3617 calls LoadBalancer->getConnection()
TranslatablePage.php line 639 calls wfGetDB()
TranslatablePage.php line 755 calls TranslatablePage->getTranslationPages()
PageTranslationHooks.php line 753 calls TranslatablePage->getTranslationPercentages()
PageTranslationHooks.php line 664 calls PageTranslationHooks::translationPageHeader()
Hooks.php line 204 calls PageTranslationHooks::translatablePageHeader()
Article.php line 595 calls Hooks::run()
ViewAction.php line 44 calls Article->view()
MediaWiki.php line 456 calls ViewAction->show()
MediaWiki.php line 255 calls MediaWiki->performAction()
MediaWiki.php line 683 calls MediaWiki->performRequest()
MediaWiki.php line 474 calls MediaWiki->main()
index.php line 41 calls MediaWiki->run()
index.php line 3 calls include()
I'm hacking up a simple hacky fix by using the master based on GET/POST distinction, assuming there are no read-write cycles on GET requests...
Change 234173 had a related patch set uploaded (by Aaron Schulz):
Use DB_SLAVE in TranslatablePage for GET request
assuming there are no read-write cycles on GET requests
This assumption is incorrect, see e.g. requests of the kind https://meta.wikimedia.org/w/index.php?title=Special:PageTranslation&do=mark&target=$page&revision=13336697&token=$token .
Is there a reason that such state changing requests do not use POST?
In the patch on mention a GET read-write trace that looked suspect, so indeed some code bits may need to be changed to use POST before merging that patch.
The queries affecting normal page view and category view at least should be prioritized.
Expectation (masterConns <= 0) by MediaWiki::main not met: [connect to 10.64.16.22 (metawiki)] TransactionProfiler.php line 311 calls wfBacktrace() TransactionProfiler.php line 146 calls TransactionProfiler->reportExpectationViolated() LoadBalancer.php line 573 calls TransactionProfiler->recordConnection() GlobalFunctions.php line 3562 calls LoadBalancer->getConnection() TranslatablePage.php line 639 calls wfGetDB() TranslatablePage.php line 755 calls TranslatablePage->getTranslationPages() PageTranslationHooks.php line 753 calls TranslatablePage->getTranslationPercentages() PageTranslationHooks.php line 664 calls PageTranslationHooks::translationPageHeader() Hooks.php line 204 calls PageTranslationHooks::translatablePageHeader() Article.php line 595 calls Hooks::run() CategoryPage.php line 73 calls Article->view() ViewAction.php line 44 calls CategoryPage->view() MediaWiki.php line 489 calls ViewAction->show() MediaWiki.php line 286 calls MediaWiki->performAction() MediaWiki.php line 704 calls MediaWiki->performRequest() MediaWiki.php line 506 calls MediaWiki->main() index.php line 41 calls MediaWiki->run() index.php line 3 calls include()
If I understand correctly, there are several issues here; can this task be split to several sub-tasks, with the appropriate priority to each? This will really help schedule the fixing work.
Change 234173 merged by jenkins-bot:
Use DB_SLAVE in TranslatablePage for GET request
@aaron How bad is the situation now? I still see quite a many items in logstash, but did this patch bring it down any amount?
Seems like there is still almost as many, just different paths tripping the warning now. I see a lot of:
WikiPageMessageGroup.php line 115 calls Revision::newFromTitle() TranslatablePage.php line 259 calls WikiPageMessageGroup->getMessage() PageTranslationHooks.php line 51 calls TranslatablePage->getPageDisplayTitle()
Change 277897 had a related patch set uploaded (by Aaron Schulz):
Reduce master queries from selectRowsIdLang()
Change 278461 had a related patch set uploaded (by Nikerabbit):
Use localstorage for recent groups on Special:Translate
Change 278593 had a related patch set uploaded (by Aaron Schulz):
Avoid more master queries on page views
Change 278461 merged by jenkins-bot:
Use localstorage for recent groups on Special:Translate
Change 284493 had a related patch set uploaded (by Ori.livneh):
Avoid more master queries on page views
Change 286025 had a related patch set uploaded (by Aaron Schulz):
Fix remaning master queries on page views
Still seeing:
Expectation (masterConns <= 0) by MediaWiki::main not met: [connect to 10.64.16.144 (wikidatawiki)] TransactionProfiler.php line 311 calls wfBacktrace() TransactionProfiler.php line 146 calls TransactionProfiler->reportExpectationViolated() LoadBalancer.php line 571 calls TransactionProfiler->recordConnection() GlobalFunctions.php line 3123 calls LoadBalancer->getConnection() WikiPageMessageGroup.php line 61 calls wfGetDB() WikiPageMessageGroup.php line 100 calls WikiPageMessageGroup->getDefinitions() WikiPageMessageGroup.php line 116 calls WikiPageMessageGroup->load() TranslatablePage.php line 259 calls WikiPageMessageGroup->getMessage() PageTranslationHooks.php line 248 calls TranslatablePage->getPageDisplayTitle() Parser.php line 4251 calls PageTranslationHooks::languages() Preprocessor_Hash.php line 1148 calls Parser->extensionSubstitution() Parser.php line 3355 calls PPFrame_Hash->expand() Parser.php line 1248 calls Parser->replaceVariables() Parser.php line 446 calls Parser->internalParse() WikitextContent.php line 331 calls Parser->parse() AbstractContent.php line 497 calls WikitextContent->fillParserOutput() PoolWorkArticleView.php line 140 calls AbstractContent->getParserOutput() PoolCounterWork.php line 123 calls PoolWorkArticleView->doWork() Article.php line 666 calls PoolCounterWork->execute() ViewAction.php line 44 calls Article->view() MediaWiki.php line 503 calls ViewAction->show() MediaWiki.php line 288 calls MediaWiki->performAction() MediaWiki.php line 745 calls MediaWiki->performRequest() MediaWiki.php line 519 calls MediaWiki->main() index.php line 43 calls MediaWiki->run() index.php line 3 calls include()
I am crawling through the DBPerformance in Logstash and found some things to mention:
Expectation (masterConns <= 0) by MediaWiki::main not met: [connect to 10.64.0.205 (mediawikiwiki)] TransactionProfiler.php line 311 calls wfBacktrace() TransactionProfiler.php line 146 calls TransactionProfiler->reportExpectationViolated() LoadBalancer.php line 571 calls TransactionProfiler->recordConnection() GlobalFunctions.php line 3122 calls LoadBalancer->getConnection() MessageCollection.php line 697 calls wfGetDB() MessageCollection.php line 242 calls MessageCollection->loadData() PageTranslationHooks.php line 878 calls MessageCollection->loadTranslations() Hooks.php line 195 calls PageTranslationHooks::sourceExport() SpecialTranslate.php line 141 calls Hooks::run() SpecialPage.php line 417 calls SpecialTranslate->execute() SpecialPageFactory.php line 572 calls SpecialPage->run() MediaWiki.php line 282 calls SpecialPageFactory::executePath() MediaWiki.php line 745 calls MediaWiki->performRequest() MediaWiki.php line 519 calls MediaWiki->main() index.php line 43 calls MediaWiki->run() index.php line 3 calls include()
There is a patch that I think fixes this: https://gerrit.wikimedia.org/r/#/c/283491/2/MessageCollection.php - activity seems to be less than 200 entries per day, with some days not having it at all.
Expectation (masterConns <= 0) by MediaWiki::main not met: [connect to 10.64.0.205 (mediawikiwiki)] TransactionProfiler.php line 311 calls wfBacktrace() TransactionProfiler.php line 146 calls TransactionProfiler->reportExpectationViolated() LoadBalancer.php line 571 calls TransactionProfiler->recordConnection() GlobalFunctions.php line 3122 calls LoadBalancer->getConnection() MessageGroups.php line 757 calls wfGetDB() MessageGroups.php line 229 calls MessageGroups::loadAggregateGroups() Hooks.php line 195 calls MessageGroups::getAggregateGroups() MessageGroups.php line 141 calls Hooks::run() MessageGroups.php line 46 calls MessageGroups->loadGroupDefinitions() MessageGroups.php line 514 calls MessageGroups->init() MessageGroups.php line 248 calls MessageGroups->getGroups() TranslatablePage.php line 213 calls MessageGroups::getGroup() TranslatablePage.php line 253 calls TranslatablePage->getMessageGroup() PageTranslationHooks.php line 51 calls TranslatablePage->getPageDisplayTitle() Hooks.php line 195 calls PageTranslationHooks::renderTagPage() Parser.php line 653 calls Hooks::run() Parser.php line 5101 calls Parser->preprocess() MessageCache.php line 1036 calls Parser->transformMsg() Message.php line 1154 calls MessageCache->transform() Message.php line 821 calls Message->transformText() Message.php line 884 calls Message->toString() OutputPage.php line 927 calls Message->text() OutputPage.php line 975 calls OutputPage->setHTMLTitle() VisualEditor.hooks.php line 171 calls OutputPage->setPageTitle() Hooks.php line 195 calls VisualEditorHooks::onCustomEditor() EditAction.php line 57 calls Hooks::run() MediaWiki.php line 503 calls EditAction->show() MediaWiki.php line 288 calls MediaWiki->performAction() MediaWiki.php line 745 calls MediaWiki->performRequest() MediaWiki.php line 519 calls MediaWiki->main() index.php line 43 calls MediaWiki->run() index.php line 3 calls include()
This one is having less than 30 entries per day. I am wondering why it happens at all though. Could be a sign of evictions from a cache.
Also T70530 seems to be causing >500 entries per day, which seems to indicate that it is still high priority.