Page MenuHomePhabricator

High log volume from rdbms lib triggering logstash throttler
Closed, ResolvedPublic

Description

These logs have high enough rate that they trigger the logstash throttler. T314434

includes/libs/rdbms/ChronologyProtector.php

found position data under a presumed clientId (T314434)

We had to filter these logs at the appserver level to mitigate kafka lag towards codfw T330205#8699489

includes/libs/rdbms/platform/SQLPlatform.php

Wikimedia\Rdbms\Platform\SQLPlatform::isWriteQuery fallback to regex

from /srv/mediawiki/php-1.40.0-wmf.27/includes/libs/rdbms/platform/SQLPlatform.php(1760)
#0 /srv/mediawiki/php-1.40.0-wmf.27/includes/libs/rdbms/database/Database.php(934): Wikimedia\Rdbms\Platform\SQLPlatform->isWriteQuery(string, integer)
#1 /srv/mediawiki/php-1.40.0-wmf.27/includes/libs/rdbms/database/DatabaseMysqlBase.php(197): Wikimedia\Rdbms\Database->executeQuery(string, string, integer, string)
#2 /srv/mediawiki/php-1.40.0-wmf.27/includes/libs/rdbms/database/Database.php(1779): Wikimedia\Rdbms\DatabaseMysqlBase->doSelectDomain(Wikimedia\Rdbms\DatabaseDomain)
#3 /srv/mediawiki/php-1.40.0-wmf.27/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1011): Wikimedia\Rdbms\Database->selectDomain(Wikimedia\Rdbms\DatabaseDomain)
#4 /srv/mediawiki/php-1.40.0-wmf.27/includes/libs/rdbms/loadbalancer/LoadBalancer.php(869): Wikimedia\Rdbms\LoadBalancer->reuseOrOpenConnectionForNewRef(integer, Wikimedia\Rdbms\DatabaseDomain, integer)
#5 /srv/mediawiki/php-1.40.0-wmf.27/includes/libs/rdbms/loadbalancer/LoadBalancer.php(861): Wikimedia\Rdbms\LoadBalancer->getServerConnection(integer, string, integer)
#6 /srv/mediawiki/php-1.40.0-wmf.27/includes/libs/rdbms/database/DBConnRef.php(103): Wikimedia\Rdbms\LoadBalancer->getConnectionInternal(integer, array, string, integer)
#7 /srv/mediawiki/php-1.40.0-wmf.27/includes/libs/rdbms/database/DBConnRef.php(117): Wikimedia\Rdbms\DBConnRef->ensureConnection()
#8 /srv/mediawiki/php-1.40.0-wmf.27/includes/libs/rdbms/database/DBConnRef.php(722): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#9 /srv/mediawiki/php-1.40.0-wmf.27/includes/libs/rdbms/database/Database.php(2963): Wikimedia\Rdbms\DBConnRef->getSessionLagStatus()
#10 /srv/mediawiki/php-1.40.0-wmf.27/extensions/CentralAuth/includes/User/CentralAuthUser.php(619): Wikimedia\Rdbms\Database::getCacheSetOptions(Wikimedia\Rdbms\DBConnRef)
#11 /srv/mediawiki/php-1.40.0-wmf.27/includes/libs/objectcache/wancache/WANObjectCache.php(1759): MediaWiki\Extension\CentralAuth\User\CentralAuthUser->MediaWiki\Extension\CentralAuth\User\{closure}(boolean, integer, array, NULL, array)
#12 /srv/mediawiki/php-1.40.0-wmf.27/includes/libs/objectcache/wancache/WANObjectCache.php(1589): WANObjectCache->fetchOrRegenerate(string, integer, Closure, array, array)
#13 /srv/mediawiki/php-1.40.0-wmf.27/extensions/CentralAuth/includes/User/CentralAuthUser.php(639): WANObjectCache->getWithSetCallback(string, integer, Closure, array)
#14 /srv/mediawiki/php-1.40.0-wmf.27/extensions/CentralAuth/includes/User/CentralAuthUser.php(448): MediaWiki\Extension\CentralAuth\User\CentralAuthUser->loadFromCache()
#15 /srv/mediawiki/php-1.40.0-wmf.27/extensions/CentralAuth/includes/User/CentralAuthUser.php(691): MediaWiki\Extension\CentralAuth\User\CentralAuthUser->loadState()
#16 /srv/mediawiki/php-1.40.0-wmf.27/extensions/CentralAuth/includes/User/CentralAuthUser.php(784): MediaWiki\Extension\CentralAuth\User\CentralAuthUser->getId()
#17 /srv/mediawiki/php-1.40.0-wmf.27/extensions/CentralAuth/includes/CentralAuthPrimaryAuthenticationProvider.php(425): MediaWiki\Extension\CentralAuth\User\CentralAuthUser->exists()
#18 /srv/mediawiki/php-1.40.0-wmf.27/includes/auth/AuthManager.php(2344): MediaWiki\Extension\CentralAuth\CentralAuthPrimaryAuthenticationProvider->testUserExists(string, integer)
#19 /srv/mediawiki/php-1.40.0-wmf.27/includes/auth/AuthManager.php(1060): MediaWiki\Auth\AuthManager->userExists(string, integer)
#20 /srv/mediawiki/php-1.40.0-wmf.27/includes/api/ApiQueryUsers.php(280): MediaWiki\Auth\AuthManager->canCreateAccount(string)
#21 /srv/mediawiki/php-1.40.0-wmf.27/includes/api/ApiQuery.php(679): ApiQueryUsers->execute()
#22 /srv/mediawiki/php-1.40.0-wmf.27/includes/api/ApiMain.php(1903): ApiQuery->execute()
#23 /srv/mediawiki/php-1.40.0-wmf.27/includes/api/ApiMain.php(880): ApiMain->executeAction()
#24 /srv/mediawiki/php-1.40.0-wmf.27/includes/api/ApiMain.php(851): ApiMain->executeActionWithErrorHandling()
#25 /srv/mediawiki/php-1.40.0-wmf.27/api.php(91): ApiMain->execute()
#26 /srv/mediawiki/php-1.40.0-wmf.27/api.php(46): wfApiMain()
#27 /srv/mediawiki/w/api.php(3): require(string)
#28 {main}

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
colewhite updated the task description. (Show Details)

Wikimedia\Rdbms\Platform\SQLPlatform::isWriteQuery fallback to regex is new this week and from https://gerrit.wikimedia.org/r/c/mediawiki/core/+/894110

Krinkle triaged this task as High priority.Mar 16 2023, 6:20 AM

Change 900521 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] rdbms: Add missing QUERY_CHANGE_ flag to internal "USE" query

https://gerrit.wikimedia.org/r/900521

Change 900521 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Add missing QUERY_CHANGE_ flag to internal "USE" query

https://gerrit.wikimedia.org/r/900521

Change 900667 had a related patch set uploaded (by Jforrester; author: Krinkle):

[mediawiki/core@wmf/1.40.0-wmf.27] rdbms: Add missing QUERY_CHANGE_ flag to internal "USE" query

https://gerrit.wikimedia.org/r/900667

Change 900668 had a related patch set uploaded (by Jforrester; author: Krinkle):

[mediawiki/core@REL1_40] rdbms: Add missing QUERY_CHANGE_ flag to internal "USE" query

https://gerrit.wikimedia.org/r/900668

Change 900668 merged by jenkins-bot:

[mediawiki/core@REL1_40] rdbms: Add missing QUERY_CHANGE_ flag to internal "USE" query

https://gerrit.wikimedia.org/r/900668

Change 900667 merged by jenkins-bot:

[mediawiki/core@wmf/1.40.0-wmf.27] rdbms: Add missing QUERY_CHANGE_ flag to internal "USE" query

https://gerrit.wikimedia.org/r/900667

@colewhite The above should fix 99% if not 100% of them. I'm leaving this open and assigned to you to remove the filter or confirm that it's not solved and then feel free to sign back over to me.

Based on https://grafana.wikimedia.org/d/VCK8-FpZz/cwhite-logstash?orgId=1&refresh=1m&viewPanel=46 at least, it seems like it's working :)

Looks like wmf.27 is rolled out. Lets try it.

Change 900716 had a related patch set uploaded (by Cwhite; author: Cwhite):

[operations/puppet@production] profile: remove SQLPlatform::isWriteQuery high log volume mitigation

https://gerrit.wikimedia.org/r/900716

Change 900716 merged by Cwhite:

[operations/puppet@production] profile: remove SQLPlatform::isWriteQuery high log volume mitigation

https://gerrit.wikimedia.org/r/900716

Change 900718 had a related patch set uploaded (by Cwhite; author: Cwhite):

[operations/puppet@production] logstash: sample high-volume rdbms lib logging

https://gerrit.wikimedia.org/r/900718

Change 900718 merged by Cwhite:

[operations/puppet@production] logstash: sample high-volume rdbms lib logging

https://gerrit.wikimedia.org/r/900718

The log rate is no longer overwhelming codfw. I've added filters to reduce ingest load on the cluster in a trackable way. You can view the events/sec rate here.