Page MenuHomePhabricator

Incredible amount of logs from Wikimedia\Rdbms\LoadBalancer::runPrimaryTransactionIdleCallbacks
Open, In Progress, HighPublic

Description

During connectivity issues in production, these logs are generated at incredible volumes.

June 22 2024 22:00 UTC to June 23 2024 05:30 UTC for message:"found writes pending"

CountNormalized message
18,375,432Wikimedia\Rdbms\LoadBalancer::runPrimaryTransactionIdleCallbacks: found writes pending ().
11,027,272Wikimedia\Rdbms\LoadBalancer::runPrimaryTransactionIdleCallbacks: found writes pending (MediaWiki\Deferred\LinksUpdate\LinksUpdate::acquirePageLock).
5,145,993Wikimedia\Rdbms\LoadBalancer::runPrimaryTransactionIdleCallbacks: found writes pending (CategoryMembershipChangeJob::run).
2,841,780Wikimedia\Rdbms\LoadBalancer::runPrimaryTransactionIdleCallbacks: found writes pending (MediaWiki\Linter\Database::setForPage).
2,167,911Wikimedia\Rdbms\LoadBalancer::runPrimaryTransactionIdleCallbacks: found writes pending (MediaWiki\Deferred\LinksUpdate\LinksUpdate::updateLinksTimestamp, MediaWiki\Deferred\LinksUpdate\LinksUpdate::acquirePageLock, MediaWiki\Deferred\LinksUpdate\LinksUpdate
......

Event Timeline

Change #1049228 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] rdbms: Reduce log severity of "found writes pending"

https://gerrit.wikimedia.org/r/1049228

aaron changed the task status from Open to In Progress.Mon, Jun 24, 5:54 PM
aaron moved this task from Incoming (Needs Triage) to In Progress on the MW-Interfaces-Team board.

Change #1049261 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.43.0-wmf.10] rdbms: Reduce log severity of "found writes pending"

https://gerrit.wikimedia.org/r/1049261

Change #1049261 merged by jenkins-bot:

[mediawiki/core@wmf/1.43.0-wmf.10] rdbms: Reduce log severity of "found writes pending"

https://gerrit.wikimedia.org/r/1049261

Change #1049228 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Reduce log severity of "found writes pending"

https://gerrit.wikimedia.org/r/1049228

Change #1049909 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.43.0-wmf.11] rdbms: Reduce log severity of "found writes pending"

https://gerrit.wikimedia.org/r/1049909

Change #1049909 merged by jenkins-bot:

[mediawiki/core@wmf/1.43.0-wmf.11] rdbms: Reduce log severity of "found writes pending"

https://gerrit.wikimedia.org/r/1049909

I've been trying to debug and demystify what caused this. Here are some preliminary notes:

FTR, after https://gerrit.wikimedia.org/r/c/operations/puppet/+/1048863 we dropped 99% of "found writes pending" logs randomly as they appeared from Kafka. I have deleted no ingested logs.

According to the Kafka topic, we got around 560,778 logs per second (src). This doesn't specifically select just "found writes pending" logs, but the udp_localhost-warning topic doesn't get a lot of traffic.

Given that 99% of the logs were dropped, it mean each job triggered 1000 "writes pending found" log. The only explanation is that it fell into an infinite loop when databases got lagged or slow. That part is not fixed yet and should be investigated further.