Improve Rdbms/LoadBalancer and its LoadMonitor logic
Closed, Resolved · Public

Description

Follows investigation from {T180918}.

Rewrite LoadMonitor and related LoadBalancer logic

  • Use WANObjectCache::getWithSetCallback()
  • Increase the server state polling interval but add a mutex on shared cache updates via "lockTSE" as a throttle
  • Increase the moving average factor to make the weight scale values more responsive to problems
  • Account for DBError exceptions from getLag()
  • Add pingFailure() method to quickly react to failing servers rather than just waiting on the server state polls
  • Add backtraces to replication wait timeouts
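
The effect of the moving average factor in the list above can be illustrated with a small sketch. This is not the LoadMonitor code: it assumes the weight scale is updated as an exponential moving average, and the names `update_weight_scale`, `simulate`, and the sample factors 0.1 and 0.5 are hypothetical, chosen only for illustration.

```python
from typing import List


def update_weight_scale(previous: float, observed: float, factor: float) -> float:
    """Blend the latest observation into the running weight scale.

    Assumed formula (exponential moving average):
        new = factor * observed + (1 - factor) * previous
    A larger factor gives more weight to the newest observation.
    """
    if not 0.0 < factor <= 1.0:
        raise ValueError("factor must be in (0, 1]")
    return factor * observed + (1.0 - factor) * previous


def simulate(factor: float, observations: List[float], start: float = 1.0) -> List[float]:
    """Return the weight scale after each observation, starting from `start`."""
    scale = start
    history = []
    for observed in observations:
        scale = update_weight_scale(scale, observed, factor)
        history.append(scale)
    return history


# Hypothetical scenario: a healthy replica (scale 1.0) fails three polls in a row,
# each failed poll being observed as a scale of 0.0.
failing_polls = [0.0, 0.0, 0.0]
slow = simulate(0.1, failing_polls)  # small factor: approximately 0.9, 0.81, 0.729
fast = simulate(0.5, failing_polls)  # larger factor: 0.5, 0.25, 0.125
```

With the larger factor, the scale of the failing server drops far sooner, which is the sense in which a higher factor makes the weights "more responsive to problems". The trade-off is that a single bad poll also has a larger effect.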

The patch was initially reviewed by @Anomie, and @aaron has since incorporated that feedback. Requesting further review/merging from CPT.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 394430 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@master] Rewrite LoadMonitor and related LoadBalancer logic

https://gerrit.wikimedia.org/r/394430

daniel triaged this task as Medium priority. Aug 22 2019, 7:55 AM
WDoranWMF raised the priority of this task from Medium to High. Sep 11 2019, 4:42 PM
WDoranWMF subscribed.

Raising the priority for this task as it's blocking for performance.

FYI, I already +1'ed the patch, but @jcrespo raised some questions on the PS that I think should be addressed before merging.

Change 577702 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@master] rdbms: refactor the caching logic in LoadMonitor

https://gerrit.wikimedia.org/r/577702

Change 577702 merged by jenkins-bot:
[mediawiki/core@master] rdbms: refactor the caching logic in LoadMonitor

https://gerrit.wikimedia.org/r/577702

Change 394430 abandoned by Aaron Schulz:
[mediawiki/core@master] Make LoadMonitorMySQL better detect servers with connection problems

Reason:
No use for WMF (goes with etcd external tools)

https://gerrit.wikimedia.org/r/394430

Krinkle assigned this task to aaron.