
Improve Rdbms/LoadBalancer and its LoadMonitor logic
Open, High, Public

Description

Follows investigation from {T180918}.

Rewrite LoadMonitor and related LoadBalancer logic

  • Use WANObjectCache::getWithSetCallback()
  • Increase the server state polling interval but add a mutex on shared cache updates via "lockTSE" as a throttle
  • Increase the moving average factor to make the weight scale values more responsive to problems
  • Account for DBError exceptions from getLag()
  • Add pingFailure() method to quickly react to failing servers rather than just waiting on the server state polls
  • Add backtraces to replication wait timeouts
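The effect of the moving average factor on responsiveness can be illustrated with a generic exponential moving average. This is a minimal sketch, not the LoadMonitor code: the function names, the factor values (0.1 and 0.5), and the convention that a failed poll yields an observed scale of 0.0 are all assumptions made for illustration.

```python
from typing import List


def update_weight_scale(previous_scale: float, observed_scale: float, factor: float) -> float:
    """Blend the newest observation into the running weight scale.

    `factor` is the moving average factor in [0, 1]; a higher value gives
    the newest observation more influence. (Hypothetical helper.)
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be between 0 and 1")
    return factor * observed_scale + (1.0 - factor) * previous_scale


def simulate(factor: float, observations: List[float], start: float = 1.0) -> List[float]:
    """Return the weight scale after each poll, rounded for display."""
    scale = start
    history = []
    for observed in observations:
        scale = update_weight_scale(scale, observed, factor)
        history.append(round(scale, 4))
    return history


# Assumed scenario: a healthy replica (scale 1.0) fails three polls in a row,
# each failure being observed as a scale of 0.0.
failing_polls = [0.0, 0.0, 0.0]
slow = simulate(0.1, failing_polls)  # low factor: scale decays slowly
fast = simulate(0.5, failing_polls)  # high factor: scale decays quickly

print("factor 0.1:", slow)
print("factor 0.5:", fast)
```

With the lower factor the failing server still carries most of its weight after three polls, while the higher factor cuts it to an eighth, which is the kind of faster reaction the bullet above describes.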

The patch was initially reviewed by @Anomie, and @aaron has since incorporated that feedback. Requesting further review and merging from CPT.

Event Timeline

Krinkle created this task. Aug 20 2019, 1:32 PM
Restricted Application added a project: Core Platform Team. Aug 20 2019, 1:32 PM
Restricted Application added a subscriber: Aklapper.

Change 394430 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@master] Rewrite LoadMonitor and related LoadBalancer logic

https://gerrit.wikimedia.org/r/394430

daniel triaged this task as Medium priority. Aug 22 2019, 7:55 AM
WDoranWMF raised the priority of this task from Medium to High. Sep 11 2019, 4:42 PM
WDoranWMF added a subscriber: WDoranWMF.

Raising the priority of this task, as it is a blocker for performance work.