
Make LoadBalancer::getReaderIndex handle down groups better
Closed, Resolved, Public

Description

Suppose that:

  • getConnection() is called with $groups as [ "a" ]
  • The server(s) in group "a" are down
  • The "generic" group servers are up

Each call to getConnection() will trigger a call to getReaderIndex() on group "a". That call will most likely fail, and the DBConnRef will use a "generic" server. In rare cases, it might succeed and use a group "b" server (with a different REPEATABLE-READ snapshot view than the group "a" server connection). Re-probing the down group on every call is inefficient and might add log spam.
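The repeated probing described above can be illustrated with a toy model. This is only a sketch in Python, not MediaWiki's rdbms code: the class ToyLoadBalancer, its methods, the probe counter, and the server names db-a1 and db-g1 are all hypothetical, and the fallback order (requested groups first, then "generic") is an assumption taken from the scenario in this description.

```python
from typing import Dict, Iterable, List, Optional


class ToyLoadBalancer:
    """Hypothetical stand-in for a load balancer; not MediaWiki's API."""

    def __init__(self, group_servers: Dict[str, List[str]], up_servers: Iterable[str]):
        self.group_servers = group_servers  # group name -> server names
        self.up_servers = set(up_servers)   # servers that currently respond
        self.probe_count = 0                # probes of non-generic groups

    def get_reader_index(self, group: str) -> Optional[str]:
        # Every probe of a query group costs a round of connection attempts.
        if group != "generic":
            self.probe_count += 1
        for server in self.group_servers.get(group, []):
            if server in self.up_servers:
                return server
        return None

    def get_connection(self, groups: List[str]) -> str:
        # Assumed behavior: try each requested group, then fall back to "generic".
        for group in list(groups) + ["generic"]:
            server = self.get_reader_index(group)
            if server is not None:
                return server
        raise RuntimeError("no reachable server")


lb = ToyLoadBalancer(
    {"a": ["db-a1"], "generic": ["db-g1"]},
    up_servers=["db-g1"],  # the only group "a" server is down
)
results = [lb.get_connection(["a"]) for _ in range(3)]
```

In this model, each of the three calls probes group "a" again before falling back, so the wasted work grows linearly with the number of getConnection() calls.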

Event Timeline

A simple option might be to use only the first group that is defined with server loads, with no fallback. We probably don't want vslow dump queries falling back to the main DB servers anyway. Multiple servers can already be configured for a group if redundancy is desired.
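A minimal sketch of that selection rule, again in Python rather than the actual PHP and under stated assumptions: the functions pick_group and get_connection, the shape of group_loads (group name mapped to per-server loads), and the use of "generic" when no requested group has loads configured are all hypothetical choices made for illustration, not a description of the merged patch.

```python
from typing import Dict, Iterable, List

GroupLoads = Dict[str, Dict[str, int]]  # group name -> {server name: load}


def pick_group(requested_groups: List[str], group_loads: GroupLoads) -> str:
    """Return the first requested group that has server loads configured.

    Assumption: if none of the requested groups is configured, use "generic".
    """
    for group in requested_groups:
        if group_loads.get(group):
            return group
    return "generic"


def get_connection(
    requested_groups: List[str],
    group_loads: GroupLoads,
    up_servers: Iterable[str],
) -> str:
    """Pick one group up front and use only its servers (no fallback)."""
    up = set(up_servers)
    group = pick_group(requested_groups, group_loads)
    for server, load in group_loads.get(group, {}).items():
        if load > 0 and server in up:
            return server
    # The chosen group is down: fail instead of silently using another group.
    raise ConnectionError(f"no reachable server in group {group!r}")
```

With this rule the group is decided from configuration alone, so a down group produces an error rather than a per-call probe followed by a fallback to a different group's servers.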

Change 889650 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] rdbms: simplify query group selection in LoadBalancer::getConnection()

https://gerrit.wikimedia.org/r/889650

Change 889650 merged by jenkins-bot:

[mediawiki/core@master] rdbms: simplify query group selection in LoadBalancer::getConnection()

https://gerrit.wikimedia.org/r/889650

aaron claimed this task.