Disallow 'weight: 0' for MW db config in dbctl
Open, Medium · Public

Description

This is a follow-up actionable for the T239874 incident.

Objective

Avoid future cases of unintentionally leaving a former master pooled as zero-weight replica.

What we know
  • Masters usually have weight: 0.
  • There is usually no replica with weight: 0.
  • We sometimes use weight: 1 for replicas that should receive little to no traffic, but that should still be waited for to avoid replication lag (and that can serve as a backup in case of issues).
  • We know that weight: 0 currently still results in at least some attempted connections from MW (per T239874, MW tried at least 4000 times per hour, every hour, over a 24-hour period). Whether this is desirable long-term is the topic of T239900.

Proposal

Enforce, with some validation logic or schema in dbctl, that a replica cannot have zero weight. This is meant to avoid the footgun scenario where a new master is prepended via dbctl but the operator then forgets to change the configuration for the former master.

Making a zero-weight replica illegal means the operator will have to either depool it properly or set its weight to at least 1.
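
A minimal sketch of what such a validation could look like, in Python. The dictionary layout (`master`, `weights`), the host names, and the function names are assumptions made for illustration; they are not dbctl's actual schema or API.

```python
from typing import Any, Dict, List


def find_zero_weight_replicas(section: Dict[str, Any]) -> List[str]:
    """Return the pooled replicas in a section that have weight 0.

    `section` is a hypothetical, simplified config of the form
    {"master": "<host>", "weights": {"<host>": <int>, ...}}.
    The master is allowed to have weight 0; replicas are not.
    """
    master = section["master"]
    weights = section["weights"]
    return sorted(
        host for host, weight in weights.items()
        if host != master and weight == 0
    )


def validate_section(name: str, section: Dict[str, Any]) -> None:
    """Raise ValueError if any replica in the section has weight 0."""
    offenders = find_zero_weight_replicas(section)
    if offenders:
        raise ValueError(
            f"section {name}: replica(s) with weight 0: "
            f"{', '.join(offenders)}; depool them or set weight >= 1"
        )


# Hypothetical example: "db-b" was just promoted to master, and the
# former master "db-a" was left pooled as a zero-weight replica.
after_switchover = {
    "master": "db-b",
    "weights": {"db-b": 0, "db-a": 0, "db-c": 100},
}
```

With this config, `validate_section("s1", after_switchover)` would reject the change and name `db-a`, so the operator has to either depool it or give it a weight of at least 1.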

I don't have a strong preference for this proposal. I don't know this area very well, so if I got something wrong or if there's a different/better way we can/should do this instead, please suggest it :)

Event Timeline

Krinkle created this task. Dec 5 2019, 11:44 AM
Restricted Application added a subscriber: Aklapper. Dec 5 2019, 11:44 AM
Krinkle updated the task description. Dec 5 2019, 11:47 AM
Krinkle updated the task description. Dec 5 2019, 11:49 AM
jijiki added a subscriber: jijiki. Dec 5 2019, 12:14 PM

I am not sure I want to fully disallow weight 0 for replicas; there are some cases where we might actually want that.
Cross-posting from T239900:

We used to have vslow, dump with weight 0, just in case the dumps could create lag with their heavy queries.
I do see a reason to keep replicas with weight 0 for that specific purpose. Dumps are a good example: we have hosts that we keep pooled for a specific kind of traffic that might be OK with lag, or at least some lag.

What is not yet clear to me is what happens if you have weight: 0 for a host in the main traffic section, but it is then pooled somewhere else (e.g. api, vslow...).

colewhite triaged this task as Medium priority. Dec 5 2019, 5:55 PM
Paladox added a subscriber: Paladox. Dec 5 2019, 5:59 PM
Krinkle moved this task from On-going to Follow-up on the Wikimedia-Incident board. Dec 9 2019, 9:34 PM
Marostegui moved this task from Triage to In progress on the DBA board. Dec 10 2019, 6:43 AM
jcrespo moved this task from Backlog to Acknowledged on the Operations board. Dec 11 2019, 5:53 PM

@Marostegui Yeah, that's different if it has non-zero weight in one group and zero in the main group. That makes it clear that it is pooled and used for live traffic. Labels are informative and fallback can be needed at times. That is, there is no scenario in which a db issue is less important due to its label, given that any pooled replica may be used by any web request for any reason.

Is there a use case for having a replica only listed in "general" with weight 0? (As opposed to the lowest weight of 1, or actually depooling).

Not really. The only case I could see would be a replica with weight 0 in "general" and some weight in the groups section, mostly for vslow.
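
The narrower rule that comes out of this discussion (reject a replica only when it has weight 0 in the main section and no non-zero weight in any group) could be sketched roughly as follows. The function name and the `group_weights` mapping are hypothetical and not part of dbctl.

```python
from typing import Dict


def is_disallowed_zero_weight(main_weight: int,
                              group_weights: Dict[str, int]) -> bool:
    """Return True for a replica that is pooled but has no weight anywhere.

    `main_weight` is the replica's weight in the main ("general") section.
    `group_weights` maps group names (for example "vslow" or "dump") to
    the replica's weight in that group; an empty dict means no groups.
    """
    has_group_traffic = any(w > 0 for w in group_weights.values())
    return main_weight == 0 and not has_group_traffic
```

Under this rule a dedicated vslow or dump host with weight 0 in "general" stays legal, while a former master left behind with weight 0 and no groups is flagged.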