
Update wgLBFactoryConf for x2 to register only the local primary
Closed, Resolved (Public)

Description

Outcome

Adjust dbctl so that the LBFactoryConf structure it writes to Etcd for wmf-config registers only the local primary. For example, instead of a list of three servers with [0] being the primary, the list should contain a single item only.
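The following is a minimal sketch of the shape change. It assumes, based on the dbctl output later in this task, that each section entry is a two-element list: a dict holding the primary, followed by a dict of replicas. It also assumes the consumer flattens that entry into one ordered server list whose first element is the primary. `flatten_loads` is a hypothetical helper for illustration only. It is not the actual wmf-config code.

```python
from typing import Dict, List, Tuple


# Hypothetical helper: flattens a dbctl-style [ {primary: load}, {replica: load, ...} ]
# entry into one ordered list of (host, load) pairs, with the primary first.
# Illustrates the idea in this task; not the real wmf-config implementation.
def flatten_loads(section_loads: List[Dict[str, int]]) -> List[Tuple[str, int]]:
    primary, replicas = section_loads
    servers = list(primary.items())  # index 0 is the primary
    servers.extend(replicas.items())  # replicas, if any, follow in order
    return servers


# Current x2 shape: one primary plus two replicas, i.e. three servers.
before = [{"db2142": 0}, {"db2143": 100, "db2144": 100}]
# Desired shape: replicas omitted, i.e. a single-server cluster.
after = [{"db2142": 0}, {}]

print(flatten_loads(before))  # [('db2142', 0), ('db2143', 100), ('db2144', 100)]
print(flatten_loads(after))   # [('db2142', 0)]
```

With the replica dict empty, the flattened list has exactly one entry. That is the single-server case this task asks for.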

Context

I'd like to try keeping complexity down by not adding a new concept to MW for "a cluster of servers where we pretend only 1 exists and disable the multi-server features".

The overhead of ChronologyProtector, the concept of lagged-replica mode, the periodic broadcast connections to all replicas to check lag and proactively switch to read-only mode when lag is too high, etc.: these are all naturally turned off for single-server DB clusters.

I suggest we configure x2 as a single-server cluster, similar to what we do by default for local development and CI ("LBSimple"), and somewhat akin to what we've done with ParserCache for a long time (different in that those do contain multiple servers, but we treat each one as its own isolated, master-only host).

From #wikimedia-sre:

20:50 <Krinkle> weight:0 will not be enough I think. It will dampen the effect but still register them as generally having replicas.
20:50 <Krinkle> it's a boolean change in behaviour
20:51 <cdanis> Krinkle: okay, thanks, that's all good context. I need to check but I don't think it should be hard to add another dbctl section 'flavor' that simply doesn't include any replicas in the output
20:51 <cdanis> which sounds like it would be enough?
21:05 <Krinkle> This sounds like dbctl stores data in two places in etcd, one as source and one as output.
21:05 <Krinkle> If so, yeah, that sounds like it would suffice
21:06 <cdanis> yes
21:06 <cdanis> that is right :)
21:06 <cdanis> the output part very closely follows Mediawiki's data structures

Event Timeline

Marostegui moved this task from Triage to In progress on the DBA board.

Raising this to high.
@CDanis please confirm this wouldn't break anything.
What I have done is set min_replicas: 0 on x2 in codfw (for testing) and then depool both replicas, so the diff would look like this:

root@cumin1001:~# dbctl config diff
codfw/externalLoads/x2 live                                        codfw/externalLoads/x2 generated
[                                                                  [
    {                                                                  {
        "db2142": 0                                                        "db2142": 0
    },                                                                 },
    {                                                                  {}
        "db2143": 100,
        "db2144": 100
    }
]                                                                  ]
root@cumin1001:~#

I guess this would have the intended behaviour, and MW won't expect or check for replicas? cc @Krinkle

To be clear, I have not pushed the change, and won't until we are sure it won't break anything.

I did leave min_replicas: 0 though.

@Marostegui That looks correct to me.

I can write a dbctl patch today to do this automatically, so you can still manage replicas in dbctl, if that would be helpful.

We'd implement that by adding a new flavor for sections -- I guess call it mainstash? or omit-replicas? -- and then we could set x2 to be that flavor.

That'd be helpful indeed! Thank you

omit-replicas looks good to me, so we can reuse it elsewhere if needed (ideally I would like to handle parsercache with dbctl in the future, as those are the only hosts still handled in mediawiki-config).

Also, would this solution also work for T245239: dbctl: treat read only ES hosts as standalone hosts?

Thanks!

OK, sounds good!

Unfortunately not, although I'll do a little thinking about that one while I'm in the code.

Change 828606 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/software/conftool@master] dbctl: Add omit_replicas_in_mwconfig section attribute

https://gerrit.wikimedia.org/r/828606

@Marostegui I actually implemented this not as a new flavor, but instead as a boolean attribute omit_replicas_in_mwconfig on the section object. Once the patch is merged and deployed I'll let you know.
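The following minimal Python sketch shows what the new attribute changes in the generated output. Only the attribute name `omit_replicas_in_mwconfig`, its default of false, and the resulting output shape (`[{primary: load}, {}]`) come from this task. The `Section` dataclass and the `generate_external_loads` function are hypothetical, and conftool's real implementation may differ. Depooling and `min_replicas` checks are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Section:
    # Hypothetical model of a dbctl section; only omit_replicas_in_mwconfig
    # is a name taken from this task.
    name: str
    primary: str
    primary_load: int
    replicas: Dict[str, int] = field(default_factory=dict)  # pooled replica -> load
    omit_replicas_in_mwconfig: bool = False  # defaults to false, as described above


def generate_external_loads(section: Section) -> List[Dict[str, int]]:
    """Build the [ {primary: load}, {replica: load, ...} ] output for one section."""
    # The replicas remain in the source data; they are only left out of the output.
    replicas = {} if section.omit_replicas_in_mwconfig else dict(section.replicas)
    return [{section.primary: section.primary_load}, replicas]


x2 = Section("x2", "db2142", 0, {"db2143": 100, "db2144": 100})
print(generate_external_loads(x2))  # [{'db2142': 0}, {'db2143': 100, 'db2144': 100}]

x2.omit_replicas_in_mwconfig = True
print(generate_external_loads(x2))  # [{'db2142': 0}, {}]
```

This design keeps the replicas in the section's source data, so they can still be managed in dbctl. Only the output written for MediaWiki changes.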

Excellent, @CDanis, thank you. Let me know once it's deployed and I can set it on x2.

Change 828606 merged by jenkins-bot:

[operations/software/conftool@master] dbctl: Add omit_replicas_in_mwconfig section attribute

https://gerrit.wikimedia.org/r/828606

Change 828946 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/puppet@production] dbctl: update schema for 2.2.2

https://gerrit.wikimedia.org/r/828946

Change 828946 merged by CDanis:

[operations/puppet@production] dbctl: update schema for 2.2.2

https://gerrit.wikimedia.org/r/828946

Change 828947 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/puppet@production] dbctl: update schema for 2.2.2

https://gerrit.wikimedia.org/r/828947

Change 828947 merged by CDanis:

[operations/puppet@production] dbctl: update schema for 2.2.2

https://gerrit.wikimedia.org/r/828947

This is ready, was tested by hand on cumin2002, and is now deployed to both cumin hosts.

✔️ cdanis@cumin2002.codfw.wmnet ~ 🕗☕ sudo dbctl -s codfw section x2 edit
# edit omit_replicas_in_mwconfig to be true instead of default false

✔️ cdanis@cumin2002.codfw.wmnet ~ 🕗☕ dbctl config diff -u    
--- codfw/externalLoads/x2 live
+++ codfw/externalLoads/x2 generated
@@ -2,8 +2,5 @@
     {
         "db2142": 0
     },
-    {
-        "db2143": 100,
-        "db2144": 100
-    }
+    {}
 ]

✔️ cdanis@cumin2002.codfw.wmnet ~ 🕗☕ sudo dbctl -s codfw section x2 edit
# rolled back omit_replicas_in_mwconfig to be false

Mentioned in SAL (#wikimedia-operations) [2022-09-01T12:20:27Z] <cdanis@cumin2002> dbctl commit (dc=all): 'T316482 remove replicas from x2', diff saved to https://phabricator.wikimedia.org/P33736 and previous config saved to /var/cache/conftool/dbconfig/20220901-122026-cdanis.json