
Productionize x2 databases
Closed, Resolved · Public

Description

The following hosts will be part of x2:

eqiad:

  • db1151
  • db1152
  • db1153

codfw:

  • db2142
  • db2143
  • db2144

Setup checklist:

  • New section on tendril
  • New section on zarcillo
  • New section on dbctl
    • Populate it with the correct values
  • Check hosts are in tendril
  • Check hosts are in zarcillo
  • Check hosts are in dbctl instances.yaml
    • Populate them with the correct values
  • Check hosts have notifications enabled

Related Objects

Event Timeline


Change 649771 merged by Marostegui:
[operations/puppet@production] mariadb: Add eqiad x2 hosts

https://gerrit.wikimedia.org/r/649771

Mentioned in SAL (#wikimedia-operations) [2020-12-16T07:20:53Z] <marostegui> Stop mysql on db2142 to clone db1151 - T269324

Change 649818 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/dns@master] wmnet: Add x2-master CNAME

https://gerrit.wikimedia.org/r/649818

Change 649818 merged by Marostegui:
[operations/dns@master] wmnet: Add x2-master CNAME

https://gerrit.wikimedia.org/r/649818

Change 649820 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Setup x2 production backups

https://gerrit.wikimedia.org/r/649820

Change 649852 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Do not reimage db214[234]

https://gerrit.wikimedia.org/r/649852

Change 649852 merged by Marostegui:
[operations/puppet@production] install_server: Do not reimage db214[234]

https://gerrit.wikimedia.org/r/649852

Change 649890 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] conftool/dbctl: add x2 section & hosts for it

https://gerrit.wikimedia.org/r/649890

Change 649890 merged by CDanis:
[operations/puppet@production] conftool/dbctl: add x2 section & hosts for it

https://gerrit.wikimedia.org/r/649890

@CDanis I was taking a look at the section values and I saw flavour, which is new to me and seems to accept regular or external. But I haven't been able to find what each one is for at https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/conftool/+/refs/heads/master/conftool/extensions/dbconfig/README.md

sX seems to be regular, and x1 and esX seem to be external - what's the difference between them?

Mentioned in SAL (#wikimedia-operations) [2020-12-21T08:15:05Z] <marostegui> Add ips to the x2 instances on dbctl T269324

@CDanis I was taking a look at the section values and I saw flavour, which is new to me and seems to accept regular or external. But I haven't been able to find what each one is for at https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/conftool/+/refs/heads/master/conftool/extensions/dbconfig/README.md

sX seems to be regular, and x1 and esX seem to be external - what's the difference between them?

Ah sorry! This shouldn't be undocumented, will fix.

The summary is that, at MW config generation time, 'regular' sections are output into sectionLoads and 'external' ones into externalLoads. So for x2 we'll want flavor 'external'.
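
For reference, a rough sketch of how that maps out (the command form is assumed by analogy with the instance commands used elsewhere in this task; the live section object is not reproduced here):

# Sketch only: how a section's 'flavor' chooses where it lands in the
# generated MediaWiki config (per the explanation above):
#   flavor: regular   ->  sectionLoads  (s1-s8, s10, s11)
#   flavor: external  ->  externalLoads (es*, x1, and now x2)
dbctl -s codfw section x2 get    # inspect the section object, including its flavor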

Change 651269 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/software/conftool@master] dbctl: README: document section 'flavor'

https://gerrit.wikimedia.org/r/651269

Excellent - thank you Chris :-)

Change 651269 merged by jenkins-bot:
[operations/software/conftool@master] dbctl: README: document section 'flavor'

https://gerrit.wikimedia.org/r/651269

These hosts are ready.
The pending steps should be quick to do, and will be done after the holidays to avoid introducing more variables and potential sources of noise during the break:

  • Enable notifications
  • Populate dbctl instances with weights etc
  • Populate dbctl x2 section with the master etc

Change 654045 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] dbctl: Add x2 as a valid section

https://gerrit.wikimedia.org/r/654045

@CDanis I have been trying to add db2144 as the first slave on x2, but I am getting errors with the validation of x2 as an accepted value:

The modified object fails validation: 'x2' does not match any of the regexes: '^(s[1-8]|s1[01]|es[12345]|x1)$'
On instance['sections']:
    {'x2': {'percentage': 100, 'pooled': True, 'weight': 100}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/conftool/extensions/dbconfig/entities.py", line 35, in _validate_edit
    self.entity.validate(self.edited)
  File "/usr/lib/python3/dist-packages/conftool/kvobject.py", line 262, in validate
    rule.validate(current_values)
  File "/usr/lib/python3/dist-packages/conftool/types.py", line 118, in validate
    raise ValueError(exc.message)
ValueError: 'x2' does not match any of the regexes: '^(s[1-8]|s1[01]|es[12345]|x1)$'

Continue editing? [y/n] n
Execution FAILED
Reported errors:
'x2' does not match any of the regexes: '^(s[1-8]|s1[01]|es[12345]|x1)$'

I have put this up for review https://gerrit.wikimedia.org/r/c/operations/puppet/+/654045 but I am not sure about:

  1. whether this is correct
  2. how to deploy it

Let me know what you think and feel free to amend that patch if needed!
Thank you
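
For context, the validation error above comes from the section-name pattern in the dbctl schema, so the fix presumably amounts to extending that regex along these lines (a sketch, not the literal patch):

-  '^(s[1-8]|s1[01]|es[12345]|x1)$'
+  '^(s[1-8]|s1[01]|es[12345]|x1|x2)$'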

Change 654045 merged by Marostegui:
[operations/puppet@production] dbctl: Add x2 as a valid section

https://gerrit.wikimedia.org/r/654045

@CDanis I have merged the above patch, followed your advice, and ran puppet on the cumin hosts.
I went ahead and tried to edit db2142 to add it to x2, leaving it like this:

# Editing object codfw/db2142
host_ip: 10.192.0.14
note: ''
port: 3306
sections:
  x2: {percentage: 100, pooled: true, weight: 100}

However, when trying to save, I get stuck in a loop:

root@cumin1001:~# dbctl instance db2142 edit
Continue editing? [y/n] y
Continue editing? [y/n] n
Execution FAILED
Reported errors:
exceptions must derive from BaseException
root@cumin1001:~# dbctl -s codfw instance db2142 get
{
    "db2142": {
        "host_ip": "10.192.0.14",
        "note": "",
        "port": 3306,
        "sections": {}
    },
    "tags": "datacenter=codfw"
}

I essentially want to leave that host like db2115, which is in x1:

# Editing object codfw/db2115
host_ip: 10.192.32.134
note: ''
port: 3306
sections:
  x1: {percentage: 100, pooled: true, weight: 100}

If you have time to check what I might be doing wrong, that'd be helpful.
Thank you!

Sorry for the truly baffling error message.

The problem turned out to be that section x2 on codfw did not have flavor=external set (it was flavor=regular instead).

Now this works:

✔️ cdanis@cumin2001.codfw.wmnet ~ 🕐☕ dbctl config diff -u    
--- codfw/externalLoads/x2 live
+++ codfw/externalLoads/x2 generated
@@ -1 +1,6 @@
-{}
+[
+    {
+        "db2142": 100
+    },
+    {}
+]
--- codfw/hostsByName live
+++ codfw/hostsByName generated
@@ -62,6 +62,7 @@
     "db2138:3312": "10.192.32.106:3312",
     "db2138:3314": "10.192.32.106:3314",
     "db2140": "10.192.48.26",
+    "db2142": "10.192.0.14",
     "es2020": "10.192.0.157",
     "es2021": "10.192.16.148",
     "es2022": "10.192.32.188",

I'm still not sure where an exception was being thrown because of this, or why the error output is so bad. I'll take a look. But you should be unblocked now. Sorry this took so long :(

Mentioned in SAL (#wikimedia-operations) [2021-01-22T06:00:08Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Pool db2142 into x2 as codfw master T269324', diff saved to https://phabricator.wikimedia.org/P13884 and previous config saved to /var/cache/conftool/dbconfig/20210122-060007-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-01-22T06:01:48Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Pool db2143 and db2144 as x2 codfw slaves T269324', diff saved to https://phabricator.wikimedia.org/P13885 and previous config saved to /var/cache/conftool/dbconfig/20210122-060147-marostegui.json
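
For reference, the pooling recorded in those two SAL entries presumably boils down to a dbctl sequence along these lines (a sketch based on the wikitech Dbctl documentation, not the actual session; exact subcommands and flags may differ):

dbctl -s codfw section x2 set-master db2142     # x2 codfw master
dbctl instance db2142 pool
dbctl instance db2143 pool                      # codfw replicas
dbctl instance db2144 pool
dbctl config diff -u                            # review the generated change
dbctl config commit -m 'Pool x2 codfw hosts T269324'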

Thank you Chris! I have pooled the codfw hosts with no issues. On Monday I will do the same with the eqiad ones, as I would rather not pool the new eqiad hosts on a Friday - just in case, even though they are not in use yet - so it can wait until Monday.

Change 649820 abandoned by Jcrespo:
[operations/puppet@production] mariadb-backups: Setup x2 production backups

Reason:
not needed

https://gerrit.wikimedia.org/r/649820

Mentioned in SAL (#wikimedia-operations) [2021-01-25T06:43:06Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Populate x2 eqiad hosts into dbctl T269324', diff saved to https://phabricator.wikimedia.org/P13938 and previous config saved to /var/cache/conftool/dbconfig/20210125-064305-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-01-25T06:44:19Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Add x2 eqiad to dbctl T269324', diff saved to https://phabricator.wikimedia.org/P13939 and previous config saved to /var/cache/conftool/dbconfig/20210125-064419-marostegui.json

Change 658081 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] x2 hosts: Enable notifications

https://gerrit.wikimedia.org/r/658081

Change 658081 merged by Marostegui:
[operations/puppet@production] x2 hosts: Enable notifications

https://gerrit.wikimedia.org/r/658081

Marostegui updated the task description. (Show Details)

This is all done - hosts are ready to start getting data.

Change 658218 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] etcd.php: Add x2 mapping

https://gerrit.wikimedia.org/r/658218

Change 658218 merged by jenkins-bot:
[operations/mediawiki-config@master] etcd.php: Add x2 mapping

https://gerrit.wikimedia.org/r/658218

Mentioned in SAL (#wikimedia-operations) [2021-01-25T09:15:14Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Add x2 to the mapping array T269324 (duration: 01m 01s)

Mentioned in SAL (#wikimedia-operations) [2021-01-25T09:21:27Z] <marostegui@deploy1001> Synchronized wmf-config/etcd.php: Add x2 to the mapping array T269324 (duration: 00m 58s)

This was also needed: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/658218. I added this to the dbctl documentation so we don't forget about it when adding new external clusters: https://wikitech.wikimedia.org/wiki/Dbctl#Add_new_externalload_section

This is all done - hosts are ready to start getting data.

I was thinking that these would be set up just like the pcxxxx servers (e.g. each server in eqiad having circular replication with a corresponding server in codfw). Doing so will allow things like https://phabricator.wikimedia.org/T113916 to proceed, since some use cases involve writes that can happen in either datacenter. The MediaWiki mainstash config will be similar to the parser cache config as well.

This is all done - hosts are ready to start getting data.

I was thinking that these would be set up just like the pcxxxx servers (e.g. each server in eqiad having circular replication with a corresponding server in codfw). Doing so will allow things like https://phabricator.wikimedia.org/T113916 to proceed, since some use cases involve writes that can happen in either datacenter. The MediaWiki mainstash config will be similar to the parser cache config as well.

That wasn't my understanding when this was discussed at T212129. I also didn't get that impression when we briefly spoke on IRC a few days ago: I thought we just wanted another x1, with eqiad <-> codfw replication between the masters, not 3 more independent replication chains.
Handling parsercache, from an operational point of view, is really painful at the moment (ie: having to commit to MW for a depooling, having to always have 3 lines and just duplicate IPs in the array, having no spares...) and I would not like to expand this with 3 more replication chains. It certainly adds a lot of overhead when operating them.

There's probably also some puppet work that would need to be done, as we simply don't want to create 3 new sections like pc4, pc5 and pc6 - that would be confusing. I would prefer if @Kormat could estimate how much work this would be.

This is all done - hosts are ready to start getting data.

I was thinking that these would be set up just like the pcxxxx servers (e.g. each server in eqiad having circular replication with a corresponding server in codfw). Doing so will allow things like https://phabricator.wikimedia.org/T113916 to proceed, since some use cases involve writes that can happen in either datacenter. The MediaWiki mainstash config will be similar to the parser cache config as well.

That wasn't my understanding when this was discussed at T212129. I also didn't get that impression when we briefly spoke on IRC a few days ago: I thought we just wanted another x1, with eqiad <-> codfw replication between the masters, not 3 more independent replication chains.
Handling parsercache, from an operational point of view, is really painful at the moment (ie: having to commit to MW for a depooling, having to always have 3 lines and just duplicate IPs in the array, having no spares...) and I would not like to expand this with 3 more replication chains. It certainly adds a lot of overhead when operating them.

There's probably also some puppet work that would need to be done, as we simply don't want to create 3 new sections like pc4, pc5 and pc6 - that would be confusing. I would prefer if @Kormat could estimate how much work this would be.

Making both of the current masters active is the minimum config change for this to be usable for the main stash. After talking to Timo yesterday, we leaned toward the pcXXXX style setup, since the replica DBs in each DC would be hard to make use of for read traffic.

Either setup would work. If there is a single replication ring (only one master in each DC), then the replicas will only be there as standby servers, though (scaling would have to be done via vertical sharding, by pointing different features to different stashes). If the operational overhead of parsercache-style setups is too high, then I'm OK with the setup I mentioned on IRC. The important thing is allowing writes in both DCs.

Having another set of parsercache-like hosts is definitely not something I would like to pursue, especially the way they're currently handled via db-eqiad.php. I would rather have something more like x1 (again, the way we handle it - via dbctl).

Having both masters writable is already a snowflake in our environment, and I need to double check with @Kormat how to handle that with Puppet (especially the read_only setting and the monitoring part - as right now we do not consider anything in codfw active).
Also, I would need to double check with @CDanis how dbctl would handle this, or whether it doesn't really care as long as we have the section marked with readonly: false.

What is the impact if one of the masters dies? What if writes cannot happen on...let's say eqiad?

Krinkle subscribed.

Writing locally to mainstash is a hard requirement.

Data is expected to generally persist and be eventually consistent but loss is tolerable. E.g. under maintenance a db can simply be truncated, and in case of conflicts it's fine for one of them to end up overwritten by an earlier write so long as it's eventually consistent.

Main stash does not need, and cannot support, replication within the DC. It will not make use of replicas or otherwise tolerate stale data. And in terms of traffic I believe we already confirmed that it is fine to handle all reads with a single master. The other two servers in each DC could be used to shard the data, where MW either distributes/hashes keys equally, or even simpler with MW distributing keys by cache namespace (e.g. cross-wiki keys on srv1, wikis ABC on srv2, other wikis on srv3).

Local writes are expected to be immediately readable. As such, there would not be much value in having replicas within the same DC other than as a hot failover. But with the higher level caching we have in-place it's totally fine to just failover to an empty DB instead, so long as we do it consistently between DCs (e.g. if one of them goes down or is taken down, we take out the other as well and point it to an empty one as well).

Writing locally to mainstash is a hard requirement.

That's ok - but we need to make changes to our puppet and verify how this would work with dbctl (I would expect it to be the same as x1, and treated as externalLoad?). This is the first time we are going to have such an environment, and we need to understand things, as this will highly impact our workflows.

Data is expected to generally persist and be eventually consistent but loss is tolerable. E.g. under maintenance a db can simply be truncated, and in case of conflicts it's fine for one of them to end up overwritten by an earlier write so long as it's eventually consistent.

What's the impact if one of the masters goes down unexpectedly?
Also, what if we need to do maintenance on one of them? Ie: a reboot for a kernel upgrade? Can we simply stop mysql, reboot and then bring mysql up? Do we have to truncate the tables every time we do this? Do we have to depool the host before the maintenance? What would depooling mean in this scenario from the MW side?

Main stash does not need, and cannot support, replication within the DC. It will not make use of replicas or otherwise tolerate stale data. And in terms of traffic I believe we already confirmed that it is fine to handle all reads with a single master. The other two servers in each DC could be used to shard the data, where MW either distributes/hashes keys equally, or even simpler with MW distributing keys by cache namespace (e.g. cross-wiki keys on srv1, wikis ABC on srv2, other wikis on srv3).

We'll have replication as a hot failover.

Local writes are expected to be immediately readable. As such, there would not be much value in having replicas within the same DC other than as a hot failover. But with the higher level caching we have in-place it's totally fine to just failover to an empty DB instead, so long as we do it consistently between DCs (e.g. if one of them goes down or is taken down, we take out the other as well and point it to an empty one as well).

I am not sure I fully understand this paragraph. We can have replication between the two masters, but are you going to do cross-dc writes - outside of the replication thread?
ie: codfw directly inserting into the eqiad master and the other way around?

What's the impact if one of the masters goes down unexpectedly?
Also, what if we need to do maintenance on one of them? Ie: a reboot for a kernel upgrade? Can we simply stop mysql, reboot and then bring mysql up? Do we have to truncate the tables every time we do this? Do we have to depool the host before the maintenance? What would depooling mean in this scenario from the MW side?

I'm not sure. What do we do for parser cache, external store, and x1?

In terms of expectations, truncating would be fine if that makes it easier. Temporarily not being able to write should be fine as well. The writes are allowed to fail, and if not already, they will be given a very short timeout. I'm curious what our options would be if we wanted to avoid temporary loss of write ability, but intuitively I would think there is no option other than allowing uncoordinated/automatic failover, which pretty much requires breaking the consistency requirements (the same issue we had with memcached/nutcracker in the past, where keys re-hash locally and intermittently, causing stale data to be exposed and split-brain due to mismatching replication, etc.)

@Krinkle wrote:

Local writes are expected to be immediately readable. As such, there would not be much value in having replicas within the same DC other than as a hot failover. But with the higher level caching we have in-place it's totally fine to just failover to an empty DB instead, so long as we do it consistently between DCs […]

I am not sure I fully understand this paragraph. We can have replication between the two masters, but are you going to do cross-dc writes - outside of the replication thread?
ie: codfw directly inserting into the eqiad master and the other way around?

No, we would not have a way to target one master separately from the other, or even a master separate from a replica. MW would only be aware of one server for a given shard of data. No master/replica from its point of view.

In the future we may have a use case for a small subset of keys that want atomicity, where they can't accidentally be overwritten by other DCs; in that case we will want that subset of keys to always be written to the "primary" master. If and when that comes up, it will not be allowed to happen cross-dc in a web request, as that would violate our multi-dc/local-dc guidelines for general web traffic. Instead, such a use case will most likely involve queuing a job from codfw with any data in its metadata, which then executes in eqiad and writes it there. So even that use case would not require new primitives or requirements at an operational level.

What's the impact if one of the masters goes down unexpectedly?
Also, what if we need to do maintenance on one of them? Ie: a reboot for a kernel upgrade? Can we simply stop mysql, reboot and then bring mysql up? Do we have to truncate the tables every time we do this? Do we have to depool the host before the maintenance? What would depooling mean in this scenario from the MW side?

I'm not sure. What do we do for parser cache, external store, and x1?

So external store and x1 are handled via dbctl and we do depool them.
Handling parsercache operationally is a nightmare, and that's why we want to avoid replicating that model. It all starts with having to depool them via a MW commit, and then basically we place one host instead of the other, so that one gets "dirty".
It is very confusing and very error prone.

If we are going to handle x2 the same way we do x1 (via dbctl), then depooling a host is just a matter of a command. But MW obviously needs to understand what that means; I want to make sure that'll work. Currently x2 is being handled like x1, as an externalload section (same as external store).

In terms of expectations, truncating would be fine if that makes it easier. Temporarily not being able to write should be fine as well. The writes are allowed to fail, and if not already, they will be given a very short timeout. I'm curious what our options would be if we wanted to avoid temporary loss of write ability, but intuitively I would think there is no option other than allowing uncoordinated/automatic failover, which pretty much requires breaking the consistency requirements (the same issue we had with memcached/nutcracker in the past, where keys re-hash locally and intermittently, causing stale data to be exposed and split-brain due to mismatching replication, etc.)

Actually truncating it makes things a bit more painful, as after each depool we'd need to truncate it. Ideally we should be able to:

  • Depool
  • Reboot the host
  • Repool

If a host goes hard down (ie: HW problem), then we should just:

  • Promote one of its replicas to master

If all that is supported and "expected", then we should be good.
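
A minimal sketch of that depool/reboot/repool cycle with dbctl (assuming the standard workflow documented on the wikitech Dbctl page; not taken from an actual session):

dbctl instance db2143 depool
dbctl config commit -m 'Depool db2143 for maintenance T269324'
# ... stop mysql, reboot the host, start mysql, let replication catch up ...
dbctl instance db2143 pool
dbctl config commit -m 'Repool db2143 after maintenance T269324'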

No, we would not have a way to target one master separately from the other, or even a master separate from a replica. MW would only be aware of one server for a given shard of data. No master/replica from its point of view.

Gotcha - that sounds good.

In the future we may have a use case for a small subset of keys that want atomicity, where they can't accidentally be overwritten by other DCs; in that case we will want that subset of keys to always be written to the "primary" master. If and when that comes up, it will not be allowed to happen cross-dc in a web request, as that would violate our multi-dc/local-dc guidelines for general web traffic. Instead, such a use case will most likely involve queuing a job from codfw with any data in its metadata, which then executes in eqiad and writes it there. So even that use case would not require new primitives or requirements at an operational level.

Thanks for clarifying that - that was my worry, cross-dc writes/reads

So with all these clarifications, I think these are the next steps:

  • @Krinkle Do we need to enable codfw <-> eqiad replication in the end? eqiad replicates the writes to codfw - do we need codfw to replicate its writes to eqiad?
  • @Krinkle If your team could check what would be the behaviour if we simply depool a host via dbctl? Would that stop writes nicely? Would that break the site? :-)
  • If @Kormat could give an estimate of how much puppet work would be required to make the "inactive" DC master (codfw in this case) writable via puppet, and to fix the monitoring check so that it is expected for that particular master to have read_only=OFF

Thank you

nightmare […] It all starts with having to depool them via a MW commit, […]

I'm confident we can avoid this for mainstash db (x2).

The fact that this requires a MW commit today for parser cache is afaik "just" because we haven't moved more of db-related config to Etcd. As with previous moves, we need to be very careful and aware of the contractual expectations and needs, which for parser cache are indeed non-trivial, but ultimately it is just a primitive array structure that has no inherent need for being in PHP or MW config. As far as I'm concerned parser cache config can and should, too, be moved to Etcd if there are no other stakeholders raising concerns against that.

Do we need to enable codfw <-> eqiad replication in the end? eqiad replicates the writes to codfw - do we need codfw to replicate its writes to eqiad?

Yes. Writes happen in either and need to be eventually consistent. Bi-di replication is a blocker for multi-dc. But in the immediate very short term, eqiad>codfw is enough to unblock us for a few days/weeks since we're not planning to switch on multi-dc traffic tomorrow.

If your team could check what would be the behaviour if we simply depool a host via dbctl? Would that stop writes nicely? Would that break the site? :-)

I'm not familiar with what a depool in dbctl does exactly. I'll let @aaron answer this one, but if you could show a diff on the data structure during a depool, that would help meanwhile.

I think in the end the complexity of what dbctl needs to do will depend a lot on how we will use the three x2 hosts. There are broadly two options:

  • Use only one of them for all reads/writes, and for all cache keys/namespaces/wikis. This should be trivial in terms of load and data size. The other two would be replicated standbys. I don't know why I was avoiding this in my mind. I think this makes the most sense.
  • Use all three masters, to maximise their use and go all-in on the fact that we don't use replicas for reads and don't have a strong need for recovery from "rare" events. Given the other requirements we have, this means all three would be known to MW, and depooling would involve preserving the order and number of entries, but with some moving from one to another, and doing the same across DCs to avoid split-brain.

nightmare […] It all starts with having to depool them via a MW commit, […]

I'm confident we can avoid this for mainstash db (x2).

The fact that this requires a MW commit today for parser cache is afaik "just" because we haven't moved more of db-related config to Etcd. As with previous moves, we need to be very careful and aware of the contractual expectations and needs, which for parser cache are indeed non-trivial, but ultimately it is just a primitive array structure that has no inherent need for being in PHP or MW config. As far as I'm concerned parser cache config can and should, too, be moved to Etcd if there are no other stakeholders raising concerns against that.

I would love to see that, but not sure how doable it is. That's probably for @CDanis to estimate.

Do we need to enable codfw <-> eqiad replication in the end? eqiad replicates the writes to codfw - do we need codfw to replicate its writes to eqiad?

Yes. Writes happen in either and need to be eventually consistent. Bi-di replication is a blocker for multi-dc. But in the immediate very short term, eqiad>codfw is enough to unblock us for a few days/weeks since we're not planning to switch on multi-dc traffic tomorrow.

Done, codfw -> eqiad now configured.
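
For context, a sketch of what configuring that codfw -> eqiad leg typically involves on the eqiad master (standard MariaDB replication statements; the replication user, password and GTID/coordinate details are placeholders, not the actual setup):

# On the eqiad x2 master (presumably db1151), point replication at the codfw master:
mysql -e "CHANGE MASTER TO MASTER_HOST='db2142.codfw.wmnet', MASTER_USER='repl', MASTER_PASSWORD='********', MASTER_USE_GTID=slave_pos; START SLAVE;"
mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_(IO|SQL)_Running'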

If your team could check what would be the behaviour if we simply depool a host via dbctl? Would that stop writes nicely? Would that break the site? :-)

I'm not familiar with what a depool in dbctl does exactly. I'll let @aaron answer this one, but if you could show a diff on the data structure during a depool, that would help meanwhile.

This is what the hosts look like in the etcd config: https://noc.wikimedia.org/dbconfig/eqiad.json
And this is a depool: https://phabricator.wikimedia.org/P14104

I think in the end the complexity of what dbctl needs to do will depend a lot on how we will use the three x2 hosts. There are broadly two options:

  • Use only one of them for all reads/writes, and for all cache keys/namespaces/wikis. This should be trivial in terms of load and data size. The other two would be replicated standbys. I don't know why I was avoiding this in my mind. I think this makes the most sense.
  • Use all three masters, to maximise their use and go all-in on the fact that we don't use replicas for reads and don't have a strong need for recovery from "rare" events. Given the other requirements we have, this means all three would be known to MW, and depooling would involve preserving the order and number of entries, but with some moving from one to another, and doing the same across DCs to avoid split-brain.

I would prefer to go with the first one; otherwise we're sort of replicating what we have on parsercache.

Thanks
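
As a side note, a quick way to inspect the noc dbconfig JSON linked above from the command line (assuming it exposes the same externalLoads structure seen in the dbctl diff earlier in this task; jq is only used for readability):

curl -s https://noc.wikimedia.org/dbconfig/eqiad.json | jq '.externalLoads.x2'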

Most likely the x2 section should be added to spicerack too in spicerack/mysql_legacy.py:CORE_SECTIONS

Change 662631 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software/spicerack@master] mysql_legacy.py: Add x2

https://gerrit.wikimedia.org/r/662631

nightmare […] It all starts with having to depool them via a MW commit, […]

I'm confident we can avoid this for mainstash db (x2).

The fact that this requires a MW commit today for parser cache is afaik "just" because we haven't moved more of db-related config to Etcd. As with previous moves, we need to be very careful and aware of the contractual expectations and needs, which for parser cache are indeed non-trivial, but ultimately it is just a primitive array structure that has no inherent need for being in PHP or MW config. As far as I'm concerned parser cache config can and should, too, be moved to Etcd if there are no other stakeholders raising concerns against that.

I would love to see that, but not sure how doable it is. That's probably for @CDanis to estimate.

To make sure I understand -- is the only variable at play here $wmgParserCacheDBs? Is there anything else that needs to be maintained to manage MW's notions of parser cache?

I will let the MW experts answer that.

However, as I mentioned throughout the task, I don't want to go into the same data/operational mode we have with parsercache, which is very messy and has lots of pain points for us.
Having two masters that are written to locally (and replicating to each other) is a better approach for us, sort of like x1 but with both being writable.

  • If @Kormat could give an estimate of how much puppet work would be required to make the "inactive" DC master (codfw in this case) writable via puppet, and to fix the monitoring check so that it is expected for that particular master to have read_only=OFF

I'd estimate 2-3 days work, so nothing awful.

Thanks @Kormat
@Krinkle @aaron - let's go for the x1 approach but with local masters being writable then?

Change 662631 merged by jenkins-bot:
[operations/software/spicerack@master] mysql_legacy.py: Add x2

https://gerrit.wikimedia.org/r/662631

[...] There are broadly two options:

  • Use only one of them for all reads/writes, and for all cache keys/namespaces/wikis. This should be trivial in terms of load and data size. The other two would be replicated standbys. I don't know why I was avoiding this in my mind. I think this makes the most sense.
  • Use all three masters, to maximise their use and go all-in on the fact that we don't use replicas for reads and don't have a strong need for recovery from "rare" events. Given the other requirements we have, this means all three would be known to MW, and depooling would involve preserving the order and number of entries, but with some moving from one to another, and doing the same across DCs to avoid split-brain.

I would prefer to go with the first one, [...]

@Krinkle @aaron - let's go for the x1 approach but with local masters being writable then?

Sounds good.

Thanks @Kormat
@Krinkle @aaron - let's go for the x1 approach but with local masters being writable then?

LGTM.

  • @Krinkle If your team could check what would be the behaviour if we simply depool a host via dbctl? Would that stop writes nicely? Would that break the site? :-)

Thank you

Depooling replicas would not be a problem. Depooling a master would work the same as described at https://wikitech.wikimedia.org/wiki/Dbctl. SqlBagOStuff checks section- and server-level read-only mode, causing the methods to fail fast and return false. Any BagOStuff caller should be able to handle false values (usually by ignoring them or logging, though it depends on the caller).

Thanks @Krinkle and @aaron.
We synced up in our meeting today, assigning this to Stevie for the puppet changes needed.

The puppet changes are now in place.

Marostegui claimed this task.

Thanks @Kormat for getting this last bit done.
@Krinkle I believe everything is now in place.
Both masters are writable and replicating to each other, so closing this as resolved. Please re-open if you feel there's something else needed.
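
A quick way to double-check that final state on each of the two masters (db2142 in codfw per the SAL entries above, and presumably db1151 in eqiad); these are plain MariaDB commands, nothing x2-specific assumed:

mysql -e "SELECT @@hostname, @@read_only"    # expect read_only = 0 (OFF) on both masters
mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Master_Host|Slave_(IO|SQL)_Running'    # each should point at the other DC's master and show Yes/Yes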