Page MenuHomePhabricator

Possible to run writes (e.g. UPDATE) on Beta Cluster replica
Closed, ResolvedPublic

Description

I assume this is the same in prod, but I'm not sure.

I accidentally did a write operation on:

sql enwiki

on the Beta Cluster (see T109982), because I thought that still provided master. legoktm told me it now provides slave, but the UPDATE did not fail.

Not sure if this also applies to stat1003.

(I'm going to try to correct this case by re-running in master.)

Event Timeline

All the sql command was changed to do was connect to a server returned by wfGetLB()->getServerName(wfGetLB()->getReaderIndex()) by default instead of wfGetLB()->getServerName(0), see T105046

However, it does appear that read_only is OFF on both deployment-db1 and deployment-db2.

Not sure if this also applies to stat1003.

Isn't that a production analytics host? What does it have to do with anything?

Isn't that a production analytics host? What does it have to do with anything?

There are also slaves of the wiki databases accessible from there:

mysql:research@s3-analytics-slave [mediawikiwiki]> SELECT * FROM page ORDER BY page_id DESC LIMIT 1;

Wondering if those slaves would also allow accidental UPDATEs.

jcrespo triaged this task as Medium priority.Sep 8 2015, 9:46 AM
jcrespo added a subscriber: jcrespo.

Analytics stores (dbstoreX) are all with read_only=0 (aka s3-analytics-slave). I do not know yet, I suppose that because they have their own databases that they have to write to.

Slaves on beta should be fixed, but I will need help for that.

I have also created T111766.

From wmf-config/db-labs.php the Beta-Cluster-Infrastructure has two SQL servers, a master and a slave:

        'hostsByName' => array(
            'deployment-db1'  => '10.68.16.193', # deployment-db1.eqiad.wmflabs
            'deployment-db2'  => '10.68.17.94', # deployment-db2.eqiad.wmflabs
        ),
...
        'sectionLoads' => array(
            'DEFAULT' => array(
                'deployment-db1'     => 0,
                'deployment-db2'     => 400,
            ),
        ),

I think at one point some database had db2 has the master and db1 has the slave but that is no more the case apparently.

So it seems it is all about ™ setting the read-only flag on deployment-db2. They both have the puppet class `role::mariadb::beta which apply the template templates/mariadb/beta.my.cnf.erb` that comes with:

[mysqld]
read_only = 0

Maybe we can pass a type master/slave to vary that setting? Could be done via hiera maybe.

This caused complications when trying to fix T148111: DBQueryError when creating account on Beta, or logging into newly-created accounts too. @bd808 accidentally ran the ALTER on the slave, I then ran it on master, but that stopped replication, so we had to figure out how to restart it (it's fixed now).

Mattflaschen-WMF renamed this task from Possible to run writes (e.g. UPDATE) on slave to Possible to run writes (e.g. UPDATE) on Beta Cluster replica.Oct 14 2016, 9:37 PM

Just judging from the task title, this and T183245: Ensure replica DB in labs is read-only look like being duplicates?

Instead of tracking the current primary in hiera/etc, I'd propose to just set read_only = 1 by default on all beta database servers. The replicas should be read-only (which this task is all about), and the primary server isn't restarted often (especially now that Cloud VPS VMs can be live migrated between hypervisors without downtime), if they are it's usually a human who can set the primary to read-write using set global read_only = 0; and if it's a crash it's a good idea for a human to look at the logs before allowing writes.

Thoughts?

I'd propose to just set read_only = 1 by default on all beta database servers

This is exactly how we handle production servers, but there is a difference: if a primary db role gets active with read_only = 1, a page is generated to a lot of people (there is monitoring to alert of potential human errors). Is there a way to have effective monitoring on beta, so people who do maintenance are aware of that?

Is there a way to have effective monitoring on beta

No, not really. I don't imagine that being a problem unless it becomes frequent, I'd much rather have it read only for some time than have to repair mariadb instances with broken replication. Just documentate it somewhere, maybe add a section about that to https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/Databases and link it on deployment-db motd and the mediawiki read-only message.

Change 684034 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] P::mariadb::beta: Set read only by default

https://gerrit.wikimedia.org/r/684034

Change 684034 merged by Giuseppe Lavagetto:

[operations/puppet@production] P::mariadb::beta: Set read only by default

https://gerrit.wikimedia.org/r/684034

taavi claimed this task.

I'm calling this resolved. The patches are deployed and I amended the documentation on https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/Databases#Read_only for these situations and updated enwiki and metawiki error messages to link to it.