Repurpose db1108 as generic Analytics db replica
Closed, ResolvedPublic
Actions

Assigned To

Authored By

	elukey
	Oct 7 2019, 1:37 PM

Description

In T231208 some issue were highlighted:

the analytics dbs (matomo, superset, oozie, etc..) are all running on single db hosts without any replication, and taking backups following SRE best practices leads to issue like excessive lock contention between applications and backup software.
the current backups of the Analytics databases diverged a lot from the best practices that SRE follows and might lead to inconsistent snapshots in some cases.

In T231858 some issues were highlighted:

the log database on db1107 is way different from db1108, they can't be really exchanged without user noticing it in their query results.
the logdatabase contains historical data that would be nice to have available (read-only) for more time before completely relying on HDFS data. We still have the past 1.5y of Eventlogging data on HDFS already, we are sunsetting the Mysql support.
moving the log database to one of the dbstore nodes would require a lot of engineering time and probably not be the best solution in terms of availability and resource usage of the dbstore cluster.

Given the above points, I have a proposal for db1108:

after the deprecation of mysql-eventlogging, remove all eventlogging-related replication code.
repurpose it as generic analytics database replica: keep the log database as it is, and replicate the matomo, superset, etc.. from the Analytics db hosts (an-coord1001, matomo1001)
add mariadb-bacula backups configuration for db1108

Important note about the log database: the plan is to take a full snapshot of the db and archive it in HDFS before starting any procedure. It will be made clear that the log database will be available as read-only support with the caveat that any maintenance or host hw-event will require downtime. The Analytics team is going to maintain the log database as best effort support and it will be made clear to users.

How does the proposal sound?

Details

Subject	Repo	Branch	Lines +/-
Remove AAAA/PTR records for db1108	operations/dns	master	+0 -2
mariadb: Match port 3351 and 3352 to 2 analytics sections	operations/puppet	production	+4 -0
analytics-in[46]: add new ports for term mysql-replica	operations/homer/public	master	+12 -1
mariadb-backups: Update backup_mariadb.py to HEAD	operations/puppet	production	+1 -1
mariadb-backups: Add _ to the list of characters alowed for section names	operations/software/wmfmariadbpy	master	+1 -1
mariadb-backups: Update backup automation to wmfmariadbpy's HEAD	operations/puppet	production	+13 -0
mariadb: Add port analytics assignment to wmfmariadbpy and backups	operations/software/wmfmariadbpy	master	+15 -0
mariadb: Increase misc db binlog retention to 14 days	operations/puppet	production	+1 -1
mariadb: Increase analytics binlog retention time to 14 days	operations/puppet	production	+2 -2
mariadb-backups: Move db1108 (analytics db) backups' ports	operations/puppet	production	+2 -2
profile::mariadb::misc::analytics::multiinstance: change ports	operations/puppet	production	+6 -6
mariadb-backups: Setup db1108 as the source of backups for analytics dbs	operations/puppet	production	+6 -0
mariadb-backups: Monitor for mariadb backups of matomo&analytics_meta	operations/puppet	production	+8 -1
Add PTR/AAAA records for db1108	operations/dns	master	+2 -1
profile::piwik::database: add TLS config for mariadb	operations/puppet	production	+7 -0
profile::mariadb::misc::analytics::multiinstance: move meta to 3306	operations/puppet	production	+3 -3
piwik: add binlog to database config.	operations/puppet	production	+5 -134
Allow db1108's IPv6 address to replicate matomo/meta databases	operations/puppet	production	+2 -2
Allow db1108 to replicate data from matomo and meta databases	operations/puppet	production	+2 -2
profile::mariadb::misc::analytics::multiinstance: use underscore	operations/puppet	production	+3 -3
Rename analytics meta mariadb instance for Backup host	operations/puppet	production	+9 -9
Introduce profile::mariadb::misc::analytics	operations/puppet	production	+79 -170
Remove the analytics-slave CNAME	operations/dns	master	+0 -1
profile::reportupdater::jobs: absent old RU job	operations/puppet	production	+1 -75
Reimage db1108 to Debian Buster	operations/puppet	production	+3 -5
eventlogging: Change basedir depending on the OS	operations/puppet	production	+6 -1
profile::mariadb::misc::eventlogging::database: set db to read only	operations/puppet	production	+5 -7

Related Objects
Search...

Status	Assigned	Task
Resolved	Ottomata	T159170 Sunset MySQL data store for eventlogging
Resolved	odimitrijevic	T240437 Analytics Ops Technical Debt
Resolved	elukey	T234826 Repurpose db1108 as generic Analytics db replica
Resolved	Marostegui	T238113 Repurpose db1107 as a generic database

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I had a chat with my team, and after some discussion we created T250709 to figure out how hard it would be to load all the data in the log database to Hive/HDFS. In case of a negative outcome we'll try to come up with an alternative plan for those big tables, like dropping them etc..

Setting this task as stalled for the moment.

elukey moved this task from In Progress to Paused on the Analytics-Kanban board.Apr 29 2020, 3:03 PM

elukey moved this task from In Progress to Waiting for others on the User-Elukey board.May 11 2020, 10:28 AM

elukey added a project: Analytics-Clusters.Jun 10 2020, 2:43 PM

elukey moved this task from Backlog to Q4 2019/2020 on the Analytics-Clusters board.

elukey changed the task status from Stalled to Open.Jun 16 2020, 7:15 AM

In T250709 Dan was able to pull all data from db1108's log database on HDFS, and my team is currently vetting the data to double check that everything looks good. If so, we'll be able to drop the log database from db1108 and start fresh.

@Marostegui in this case, should we just reimage the host to Buster as first step (once my team is ok with the data on HDFS), delete all puppet config and come up with a simple multi-instance set up for the Analytics database to mirror? (currently two, matomo and analytics-meta).

That sounds good to me.
Reminder, use: echo partman/custom/db.cfg ;; \ recipe when reimaging so everything is wiped

Change 607219 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] eventlogging: Change basedir depending on the OS

https://gerrit.wikimedia.org/r/607219

Change 607219 merged by Marostegui:
[operations/puppet@production] eventlogging: Change basedir depending on the OS

https://gerrit.wikimedia.org/r/607219

Change 607438 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Reimage db1108 to Debian Buster

https://gerrit.wikimedia.org/r/607438

Change 607438 merged by Elukey:
[operations/puppet@production] Reimage db1108 to Debian Buster

https://gerrit.wikimedia.org/r/607438

I was about to reimage and then I remembered about the staging database. I checked and there are ~270 tables on the db, most of them probably not used anymore but I see some RU config:

 1 ## Configuration for grabbing datasets on page creation events$
 2 ## for all wikis, as well as some events related to ACTRIAL.$
 3 ## Ref: https://phabricator.wikimedia.org/T170850$
 4 $
 5 databases:$
 6     dbstore:$
 7         host: "analytics-slave.eqiad.wmnet"$
 8         port: 3306$
 9         creds_file: /etc/mysql/conf.d/stats-research-client.cnf$
10         db: log$

The above file is in reportupdater-queries/page-creation/config.yaml. @mforns can we check together what RU jobs are still using the staging db on db1108 and possibly move them elsewhere? (either hive or the staging db on the dbstores).

EDIT: my bad, I just realized that the job uses the log database, so I guess that this job has been not active or failing for a long time?

Change 607537 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::reportupdater::jobs: absent old RU job

https://gerrit.wikimedia.org/r/607537

Change 607537 merged by Elukey:
[operations/puppet@production] profile::reportupdater::jobs: absent old RU job

https://gerrit.wikimedia.org/r/607537

Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts:

['db1108.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202006241616_elukey_28643.log.

Mentioned in SAL (#wikimedia-operations) [2020-06-24T16:17:28Z] <elukey> reimage db1108 to debian Buster - T234826

Completed auto-reimage of hosts:

['db1108.eqiad.wmnet']

and were ALL successful.

Change 607569 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/dns@master] Remove the analytics-slave CNAME

https://gerrit.wikimedia.org/r/607569

Change 607569 merged by Elukey:
[operations/dns@master] Remove the analytics-slave CNAME

https://gerrit.wikimedia.org/r/607569

Change 553742 merged by Elukey:
[operations/puppet@production] Introduce profile::mariadb::misc::analytics

https://gerrit.wikimedia.org/r/c/operations/puppet/ /553742

Change 609106 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Rename analytics meta mariadb instance for Backup host

https://gerrit.wikimedia.org/r/c/operations/puppet/ /609106

Change 609106 merged by Elukey:
[operations/puppet@production] Rename analytics meta mariadb instance for Backup host

https://gerrit.wikimedia.org/r/c/operations/puppet/ /609106

Change 609112 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::mariadb::misc::analytics::multiinstance: use underscore

https://gerrit.wikimedia.org/r/c/operations/puppet/ /609112

Change 609112 merged by Elukey:
[operations/puppet@production] profile::mariadb::misc::analytics::multiinstance: use underscore

https://gerrit.wikimedia.org/r/c/operations/puppet/ /609112

Change 609160 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Allow db1108 to replicate data from matomo and meta databases

https://gerrit.wikimedia.org/r/c/operations/puppet/ /609160

Change 609160 merged by Elukey:
[operations/puppet@production] Allow db1108 to replicate data from matomo and meta databases

https://gerrit.wikimedia.org/r/c/operations/puppet/ /609160

Change 609161 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Allow db1108's IPv6 address to replicate matomo/meta databases

https://gerrit.wikimedia.org/r/c/operations/puppet/ /609161

Change 609161 merged by Elukey:
[operations/puppet@production] Allow db1108's IPv6 address to replicate matomo/meta databases

https://gerrit.wikimedia.org/r/c/operations/puppet/ /609161

Status of the databases:

analytics-meta has binlog enabled, with ROW format but not gtid. Mariadb version 10.1 (Stretch)
matomo doesn't have binlog enabled, and not gtid. Mariadb version 10.4 (Buster)

Both servers will need server-id setup (that is why we set it up with the ipv4 integer by default on other hosts) and binlog enabled. matomo will need a restart for enabling binlog. GTID is nice but not a blocker. We can try anyway (we will need an accounts and set of grants for it) and see how it goes.

elukey moved this task from Q4 2019/2020 to Q1 2020/2021 on the Analytics-Clusters board.Jul 2 2020, 4:06 PM

elukey moved this task from Paused to In Progress on the Analytics-Kanban board.Jul 3 2020, 10:23 AM

Change 609421 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] piwik: add binlog and server-id

https://gerrit.wikimedia.org/r/609421

In T234826#6275100, @jcrespo wrote:

Both servers will need server-id setup (that is why we set it up with the ipv4 integer by default on other hosts) and binlog enabled. matomo will need a restart for enabling binlog. GTID is nice but not a blocker. We can try anyway (we will need an accounts and set of grants for it) and see how it goes.

TIL, I was completely unaware of what mariadb::config is doing behind the scenes. Both matomo and meta have server-id with ipv4 integer, the only thing missing is binlog for matomo/piwik. My bad I was reading puppet and the my.cnf config in the wrong way!

Aklapper removed a project: Analytics.Jul 4 2020, 7:59 AM

Change 609421 merged by Elukey:
[operations/puppet@production] piwik: add binlog to database config.

https://gerrit.wikimedia.org/r/609421

In T234826#6274823, @elukey wrote:

Status of the databases:

analytics-meta has binlog enabled, with ROW format but not gtid. Mariadb version 10.1 (Stretch)
matomo doesn't have binlog enabled, and not gtid. Mariadb version 10.4 (Buster)

From what I remember we agreed that db1108 will be Buster + Mariadb 10.4, right?
Just checking that those OS and Mariadb versions are the ones that are currently running their active hosts, no?

Thank you!

Yes correct I was checking and reporting the current status to find with you folks if there was any blocker that I didn't know. I just enabled ROW binlog for matomo, that was the missing thing, if the rest looks good then I think we are ready for the replicas :)

Side note - in T257412 I am investigating a failover plan to write down in case an-coord1001 goes down (currently a big SPOF). The idea would be to be able to failover to db1108 (with the necessary consequences, like setting it read/write temporarily, rebuild the master afterwards, etc..) but if possible I'd like to avoid using dbproxies for the moment. One idea would be to have db1108's meta replica to listen on port 3306 (like an-coord1001) and then create a simple DNS cname like analytics-meta.eqiad.wmnet to either an-coord1001 (regular use case) or db1108 (when failing over). Having both on the same port would require less changes in puppet, and an easier failover process. Would it be possible to use 3306 on db1108 or is there a standard/convention that would be violated? (so better not and find a different solution).

Change 612864 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::mariadb::misc::analytics::multiinstance: move meta to 3306

https://gerrit.wikimedia.org/r/612864

Change 612864 abandoned by Elukey:
[operations/puppet@production] profile::mariadb::misc::analytics::multiinstance: move meta to 3306

Reason:
Not needed anymore.

https://gerrit.wikimedia.org/r/612864

In T234826#6307751, @elukey wrote:

Side note - in T257412 I am investigating a failover plan to write down in case an-coord1001 goes down (currently a big SPOF). The idea would be to be able to failover to db1108 (with the necessary consequences, like setting it read/write temporarily, rebuild the master afterwards, etc..) but if possible I'd like to avoid using dbproxies for the moment. One idea would be to have db1108's meta replica to listen on port 3306 (like an-coord1001) and then create a simple DNS cname like analytics-meta.eqiad.wmnet to either an-coord1001 (regular use case) or db1108 (when failing over). Having both on the same port would require less changes in puppet, and an easier failover process. Would it be possible to use 3306 on db1108 or is there a standard/convention that would be violated? (so better not and find a different solution).

After a chat with Andrew we decided to just use a centralized hostname:port combination in puppet since most of the services will need a restart anyway.

In T234826#6307751, @elukey wrote:

Side note - in T257412 I am investigating a failover plan to write down in case an-coord1001 goes down (currently a big SPOF). The idea would be to be able to failover to db1108 (with the necessary consequences, like setting it read/write temporarily, rebuild the master afterwards, etc..) but if possible I'd like to avoid using dbproxies for the moment. One idea would be to have db1108's meta replica to listen on port 3306 (like an-coord1001) and then create a simple DNS cname like analytics-meta.eqiad.wmnet to either an-coord1001 (regular use case) or db1108 (when failing over). Having both on the same port would require less changes in puppet, and an easier failover process. Would it be possible to use 3306 on db1108 or is there a standard/convention that would be violated? (so better not and find a different solution).

I thought the idea was to use multi-instance on db1108, in that case, you can only have a given instance running on 3306, the rest would need to run on different ports.
For multi-instance we normally do not use 3306 for any of the instances, but this is not a set-in-stone kind of rule, it is more for internal organization.

@Marostegui yep yep it was only an idea to avoid changing the port in puppet when doing the failover (and just update the CNAME with the correct hostname) but after a chat with Andrew we decided that it would have been more cumbersome that helpful, so idea ended up in /dev/null :D

Today I added the user repl (with a custom password saved in the private repo) to mariadb on matomo1002 and an-coord1001, the two master dbs. Also granted the following: GRANT REPLICATION SLAVE ON *.* TO 'repl'@'10.%';

Change 613651 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::piwik::database: add TLS config for mariadb

https://gerrit.wikimedia.org/r/613651

Change 613651 merged by Elukey:
[operations/puppet@production] profile::piwik::database: add TLS config for mariadb

https://gerrit.wikimedia.org/r/613651

Ok so on Monday I'll restart mariadb on an-coord1001 and matomo1002 to pick up TLS changes. With the repl user created, in theory we should be able to set up replication.

IIUC the next steps should be something like:

dump the databases with mysqldump --single-transaction --master-data=2 -u $USER -p $PASSWORD -h $host $DATABASE >> file.sql
get the id of the last transaction recorded in the binlog, and then set up replication from db1108 on both instances

I am pretty sure that I am not aware of something important, in case please tell me :)

Change 614751 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/dns@master] Add PTR/AAAA records for db1108

https://gerrit.wikimedia.org/r/614751

Tested that the two dbs are capable of using TLS. Two minor things to review/merge (that are not blockers):

ipv6 records for db1108 - https://gerrit.wikimedia.org/r/614751 (not sure if needed since a lot of dbs don't have these records)
allow the analytics vlan to contact db1108 (this is useful when there is the need to query db1108 directly from within the VLAN) - https://gerrit.wikimedia.org/r/c/operations/homer/public/+/614702

Today me and Manuel (well, more Manuel :P) set up the replication between matomo1002 and an-coord1001 to db1108, all good! I wrote https://wikitech.wikimedia.org/wiki/Analytics/Systems/DB_Replica to document all the process. The next step is to set up bacula backups!

Change 616452 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Monitor for mariadb backups of matomo&analytics_meta

https://gerrit.wikimedia.org/r/616452

Change 616453 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Setup db1108 as the source of backups for analytics dbs

https://gerrit.wikimedia.org/r/616453

Change 614751 merged by Elukey:
[operations/dns@master] Add PTR/AAAA records for db1108

https://gerrit.wikimedia.org/r/614751

Change 616452 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Monitor for mariadb backups of matomo&analytics_meta

https://gerrit.wikimedia.org/r/616452

Change 616453 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Setup db1108 as the source of backups for analytics dbs

https://gerrit.wikimedia.org/r/616453

There is 2 ongoing issues:

https://github.com/wikimedia/operations-dns/commit/29ff38c263c9f7b4fd366aeb1999ebc4f2d3d8a5 added the ipv6 ip of netflow1001 to db1108, this is causing quite a log of connection timeouts.

Also I need a list of databases to backup on analytics_meta, as there are a few.

So the first test about the backups seems to indicate the it works, but we will need some tuning about the backup options and the server to generate backups regularly.

Fixed the DNS issue with https://gerrit.wikimedia.org/r/c/operations/dns/+/616864, thanks a lot for spotting it! My bad :(

About the list of databases - if possible all the ones on analytics-meta by default, but if an explicit list is needed: druid druid_public_eqiad hue oozie search_airflow superset_production hive_metastore

Thanks!

Defaulting to ipv6 is becoming a high complex problem because while mysql has no issue with it, our mysql accounting is based on ip, not dns (and so far it was assumed that a host only had an IP), and it adds an extra layer of refactoring to our authentication management we hadn't had to deal with before.

I added the AAAA record since all the Analytics hosts have one, so replication from db1108 would fail/timeout multiple times before dropping to IPv4. It works as well without AAAA records for db1108, so I can remove them since not strictly needed, a big refactoring is not needed.

Change 617064 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/dns@master] Remove AAAA/PTR records for db1108

https://gerrit.wikimedia.org/r/617064

Change 617077 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Create ugly exception for port assignment for db1108

https://gerrit.wikimedia.org/r/617077

Change 617077 abandoned by Jcrespo:
[operations/puppet@production] mariadb: Create ugly exception for port assignment for db1108

Reason:
moving port instead

https://gerrit.wikimedia.org/r/617077

Change 617479 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::mariadb::misc::analytics::multiinstance: change ports

https://gerrit.wikimedia.org/r/617479

Change 617479 merged by Elukey:
[operations/puppet@production] profile::mariadb::misc::analytics::multiinstance: change ports

https://gerrit.wikimedia.org/r/617479

Change 617649 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/homer/public@master] analytics-in[46]: add new ports for term mysql-replica

https://gerrit.wikimedia.org/r/617649

Change 617077 restored by Jcrespo:
[operations/puppet@production] mariadb: Create ugly exception for port assignment for db1108

https://gerrit.wikimedia.org/r/617077

Change 617650 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Move db1108 (analytics db) backups' ports

https://gerrit.wikimedia.org/r/617650

Change 617077 merged by Jcrespo:
[operations/puppet@production] mariadb: Match port 3351 and 3352 to 2 analytics sections

https://gerrit.wikimedia.org/r/617077

Change 617650 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Move db1108 (analytics db) backups' ports

https://gerrit.wikimedia.org/r/617650

Change 617653 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Increase analytics binlog retention time to 14 days

https://gerrit.wikimedia.org/r/617653

Change 617653 merged by Jcrespo:
[operations/puppet@production] mariadb: Increase analytics binlog retention time to 14 days

https://gerrit.wikimedia.org/r/617653

Change 617656 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Increase misc db binlog retention to 14 days

https://gerrit.wikimedia.org/r/617656

Change 617656 merged by Jcrespo:
[operations/puppet@production] mariadb: Increase misc db binlog retention to 14 days

https://gerrit.wikimedia.org/r/617656

Change 617661 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software/wmfmariadbpy@master] mariadb: Add port analytics assignment to wmfmariadbpy and backups

https://gerrit.wikimedia.org/r/617661

Change 617661 merged by Jcrespo:
[operations/software/wmfmariadbpy@master] mariadb: Add port analytics assignment to wmfmariadbpy and backups

https://gerrit.wikimedia.org/r/617661

Change 617662 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Update backup automation to wmfmariadbpy's HEAD

https://gerrit.wikimedia.org/r/617662

Change 617662 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Update backup automation to wmfmariadbpy's HEAD

https://gerrit.wikimedia.org/r/617662

Change 617668 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software/wmfmariadbpy@master] mariadb-backups: Add _ to the list of characters alowed for section names

https://gerrit.wikimedia.org/r/617668

Change 617668 merged by jenkins-bot:
[operations/software/wmfmariadbpy@master] mariadb-backups: Add _ to the list of characters alowed for section names

https://gerrit.wikimedia.org/r/617668

Change 617670 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Update backup_mariadb.py to HEAD

https://gerrit.wikimedia.org/r/617670

Change 617670 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Update backup_mariadb.py to HEAD

https://gerrit.wikimedia.org/r/617670

Change 617649 merged by Elukey:
[operations/homer/public@master] analytics-in[46]: add new ports for term mysql-replica

https://gerrit.wikimedia.org/r/617649

Change 617064 abandoned by Elukey:
[operations/dns@master] Remove AAAA/PTR records for db1108

Reason:

https://gerrit.wikimedia.org/r/617064

jcrespo mentioned this in rOSWB3433ee4249d3: mariadb: Add port analytics assignment to wmfmariadbpy and backups.Sep 1 2020, 8:36 AM

jcrespo mentioned this in rOSWB33377861e330: mariadb-backups: Add _ to the list of characters alowed for section names.

The db1108 host has been running fine for the past couple of months, replicating correctly analytics-meta and piwik data. There are some follow ups to do but this task can be considered done.

elukey moved this task from In Progress to Done on the Analytics-Kanban board.Sep 30 2020, 8:36 AM

• Nuria closed this task as Resolved.Oct 12 2020, 5:30 PM

I can confirm backups have been flowing weekly as expected:

+------+------------------------------------------+---------------------+------------+----------+
| id   | name                                     | end_date            | total_size | status   |
+------+------------------------------------------+---------------------+------------+----------+
| 8347 | dump.analytics_meta.2020-10-20--02-08-31 | 2020-10-20 02:10:10 | 1902085735 | finished |
| 8346 | dump.matomo.2020-10-20--02-08-16         | 2020-10-20 02:08:31 |  257155353 | finished |
| 8244 | dump.analytics_meta.2020-10-13--02-08-23 | 2020-10-13 02:10:02 | 1923935772 | finished |
| 8243 | dump.matomo.2020-10-13--02-08-08         | 2020-10-13 02:08:23 |  254349238 | finished |
| 8135 | dump.analytics_meta.2020-10-06--02-08-27 | 2020-10-06 02:10:11 | 1955733786 | finished |
| 8134 | dump.matomo.2020-10-06--02-08-12         | 2020-10-06 02:08:27 |  247783361 | finished |
| 8029 | dump.analytics_meta.2020-09-29--02-07-47 | 2020-09-29 02:09:27 | 1933654247 | finished |
| 8028 | dump.matomo.2020-09-29--02-07-32         | 2020-09-29 02:07:47 |  251649181 | finished |
| 7904 | dump.analytics_meta.2020-09-22--02-08-26 | 2020-09-22 02:09:33 | 1172324325 | finished |
| 7903 | dump.matomo.2020-09-22--02-08-12         | 2020-09-22 02:08:26 |  243380912 | finished |
| 7789 | dump.analytics_meta.2020-09-15--01-13-59 | 2020-09-15 01:14:58 | 1147047998 | finished |
| 7788 | dump.matomo.2020-09-15--01-13-45         | 2020-09-15 01:13:59 |  237464288 | finished |
| 7686 | dump.analytics_meta.2020-09-08--02-15-57 | 2020-09-08 02:16:58 | 1145772141 | finished |
| 7685 | dump.matomo.2020-09-08--02-15-43         | 2020-09-08 02:15:57 |  232434960 | finished |
| 7560 | dump.analytics_meta.2020-09-01--02-07-39 | 2020-09-01 02:08:37 | 1103338608 | finished |
| 7559 | dump.matomo.2020-09-01--02-07-25         | 2020-09-01 02:07:39 |  242438080 | finished |
| 7460 | dump.analytics_meta.2020-08-25--15-56-09 | 2020-08-25 15:57:12 | 1102649799 | finished |
| 7459 | dump.matomo.2020-08-25--15-55-55         | 2020-08-25 15:56:09 |  240587129 | finished |

We can do a demo recovery any time you want.

@jcrespo perfect timing, I have finished https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Mysql_Meta#Restore_a_Backup so I am ready to test it :)

Everything there looks fine! There may be procedures that I could help you simplify to be done more easily, we can talk on a different medium at a later time to avoid spamming other people here.

Nintendofan885 removed a project: Patch-For-Review.Oct 21 2020, 1:31 PM

Repurpose db1108 as generic Analytics db replicaClosed, ResolvedPublicActions

Description

Details

Related ObjectsSearch...

Event Timeline

Repurpose db1108 as generic Analytics db replica
Closed, ResolvedPublic
Actions

Related Objects
Search...