
Refactor analytics-meta MariaDB layout to use an-mariadb100[12]
Open, Medium, Public, 3 Estimated Story Points

Description

Current status
  • an-coord1001 runs the 'analytics meta' MariaDB master instance. This instance has several databases for Analytics Cluster operations.
  • an-coord1002 runs a standby replica of this instance, but in the case of a failure, switching to an-coord1002 is an error-prone and manual process.
  • matomo1002 runs a MariaDB instance for the 'piwik' database.
  • db1108 runs backup replicas of the analytics-meta and matomo MariaDB instances, and Bacula is used to keep historical backups.
  • As described in T279440: Data drifts between superset_production on an-coord1001 and db1108, the replicas do not match the masters.
  • Relevant MariaDB configs do not necessarily match between masters and replicas.
Desired status
  • All existing analytics_meta databases running from an-db100[12] instead of an-coord100[12]
  • We have confidence in the integrity of both the failover replica (an-db1002) and the backup replica (db1108)
  • Regular and comprehensive backups are running from db1108
  • The failover method from an-db1001 to an-db1002 has been well-defined and tested
  • The restore method from db1108 has been well-defined
Implementation steps
  • Dedicated DB hardware to be ordered in Q1 FY2021-2022 to replace an-coord100[12]: an-db100[12].
  • an-coord1002 fully in sync with an-coord1001 and ready for failover.
  • db1108 fully recreated from snapshot of an-coord1001 and performing regular backups.
  • an-db1001 instantiated as a replica of an-coord1001
  • an-db1002 instantiated as a replica of an-coord1001 (see the replication sketch below)
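A rough illustration of those replica instantiation steps, showing the kind of MariaDB statements involved (a sketch only, not the runbook; the replication user, password, and binlog coordinates below are placeholders, and the real values would come from the data copy and puppet):

-- On an-db1001 (and likewise an-db1002), after loading a consistent copy of the
-- data taken from an-coord1001, point replication at the current primary.
CHANGE MASTER TO
  MASTER_HOST='an-coord1001.eqiad.wmnet',
  MASTER_USER='repl',                        -- placeholder user
  MASTER_PASSWORD='********',                -- placeholder password
  MASTER_LOG_FILE='an-coord1001-bin.000123', -- placeholder coordinates
  MASTER_LOG_POS=4;
START SLAVE;
-- Confirm both replication threads are running and the replica is catching up.
SHOW SLAVE STATUS\G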

Switch-over time

  • an-coord1001 switched to read-only
  • an-db1001 promoted to master
  • All applications switched to use an-db1001 instead of an-coord1001
  • an-db1002 replicating from an-db1001
  • db1108 replicating from an-db1001
  • MariaDB instances removed from an-coord100[12] (an SQL sketch of these switch-over steps follows below)
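A minimal SQL sketch of that switch-over, assuming replication is fully caught up and applications have been stopped or quiesced first (statements and ordering are illustrative, not the final procedure):

-- 1. On an-coord1001: stop writes.
SET GLOBAL read_only = 1;

-- 2. Once an-db1001 has applied everything from an-coord1001, promote it.
--    On an-db1001:
STOP SLAVE;
RESET SLAVE ALL;
SET GLOBAL read_only = 0;

-- 3. Repoint the remaining replicas at the new primary.
--    On an-db1002 and db1108 (binlog coordinates are placeholders):
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST='an-db1001.eqiad.wmnet',
  MASTER_LOG_FILE='an-db1001-bin.000001',
  MASTER_LOG_POS=4;
START SLAVE;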

Notes and Migration Plan here:
https://etherpad.wikimedia.org/p/analytics-meta


Originally, this ticket was about setting up multi-master instances and being able to fail over individual MariaDB database instances. However, it was discovered that Data Persistence does not really support MariaDB multi-instance master setups, and our reasons for wanting them aren't that compelling. Most of the time, failovers will be manual and done for hardware reasons, meaning all DBs would have to be failed over anyway. Having many master setups means more replicas and binlogs to manage, which makes maintenance like that harder, not easier. Ideally each app's DB would be totally isolated from the others, but we will have to wait until perhaps one day we get persistent volumes in k8s to do this properly.

For now we are going with a single analytics-meta instance for all databases.

TBD - The matomo server is in the private1 vlan. Do we want to move its database to an-db100[12] and require a new hardware firewall rule for this?

Event Timeline

Ottomata mentioned this in Unknown Object (Task).Aug 2 2021, 7:52 PM
Ottomata added a subtask: Unknown Object (Task).Sep 20 2021, 3:40 PM
Jclark-ctr closed subtask Unknown Object (Task) as Resolved.Sep 21 2021, 7:27 PM

Had a chat with @Marostegui and @Kormat in #wikimedia-data-persistence today, in which we decided that a MariaDB instance for every database on an-coord1001 right now probably isn't the right way to go. There are currently > 10 databases there, and managing replication for each one separately would be a pain.

Instead, I'm leaning towards 2 instances: one for important data/metadata like hive and druid, and one for more user-facing stuff, like superset and airflow.

Change 732400 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] Use profile::mariadb_multiinstance for analytics multiinstance

https://gerrit.wikimedia.org/r/732400

Change 735688 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] [WIP] profile::analytics::database::mariadb_multi

https://gerrit.wikimedia.org/r/735688

Change 732400 abandoned by Ottomata:

[operations/puppet@production] Use profile::mariadb_multiinstance for analytics multiinstance

Reason:

Trying a different way.

https://gerrit.wikimedia.org/r/732400

Change 736019 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] Add role::analytics_cluster::database::meta on an-db100[12]

https://gerrit.wikimedia.org/r/736019

Change 736034 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] Remove unused bigtop hive and oozie database creation code

https://gerrit.wikimedia.org/r/736034

Change 735688 abandoned by Ottomata:

[operations/puppet@production] [WIP] profile::analytics::database::mariadb_multi

Reason:

Giving up on multi instance master. Going single instance with old puppet code in : https://gerrit.wikimedia.org/r/c/operations/puppet/+/736019

https://gerrit.wikimedia.org/r/735688

Ottomata renamed this task from Refactor analytics-meta MariaDB layout to multi instance with failover to Refactor analytics-meta MariaDB layout to use an-db100[12].Nov 8 2021, 3:53 PM
Ottomata updated the task description.
Ottomata updated the task description.

Change 736034 merged by Ottomata:

[operations/puppet@production] Remove unused bigtop hive and oozie database creation code

https://gerrit.wikimedia.org/r/736034

Mentioned in SAL (#wikimedia-analytics) [2021-11-08T18:07:07Z] <razzi> run create user 'admin'@'localhost' identified by <password>; grant all privileges on *.* to admin; to allow milimetric to access mysql on an-coord1002 for T284150

FYI, I merged https://phabricator.wikimedia.org/T279440#7489667 into this task as stated there.

In grooming today, we decided that we should recreate just the analytics-meta instance on db1108 ASAP, so that any backups that are taken will be useful. It may still drift (if something is actively causing the drift), but at least the backups should be mostly usable.

This will give us some breathing room to focus on other team goals (gobblin metrics and airflow), and we will likely postpone the full refactor until next quarter. TBD.

Mentioned in SAL (#wikimedia-analytics) [2021-11-08T19:44:28Z] <razzi> create admin user on an-coord1001 for T284150

Mentioned in SAL (#wikimedia-analytics) [2021-11-08T19:51:32Z] <ottomata> an-coord1002: drop user 'admin'@'localhost'; start slave; to fix broken replication - T284150

Mentioned in SAL (#wikimedia-operations) [2021-11-08T19:52:01Z] <ottomata> an-coord1002: drop user 'admin'@'localhost'; start slave; to fix broken replication - T284150
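For context, the likely failure mode (an assumption based on the SAL entries above, not a verified diagnosis): the admin user had already been created locally on an-coord1002, so when the same CREATE USER later ran on the primary an-coord1001 and replicated down, the replica's SQL thread hit a conflict and stopped. Roughly:

-- On an-coord1002 (replica): the user already exists locally, so the replicated
-- CREATE USER from an-coord1001 fails and the SQL thread stops.
SHOW SLAVE STATUS\G   -- Last_SQL_Error would show the failing statement

-- Fix applied: drop the conflicting local user and resume replication, letting
-- the replicated statement recreate it consistently.
DROP USER 'admin'@'localhost';
START SLAVE;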

This is probably related to this maintenance, but backups on the analytics meta database failed with:

2021-11-16 06:17:50 [ERROR] - Error connecting to database: Access denied for user 'dump'@'2620:0:861:102:10:64:16:31' (using password: YES)

This means this week's backup couldn't run. Last successful backup: 2021-11-09 03-59-51

(This will show up soon as failed on Icinga's check "dump of analytics_meta in eqiad", but I noticed it early because I was refactoring backup distribution, so I'm reporting it early :-).)

Is there a special manual grant added for the dump user? @BTullis restored the analytics_meta database on db1108 from the master in T295312; perhaps a grant was lost?

Ah, I understand now! It wasn't lost; we just normally don't add the dump user to the primary, to prevent accidentally taking backups from it in the first place. I will re-add it, with your permission, based on the puppetized grants. A check afterwards that what was backed up is what was intended would be appreciated 0:-).
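For illustration, a backup/dump user typically needs grants along these lines; the real statements live in the puppetized grants file (dumps-eqiad-analytics_meta.sql.erb, linked further down in this task), and the host pattern and privilege list here are assumptions, not the actual definition:

-- Illustrative only; the real dump user grants are defined in puppet.
CREATE USER IF NOT EXISTS 'dump'@'10.64.%' IDENTIFIED BY '********';  -- host pattern and password are placeholders
GRANT SELECT, RELOAD, LOCK TABLES, SHOW VIEW, EVENT, TRIGGER, REPLICATION CLIENT
  ON *.* TO 'dump'@'10.64.%';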

Mentioned in SAL (#wikimedia-operations) [2021-11-16T14:24:49Z] <jynus> re-adding backup user to db1108:analytics_meta T284150

A new backup is running now. I will ask you for a review when it finishes, to make sure expectations about its contents are met; since this ticket talks about reorganization, I want to make sure nothing has been lost along the way.

For your review:

Backed up dbs:

/srv/backups/dumps/latest/dump.analytics_meta.2021-11-16--14-28-20$ ls -lha *-schema-create.sql.gz
-rw-r--r-- 1 dump dump  97 Nov 16 14:28 druid_public_eqiad-schema-create.sql.gz
-rw-r--r-- 1 dump dump  84 Nov 16 14:28 druid-schema-create.sql.gz
-rw-r--r-- 1 dump dump  95 Nov 16 14:28 hive_metastore-schema-create.sql.gz
-rw-r--r-- 1 dump dump  82 Nov 16 14:28 hue-schema-create.sql.gz
-rw-r--r-- 1 dump dump  86 Nov 16 14:28 oozie-schema-create.sql.gz
-rw-r--r-- 1 dump dump  93 Nov 16 14:28 search_airflow-schema-create.sql.gz
-rw-r--r-- 1 dump dump 121 Nov 16 14:28 superset_production-schema-create.sql.gz

Backed up tables:

dump.analytics_meta.2021-11-16--14-28-20$ ls -lha --ignore='*schema*' --ignore=metadata
total 1.1G
drwx------  2 dump dump  24K Nov 16 14:29 .
drwx------ 11 dump dump 4.0K Nov 16 14:29 ..
-rw-r--r--  1 dump dump 2.5K Nov 16 14:28 druid.druid_audit.sql.gz
-rw-r--r--  1 dump dump  486 Nov 16 14:28 druid.druid_config.sql.gz
-rw-r--r--  1 dump dump  517 Nov 16 14:28 druid.druid_dataSource.sql.gz
-rw-r--r--  1 dump dump 376K Nov 16 14:28 druid.druid_pendingSegments.sql.gz
-rw-r--r--  1 dump dump 1018 Nov 16 14:28 druid.druid_rules.sql.gz
-rw-r--r--  1 dump dump  19M Nov 16 14:29 druid.druid_segments.sql.gz
-rw-r--r--  1 dump dump 2.7K Nov 16 14:28 druid.druid_supervisors.sql.gz
-rw-r--r--  1 dump dump  366 Nov 16 14:28 druid.druid_tasklocks.sql.gz
-rw-r--r--  1 dump dump  36M Nov 16 14:29 druid.druid_tasklogs.sql.gz
-rw-r--r--  1 dump dump  34M Nov 16 14:29 druid.druid_tasks.sql.gz
-rw-r--r--  1 dump dump  666 Nov 16 14:28 druid_public_eqiad.druid_audit.sql.gz
-rw-r--r--  1 dump dump  421 Nov 16 14:28 druid_public_eqiad.druid_rules.sql.gz
-rw-r--r--  1 dump dump 311K Nov 16 14:28 druid_public_eqiad.druid_segments.sql.gz
-rw-r--r--  1 dump dump 2.2M Nov 16 14:28 druid_public_eqiad.druid_tasklogs.sql.gz
-rw-r--r--  1 dump dump  33K Nov 16 14:28 druid_public_eqiad.druid_tasks.sql.gz
-rw-r--r--  1 dump dump 295K Nov 16 14:28 hive_metastore.BUCKETING_COLS.sql.gz
-rw-r--r--  1 dump dump 8.7K Nov 16 14:28 hive_metastore.CDS.sql.gz
-rw-r--r--  1 dump dump 243K Nov 16 14:28 hive_metastore.COLUMNS_V2.sql.gz
-rw-r--r--  1 dump dump 1.8K Nov 16 14:28 hive_metastore.DBS.sql.gz
-rw-r--r--  1 dump dump  47K Nov 16 14:28 hive_metastore.DELEGATION_TOKENS.sql.gz
-rw-r--r--  1 dump dump  210 Nov 16 14:28 hive_metastore.FUNCS.sql.gz
-rw-r--r--  1 dump dump  171 Nov 16 14:28 hive_metastore.GLOBAL_PRIVS.sql.gz
-rw-r--r--  1 dump dump  312 Nov 16 14:28 hive_metastore.IDXS.sql.gz
-rw-r--r--  1 dump dump  208 Nov 16 14:28 hive_metastore.INDEX_PARAMS.sql.gz
-rw-r--r--  1 dump dump  775 Nov 16 14:28 hive_metastore.MASTER_KEYS.sql.gz
-rw-r--r--  1 dump dump  147 Nov 16 14:28 hive_metastore.NEXT_COMPACTION_QUEUE_ID.sql.gz
-rw-r--r--  1 dump dump  140 Nov 16 14:28 hive_metastore.NEXT_LOCK_ID.sql.gz
-rw-r--r--  1 dump dump  138 Nov 16 14:28 hive_metastore.NEXT_TXN_ID.sql.gz
-rw-r--r--  1 dump dump  743 Nov 16 14:28 hive_metastore.NUCLEUS_TABLES.sql.gz
-rw-r--r--  1 dump dump  18K Nov 16 14:28 hive_metastore.PARTITION_KEYS.sql.gz
-rw-r--r--  1 dump dump  46M Nov 16 14:29 hive_metastore.PARTITION_KEY_VALS.sql.gz
-rw-r--r--  1 dump dump  21M Nov 16 14:28 hive_metastore.PARTITION_PARAMS.sql.gz
-rw-r--r--  1 dump dump  38M Nov 16 14:29 hive_metastore.PARTITIONS.sql.gz
-rw-r--r--  1 dump dump  171 Nov 16 14:28 hive_metastore.ROLES.sql.gz
-rw-r--r--  1 dump dump  47M Nov 16 14:29 hive_metastore.SDS.sql.gz
-rw-r--r--  1 dump dump  343 Nov 16 14:28 hive_metastore.SEQUENCE_TABLE.sql.gz
-rw-r--r--  1 dump dump  11M Nov 16 14:28 hive_metastore.SERDE_PARAMS.sql.gz
-rw-r--r--  1 dump dump  12M Nov 16 14:29 hive_metastore.SERDES.sql.gz
-rw-r--r--  1 dump dump  237 Nov 16 14:28 hive_metastore.SORT_COLS.sql.gz
-rw-r--r--  1 dump dump 1.3K Nov 16 14:28 hive_metastore.TAB_COL_STATS.sql.gz
-rw-r--r--  1 dump dump 179K Nov 16 14:28 hive_metastore.TABLE_PARAMS.sql.gz
-rw-r--r--  1 dump dump 9.8K Nov 16 14:28 hive_metastore.TBL_PRIVS.sql.gz
-rw-r--r--  1 dump dump  54K Nov 16 14:28 hive_metastore.TBLS.sql.gz
-rw-r--r--  1 dump dump  178 Nov 16 14:28 hive_metastore.VERSION.sql.gz
-rw-r--r--  1 dump dump  151 Nov 16 14:28 hue.auth_group.sql.gz
-rw-r--r--  1 dump dump 3.5K Nov 16 14:28 hue.auth_permission.sql.gz
-rw-r--r--  1 dump dump  603 Nov 16 14:28 hue.auth_user_groups.sql.gz
-rw-r--r--  1 dump dump 5.0K Nov 16 14:28 hue.auth_user.sql.gz
-rw-r--r--  1 dump dump  45K Nov 16 14:28 hue.axes_accessattempt.sql.gz
-rw-r--r--  1 dump dump  50K Nov 16 14:28 hue.axes_accesslog.sql.gz
-rw-r--r--  1 dump dump  62K Nov 16 14:28 hue.beeswax_queryhistory.sql.gz
-rw-r--r--  1 dump dump  21K Nov 16 14:28 hue.beeswax_savedquery.sql.gz
-rw-r--r--  1 dump dump 871K Nov 16 14:28 hue.beeswax_session.sql.gz
-rw-r--r--  1 dump dump 5.7K Nov 16 14:28 hue.desktop_document2_dependencies.sql.gz
-rw-r--r--  1 dump dump  274 Nov 16 14:28 hue.desktop_document2permission.sql.gz
-rw-r--r--  1 dump dump 8.3M Nov 16 14:28 hue.desktop_document2.sql.gz
-rw-r--r--  1 dump dump  504 Nov 16 14:28 hue.desktop_documentpermission.sql.gz
-rw-r--r--  1 dump dump 195K Nov 16 14:28 hue.desktop_document.sql.gz
-rw-r--r--  1 dump dump 1.5K Nov 16 14:28 hue.desktop_documenttag.sql.gz
-rw-r--r--  1 dump dump 100K Nov 16 14:28 hue.desktop_document_tags.sql.gz
-rw-r--r--  1 dump dump  150 Nov 16 14:28 hue.desktop_settings.sql.gz
-rw-r--r--  1 dump dump  760 Nov 16 14:28 hue.desktop_userpreferences.sql.gz
-rw-r--r--  1 dump dump  887 Nov 16 14:28 hue.django_content_type.sql.gz
-rw-r--r--  1 dump dump 1.2K Nov 16 14:28 hue.django_migrations.sql.gz
-rw-r--r--  1 dump dump 647K Nov 16 14:28 hue.django_session.sql.gz
-rw-r--r--  1 dump dump  158 Nov 16 14:28 hue.django_site.sql.gz
-rw-r--r--  1 dump dump  168 Nov 16 14:28 hue.documentpermission2_groups.sql.gz
-rw-r--r--  1 dump dump  247 Nov 16 14:28 hue.documentpermission2_users.sql.gz
-rw-r--r--  1 dump dump 1.8K Nov 16 14:28 hue.south_migrationhistory.sql.gz
-rw-r--r--  1 dump dump  279 Nov 16 14:28 hue.useradmin_grouppermission.sql.gz
-rw-r--r--  1 dump dump  781 Nov 16 14:28 hue.useradmin_huepermission.sql.gz
-rw-r--r--  1 dump dump 2.0K Nov 16 14:28 hue.useradmin_userprofile.sql.gz
-rw-r--r--  1 dump dump 1005 Nov 16 14:28 oozie.BUNDLE_ACTIONS.sql.gz
-rw-r--r--  1 dump dump  12K Nov 16 14:28 oozie.BUNDLE_JOBS.sql.gz
-rw-r--r--  1 dump dump  65M Nov 16 14:28 oozie.COORD_ACTIONS.sql.gz
-rw-r--r--  1 dump dump 2.0M Nov 16 14:28 oozie.COORD_JOBS.sql.gz
-rw-r--r--  1 dump dump  175 Nov 16 14:28 oozie.OOZIE_SYS.sql.gz
-rw-r--r--  1 dump dump  17M Nov 16 14:28 oozie.SLA_EVENTS.sql.gz
-rw-r--r--  1 dump dump 7.3M Nov 16 14:28 oozie.SLA_REGISTRATION.sql.gz
-rw-r--r--  1 dump dump  16M Nov 16 14:28 oozie.SLA_SUMMARY.sql.gz
-rw-r--r--  1 dump dump 110M Nov 16 14:28 oozie.WF_ACTIONS.sql.gz
-rw-r--r--  1 dump dump 466M Nov 16 14:29 oozie.WF_JOBS.sql.gz
-rw-r--r--  1 dump dump  184 Nov 16 14:28 search_airflow.alembic_version.sql.gz
-rw-r--r--  1 dump dump  297 Nov 16 14:28 search_airflow.chart.sql.gz
-rw-r--r--  1 dump dump 1.6K Nov 16 14:28 search_airflow.connection.sql.gz
-rw-r--r--  1 dump dump 845K Nov 16 14:28 search_airflow.dag_run.sql.gz
-rw-r--r--  1 dump dump 1.1K Nov 16 14:28 search_airflow.dag.sql.gz
-rw-r--r--  1 dump dump 5.0M Nov 16 14:28 search_airflow.job.sql.gz
-rw-r--r--  1 dump dump  206 Nov 16 14:28 search_airflow.known_event_type.sql.gz
-rw-r--r--  1 dump dump  154 Nov 16 14:28 search_airflow.kube_resource_version.sql.gz
-rw-r--r--  1 dump dump  150 Nov 16 14:28 search_airflow.kube_worker_uuid.sql.gz
-rw-r--r--  1 dump dump  16M Nov 16 14:28 search_airflow.log.sql.gz
-rw-r--r--  1 dump dump  240 Nov 16 14:28 search_airflow.slot_pool.sql.gz
-rw-r--r--  1 dump dump  76K Nov 16 14:28 search_airflow.task_fail.sql.gz
-rw-r--r--  1 dump dump 5.9M Nov 16 14:28 search_airflow.task_instance.sql.gz
-rw-r--r--  1 dump dump  15K Nov 16 14:28 search_airflow.task_reschedule.sql.gz
-rw-r--r--  1 dump dump 1.5K Nov 16 14:28 search_airflow.variable.sql.gz
-rw-r--r--  1 dump dump 9.5K Nov 16 14:28 search_airflow.xcom.sql.gz
-rw-r--r--  1 dump dump 1.1K Nov 16 14:28 superset_production.ab_permission.sql.gz
-rw-r--r--  1 dump dump 3.8K Nov 16 14:28 superset_production.ab_permission_view_role.sql.gz
-rw-r--r--  1 dump dump  20K Nov 16 14:28 superset_production.ab_permission_view.sql.gz
-rw-r--r--  1 dump dump  192 Nov 16 14:28 superset_production.ab_role.sql.gz
-rw-r--r--  1 dump dump 1.4K Nov 16 14:28 superset_production.ab_user_role.sql.gz
-rw-r--r--  1 dump dump  23K Nov 16 14:28 superset_production.ab_user.sql.gz
-rw-r--r--  1 dump dump  40K Nov 16 14:28 superset_production.ab_view_menu.sql.gz
-rw-r--r--  1 dump dump  158 Nov 16 14:28 superset_production.alembic_version.sql.gz
-rw-r--r--  1 dump dump  438 Nov 16 14:28 superset_production.annotation_layer.sql.gz
-rw-r--r--  1 dump dump  643 Nov 16 14:28 superset_production.annotation.sql.gz
-rw-r--r--  1 dump dump  303 Nov 16 14:28 superset_production.clusters.sql.gz
-rw-r--r--  1 dump dump  15K Nov 16 14:28 superset_production.columns.sql.gz
-rw-r--r--  1 dump dump 5.0K Nov 16 14:28 superset_production.dashboard_slices.sql.gz
-rw-r--r--  1 dump dump 127K Nov 16 14:28 superset_production.dashboards.sql.gz
-rw-r--r--  1 dump dump 1.3K Nov 16 14:28 superset_production.dashboard_user.sql.gz
-rw-r--r--  1 dump dump 2.2K Nov 16 14:28 superset_production.datasources.sql.gz
-rw-r--r--  1 dump dump 1.3K Nov 16 14:28 superset_production.dbs.sql.gz
-rw-r--r--  1 dump dump  156 Nov 16 14:28 superset_production.druiddatasource_user.sql.gz
-rw-r--r--  1 dump dump 2.1K Nov 16 14:28 superset_production.favstar.sql.gz
-rw-r--r--  1 dump dump  933 Nov 16 14:28 superset_production.keyvalue.sql.gz
-rw-r--r--  1 dump dump  50M Nov 16 14:29 superset_production.logs.sql.gz
-rw-r--r--  1 dump dump 6.8K Nov 16 14:28 superset_production.metrics.sql.gz
-rw-r--r--  1 dump dump 2.1M Nov 16 14:28 superset_production.query.sql.gz
-rw-r--r--  1 dump dump  55K Nov 16 14:28 superset_production.saved_query.sql.gz
-rw-r--r--  1 dump dump 319K Nov 16 14:28 superset_production.slices.sql.gz
-rw-r--r--  1 dump dump 8.3K Nov 16 14:28 superset_production.slice_user.sql.gz
-rw-r--r--  1 dump dump 4.5K Nov 16 14:28 superset_production.sqlatable_user.sql.gz
-rw-r--r--  1 dump dump  30K Nov 16 14:28 superset_production.sql_metrics.sql.gz
-rw-r--r--  1 dump dump 108K Nov 16 14:28 superset_production.table_columns.sql.gz
-rw-r--r--  1 dump dump  89K Nov 16 14:28 superset_production.tables.sql.gz
-rw-r--r--  1 dump dump  12K Nov 16 14:28 superset_production.tagged_object.sql.gz
-rw-r--r--  1 dump dump 1.3K Nov 16 14:28 superset_production.tag.sql.gz
-rw-r--r--  1 dump dump 108K Nov 16 14:28 superset_production.url.sql.gz

The size is within the range of previous backups (otherwise the size check alert would have complained):

root@db1159.eqiad.wmnet[dbbackups]> select start_date, total_size FROM backups where section='analytics_meta' and status = 'finished' order by id desc limit 20;
+---------------------+------------+
| start_date          | total_size |
+---------------------+------------+
| 2021-11-16 14:28:20 | 1082469038 |
| 2021-11-09 03:59:52 | 1078667813 |
| 2021-11-02 04:07:41 | 1074080221 |
| 2021-10-26 04:05:08 | 1070485973 |
| 2021-10-19 04:05:27 | 1067433672 |
| 2021-10-12 04:04:41 | 1062654019 |
| 2021-10-05 04:17:40 | 1061479475 |
| 2021-09-28 04:00:24 | 1056655658 |
| 2021-09-21 04:08:15 | 1053917052 |
| 2021-09-14 04:08:28 | 1051212957 |
| 2021-09-07 04:07:03 | 1046178626 |
| 2021-08-31 04:07:57 | 1040581760 |
| 2021-08-24 04:09:46 | 1033076010 |
| 2021-08-17 04:07:41 | 1053024804 |
| 2021-08-10 04:03:03 | 1056252857 |
| 2021-08-03 03:30:54 | 1071095439 |
| 2021-07-27 03:19:53 | 1065927619 |
| 2021-07-20 05:06:01 | 1058817413 |
| 2021-07-13 04:59:49 | 1051655906 |
| 2021-07-06 05:02:44 | 1045863738 |
+---------------------+------------+
20 rows in set (0.004 sec)

Ah, fabulous, thanks @jcrespo.

I had found some grants for the dump user defined here: https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/profile/templates/mariadb/grants/dumps-eqiad-analytics_meta.sql.erb but I couldn't work out if they were applied anywhere automatically, or if not, why not.

The question I was just about to ask was this...

  • Is it our normal practice to a) create this user directly in the mysql table on a backup replica, or b) keep the mysql.user table fully in sync across all replicas?

...but I see from your answer that it's a) because you said:

we normally don't add the dump user to the primary to prevent accidentally taking backups from it in the first place.

That makes perfect sense now.

Grant management and checking is a pending task we still have to solve; it is not easy to do safely and reliably in production for all use cases. Short term, we are satisfied knowing the grants are documented in puppet, until a more automated setup is figured out.

Pausing this task since the database migration has been de-prioritized in favour of other, more pressing tasks.
We still want to do it and hardware is now available, so we will return to it as soon as practicable.

Change 736019 abandoned by Ottomata:

[operations/puppet@production] Add role::analytics_cluster::database::meta on an-db100[12]

Reason:

Done differently

https://gerrit.wikimedia.org/r/736019

BTullis renamed this task from Refactor analytics-meta MariaDB layout to use an-db100[12] to Refactor analytics-meta MariaDB layout to use an-mariadb100[12].Thu, Mar 2, 12:52 PM
JArguello-WMF raised the priority of this task from Medium to High.Tue, Mar 14, 5:53 PM
JArguello-WMF set the point value for this task to 3.