
Detect object, schema and data drifts between mediawiki HEAD, production masters and replicas
Closed, ResolvedPublic

Description

While replica drift is not a concern, there is no accounting of schema changes made on individual hosts for performance reasons, such as a different partitioning scheme or the use of the TokuDB engine instead of InnoDB.

Create a script or application to check for unaccounted object differences (tables, triggers, databases), schema changes in tables, and data differences.

While there are open source solutions for this, we need to adapt those tools to our particular setup:

  • Sharding
  • Heavy use of filters
  • Sanitized hosts with some of the data missing
  • Not yet applied or undocumented schema changes (good changes applied to production only but not to mediawiki; changes applied to mediawiki but never scheduled for production; etc.)
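
To make the goal concrete, here is a minimal, hypothetical sketch of the kind of schema comparison this asks for: pull the column definitions of every table from a reference host and a candidate host via information_schema and report anything missing, extra or changed. The host names are placeholders, pymysql is used only as an example driver, and a real tool would also need to compare storage engines, partitioning, triggers and data, and cope with the points above (sharding, filters, sanitized hosts).

import os
import pymysql

COLUMN_QUERY = """
    SELECT table_name, column_name, column_type, is_nullable, column_key
    FROM information_schema.columns
    WHERE table_schema = %s
"""

def fetch_schema(host, db):
    # Return {(table, column): definition} for one database on one host.
    conn = pymysql.connect(host=host, database=db,
                           read_default_file=os.path.expanduser("~/.my.cnf"))
    try:
        with conn.cursor() as cur:
            cur.execute(COLUMN_QUERY, (db,))
            return {(table, column): tuple(rest) for table, column, *rest in cur.fetchall()}
    finally:
        conn.close()

def diff_schemas(reference, candidate):
    # Yield (kind, (table, column), expected, actual) for every detected drift.
    for key, expected in reference.items():
        actual = candidate.get(key)
        if actual is None:
            yield "missing", key, expected, None
        elif actual != expected:
            yield "changed", key, expected, actual
    for key in candidate.keys() - reference.keys():
        yield "extra", key, None, candidate[key]

if __name__ == "__main__":
    reference = fetch_schema("db-master.example", "enwiki")    # placeholder hosts
    candidate = fetch_schema("db-replica.example", "enwiki")
    for kind, (table, column), expected, actual in diff_schemas(reference, candidate):
        print(f"{kind}: {table}.{column} expected={expected} actual={actual}")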

Related Objects

This task is connected to more than 200 other tasks; only direct parents and subtasks were listed here (most now Resolved, assigned mainly to Marostegui, Ladsgroup, jcrespo, Smalyshev, Reedy, and Kormat).

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 299006 abandoned by ArielGlenn:
script to generate lists of db hosts by shard and/or dc

Reason:
obsolete, salt-based anyways

https://gerrit.wikimedia.org/r/299006

Today I reworked my original code to be more autonomous: it now gathers the data on its own, and you can run it on the whole fleet. I also improved its reporting; it now reports grouped by drift type and, in a second grouping, by database, so we can attack the problem from different angles. See the code, the results grouped by drift type, and the results grouped by database.

One note: in its current shape the code picks one random wiki from each section. That shouldn't matter right now; once we clean up the big bits we can get more aggressive with the checks, but at the moment checking everything would overwhelm the whole report.
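
For illustration only (the drift records and the section-to-wiki map below are invented, not the tool's real data model), the one-random-wiki-per-section sampling and the two report groupings could look roughly like this:

import random
from collections import defaultdict

sections = {"s1": ["enwiki"], "s3": ["frwiktionary", "metawiki"], "s8": ["wikidatawiki"]}

# Pick one random wiki per section, as the current code does.
sampled = {section: random.choice(wikis) for section, wikis in sections.items()}

drifts = [
    {"db": "enwiki", "table": "revision", "type": "missing-index"},
    {"db": "enwiki", "table": "page", "type": "column-type-mismatch"},
    {"db": "wikidatawiki", "table": "wb_changes", "type": "engine-mismatch"},
]

# The same findings reported two ways: grouped by drift type and grouped by database.
by_type, by_db = defaultdict(list), defaultdict(list)
for drift in drifts:
    by_type[drift["type"]].append(drift)
    by_db[drift["db"]].append(drift)

for drift_type, items in sorted(by_type.items()):
    print(f"{drift_type}: {len(items)} affected table(s)")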

I would be more than happy to move the code to gerrit as part of operations/software. Does that sound good? Our amazing DBAs: how do you want to run it? Currently you can put the code on mwmaint1002 and run it like this:

python3 new_db_checker.py core "sql {wiki} -- " all --important-only -prod

(Checks core drifts across the fleet)
or

python3 new_db_checker.py wikibase-repo "sql {wiki} -- " s8 --important-only -prod

(Checks Wikibase-Repo drifts on s8)
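
For readers unfamiliar with the mwmaint wrappers: the second argument is a command template in which {wiki} gets substituted, so the checker can shell out to the right database for each wiki. A rough, hypothetical sketch of how such a template might be expanded (the real new_db_checker.py may do this differently):

import shlex
import subprocess

def run_sql(template, wiki, query):
    # "sql {wiki} -- " becomes e.g. "sql enwiki -- "; anything after "--" is
    # passed through to the underlying mysql client, so we append -e <query>.
    cmd = shlex.split(template.format(wiki=wiki)) + ["-e", query]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# Example: run_sql("sql {wiki} -- ", "enwiki", "SHOW CREATE TABLE revision")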

Thanks for this Amir - this is super useful. Feel free to move it to operations/software; better to have it there just in case.
As I mentioned in my email to Operations, let's try to get into better shape first and fix the already created tickets before automating it and sending reports that won't change much from week to week (as it will take time to fix all the drifts), and we know what happens when there's a weekly (or monthly) email that never changes... people tend to ignore it :-(

I would suggest we start by fixing the drifts, and once we are in a much cleaner state we can talk about scheduling and sending a report once every XX days.

Again, thank you so much!

I just made a dashboard to keep track of drifts reported by my tool: https://drift-tracker.toolforge.org/

LSobanski renamed this task from "Automate the check and fix of object, schema and data drifts between mediawiki HEAD, production masters and slaves" to "Detect object, schema and data drifts between mediawiki HEAD, production masters and replicas". May 14 2021, 10:59 AM
LSobanski updated the task description.

I updated the task to limit it to drift detection and spun automation off into a separate task: T282857. These are effectively two separate tasks of different complexity and priority.

Ladsgroup claimed this task.

I just rewrote the dashboard in Python to make it more maintainable and created Tool-drift-tracker (and created some tickets to tackle later, especially some tickets to automate the work much more).

With https://drift-tracker.toolforge.org/report/core/

We can basically call this done ^^. Of course there is much more to be done, but I consider this a good start.
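
(Not the actual Tool-drift-tracker code, just a minimal Flask-style sketch with an invented storage layout, to illustrate what serving such a report on Toolforge involves:)

import json
from pathlib import Path
from flask import Flask, render_template_string

app = Flask(__name__)
REPORT_DIR = Path("reports")  # hypothetical location of generated drift reports

@app.route("/report/<name>/")
def report(name):
    # Load the pre-generated drift list for this report and render it as a page.
    data = json.loads((REPORT_DIR / f"{name}.json").read_text())
    template = (
        "<h1>Drift report: {{ name }}</h1>"
        "<ul>{% for drift in drifts %}<li>{{ drift }}</li>{% endfor %}</ul>"
    )
    return render_template_string(template, name=name, drifts=data)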

A feature request for data differences (narrowly scoped) is tracked in https://phabricator.wikimedia.org/T207253.