analytics-store (aka dbstore1002), which currently hosts the MediaWiki analytics replicas for all wikis, has an unsustainable setup. Specifically:
- It runs Ubuntu Trusty, which will hit its end-of-life at the end of April 2019, so we have a hard deadline to move it to a new Debian Stretch setup.
- It is out of warranty and is showing signs of hardware failure (disks in the RAID array breaking, etc.), so it needs to be replaced. Keeping the current design would mean replicating the whole setup on a new (very beefy) host.
- It uses multisource replication to pull data from all the production shards onto a single host. From the user's point of view this works nicely, but on the SRE side it creates challenges because of the large amount of data stored and the replication throughput coming from the "wiki" production replicas. For example, we currently store close to 6TB of data on a single host, on a RAID array, with a data insertion rate that keeps growing over time. Simply adding disks and buying bigger hardware is not feasible or scalable in the medium to long term: at some point we will again end up with an overloaded host that cannot keep up with replication. The maintenance cost (in terms of people working on it) would also grow beyond what it is now, and it is already significant: the Data Persistence team in SRE does an incredible job behind the scenes on a daily basis to keep dbstore1002 running.
- The data it contains is unreliable and does not match production; however, because the setup is so different from production, this is not easy to fix.
- It mainly uses TokuDB, which has been unreliable in the past, lagging days behind and crashing or returning inconsistent results.
We are planning to migrate dbstore1002 to three new hosts: dbstore1003/4/5. The idea is to split the wiki replicas across multiple hosts and then retire dbstore1002 (we have a hard deadline of 30 April 2019, when Ubuntu Trusty reaches end of life).
The Data Persistence team proposed a layout in T210478#4794536. It would move each sX section (i.e. the database groupings listed in s1.dblist, s2.dblist, etc.) to its own mysql instance on an assigned dbstore node. For example, all wikis in s5 would be available (i.e. replicated) on a mysql instance on dbstore1003, listening on a port that has not been assigned yet. As a consequence, joins between schemas belonging to different sX sections will no longer be possible.
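To make that consequence concrete, here is a minimal Python sketch, assuming each dblist file contains one wiki database name per line (blank lines and `#` comments skipped). It maps every wiki to its section and checks whether two wikis would land on the same instance, which is the condition for a cross-schema join to keep working. The helper names (`parse_dblist`, `build_section_index`, `can_join`, `host_for`) and the sample dblist contents are hypothetical; only the s5 → dbstore1003 assignment comes from the text above, and ports are omitted because they are not assigned yet.

```python
from typing import Dict, Iterable, List, Optional


def parse_dblist(lines: Iterable[str]) -> List[str]:
    """Return the wiki database names listed in a dblist.

    Assumption: one name per line; blank lines and '#' comments are skipped.
    """
    names: List[str] = []
    for line in lines:
        name = line.strip()
        if name and not name.startswith("#"):
            names.append(name)
    return names


def build_section_index(dblists: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Map each wiki database name to the section (s1, s2, ...) whose dblist lists it."""
    index: Dict[str, str] = {}
    for section, lines in dblists.items():
        for wiki in parse_dblist(lines):
            index[wiki] = section
    return index


def can_join(index: Dict[str, str], wiki_a: str, wiki_b: str) -> bool:
    """A cross-schema join needs both wikis on the same instance, i.e. in the same section."""
    return index[wiki_a] == index[wiki_b]


# Only s5 -> dbstore1003 is stated in the text; the remaining assignments
# (and all ports) are defined in T210478#4794536 and are not reproduced here.
SECTION_HOST: Dict[str, str] = {"s5": "dbstore1003"}


def host_for(index: Dict[str, str], wiki: str) -> Optional[str]:
    """Return the dbstore host for a wiki's section, or None if it is not listed above."""
    return SECTION_HOST.get(index[wiki])


# Hypothetical sample contents, NOT the real s1.dblist / s5.dblist files.
SAMPLE_DBLISTS = {
    "s1": ["enwiki"],
    "s5": ["# illustrative sample", "dewiki"],
}
INDEX = build_section_index(SAMPLE_DBLISTS)

print(can_join(INDEX, "enwiki", "dewiki"))  # False: different sections
print(host_for(INDEX, "dewiki"))            # dbstore1003
```

Under this layout, a query that combines data from wikis in different sections would have to read from each instance separately and merge the results on the client side.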
The staging database will likely be assigned to its own mysql instance, so people will be able to keep using its data. It will still be possible to create tables there, but since staging will no longer share a server with the wiki databases, importing data from them will require extra work (dumping the data from the relevant section's instance, importing it into staging, etc.).