✩ New Wiki Replica servers ready for use

New Wiki Replica servers ready for use

Bigger, faster, better replication technology

TL;DR

Change your tools and scripts to use:
- *.web.db.svc.eqiad.wmflabs (real-time response needed)
- *.analytics.db.svc.eqiad.wmflabs (batch jobs; long queries)
Replace * with either a shard name (e.g. s1) or a wikidb name (e.g. enwiki).
The new servers do not support user created databases/tables because replication can't be guaranteed. See T156869 and below for more information. tools.db.svc.eqiad.wmflabs (also known as tools.labsdb) will continue to support user created databases and tables.
Report any bugs you find with these servers in Phabricator using the Data-Services tag.

Wiki Replicas

The Wiki Replicas are one of the unique services that Wikimedia Cloud Services helps make available to our communities. Wiki Replicas are real-time replicas of the production Wikimedia MediaWiki wiki databases with privacy-sensitive data removed. These databases hold copies of all the metadata about content and interactions on the wikis. You can read more about these databases on Wikitech if you are unfamiliar with their details.

The current physical servers for the <wiki>_p Wiki Replica databases are at the end of their useful life. Work started over a year ago on a project involving the DBA team and cloud-services-team to replace these aging servers (T140788). Besides being five years old, the current servers have other issues that the DBA team took this opportunity to fix:

Data drift from production (T138967)
No way to give different levels of service for realtime applications vs analytics queries
No automatic failover to another server when one failed

Bigger, faster, more available

Each of the three new servers is much larger and faster than the servers they are replacing. Five years is a very long time in the world of computer hardware:

2 x 4 core processors (16 virtual cores total)
512 GB of RAM (2.5 times the old amount)
12 TB of SSD disks (vs 5 TB of fixed disks previously)

We have also upgraded the database software itself. The new servers are running MariaDB version 10.1. Among other improvements, this newer database software allows us to use a permissions system that is simpler and more secure for managing the large number of individual tools that are granted access to the databases.

The new servers use InnoDB tables rather than the previous TokuDB storage. TokuDB was used on the old servers as a space-saving measure, but it has also had bugs in the past that caused delays to replication. InnoDB is used widely in the Wikimedia production databases without these problems.

The new cluster is configured with automatic load balancing and failover using HAProxy. All three hosts have identical data. Currently, two of the hosts are actively accepting connections and processing queries. The third is a ready replacement for either of the others in case of unexpected failure or when we need to do maintenance on the servers themselves. As we learn more about usage and utilization on these new hosts we can change things to better support the workloads that are actually being generated. This may include setting up different query duration limits or pooling the third server to support some of the load. The main point is that the new system provides us with the ability to make these types of changes which were not possible previously.

Improved replication

The work of scrubbing private data is done on a set of servers that we call "sanitarium" hosts. The sanitarium servers receive data from the production primary servers. They then in turn act as the primary servers which are replicated to the Wiki Replica cluster. The two sanitarium servers for the new Wiki Replica cluster use row-based replication (RBR). @Marostegui explains the importance of this change and its relationship to T138967: Labs database replica drift:

[W]e are ensuring that whatever comes to those hosts (which are, in some cases, normal production slaves) is exactly what is being replicated to the [Cloud] servers. Preventing us from data drifts, as any data drift on row based replication would break replication on the [Cloud] servers. Which is bad, because they get replication broken, but at the same time is good, because it is a heads up that the data isn't exactly as we have it in core. Which allows us to maintain a sanitized and healthy dataset, avoiding all the issues we have had in the past.

The data replicated to the new servers has been completely rebuilt from scratch using the RBR method. This has fixed many replication drift problems that exist on the older servers (T138967). If your tool performs tasks where data accuracy is important (counting edits, checking if a page has been deleted, etc), you should switch to using the new servers as soon as possible.

New service names

Populating the new sanitarium servers with data was a long process (T153743), but now that it is done our three new Wiki Replica servers are ready for use. With the old setup, we asked people to use a unique hostname with each database they connected to (e.g. enwiki.labsdb). The new cluster extends this by adding using service names to separate usage by the type of queries that are being run:

Use *.web.db.svc.eqiad.wmflabs for webservices and other tools that need to make small queries and get responses quickly.
Use *.analytics.db.svc.eqiad.wmflabs for longer running queries that can be slower.

If you were using enwiki.labsdb you should switch to either enwiki.analytics.db.svc.eqiad.wmflabs or enwiki.web.db.svc.eqiad.wmflabs. The choice of "analytics" or "web" depends on what your tool is doing, but a good rule of thumb is that any query that routinely takes more than 10 seconds to run should probably use the "analytics" service.

Right now there is no actual difference between connecting to the "web" or "analytics" service names. As these servers get more usage and we understand the limitations they have this may change. Having a way for a user to explicitly choose between real-time responses and slower responses for more complicated queries will provide more flexibility in tuning the systems. We expect to be able to allow queries to run for a much longer time on the new analytics service names than we can on the current servers. This in turn should help people who have been struggling to gather the data needed for complex reports within the current per-request timeout limits.

A breaking change

These new servers will not allow users to create their own databases/tables co-located with the replicated content. This was a feature of the older database servers that some tools used to improve performance by making intermediate tables that could then be JOINed to other tables to produce certain results.

We looked for solutions that would allow us to replicate user created data across the three servers, but we could not come up with a solution that would guarantee success. The user created tables on the current servers are not backed up or replicated and have always carried the disclaimer that these tables may disappear at any time. With the improvements in our ability to fail over and rebalance traffic under load, it is more likely on the new cluster that these tables would randomly appear and disappear from the point of view of a given user. This kind of disruption will break tools if we allow it. It seems a safer solution for everyone to disallow the former functionality.

User created databases and tables are still supported on the tools.db.svc.eqiad.wmflabs server (also known as tools.labsdb). If you are using tables co-located on the current c1.labsdb or c3.labsdb hosts we are recommending that your tool/scripts be updated to instead keep all user managed data on tools.db.svc.eqiad.wmflabs and perform any joining of replica data and user created data in application space rather than with cross-database joins.

There will be further announcements before the old servers are completely taken offline, but tools maintainers are urged to make changes as soon as they can. The hardware for the older servers is very old and may fail in a non-recoverable way unexpectedly (T126942).

Curated datasets

There are some datasets produced by ORES, the Analytics team, or volunteers that really do need to be co-located with the wiki tables to be useful. We are looking for a solution for these datasets that will allow them to be replicated properly and available everywhere. See T173511: Implement technical details and process for "datasets_p" on wikireplica hosts for further discussion of providing some method for 'curated' datasets to be added to the new cluster.

Quarry will be switched over to use *.analytics.db.svc.eqiad.wmflabs soon. As noted previously, using the analytics service names should allow more complex queries to complete which will be a big benefit for Quarry's users who are doing analytics work. This change may however temporarily interrupt usage of some datasets that are blocked by T173511. Follow that task for more information if your work is affected.

You can help test the new servers

Before we make the new servers the default for everyone, we would like some early adopters to use them and help us find issues like:

missing permissions
missing views
missing wikidb service names

To use them, change your tool to connect to:

*.web.db.svc.eqiad.wmflabs (real-time response needed)
*.analytics.db.svc.eqiad.wmflabs (batch jobs; long queries)

Replace the * with either a shard name (e.g. s1) or a wikidb name (e.g. enwiki).

For interactive queries, use one of:

sql --cluster analytics <database_name>
mysql --defaults-file=$HOME/replica.my.cnf -h <wikidb>.analytics.db.svc.eqiad.wmflabs <database_name>

Report any bugs you find with these servers and their data in Phabricator using the Data-Services tag.

Thanks

The cloud-services-team would like to especially thank @jcrespo and @Marostegui for the work that they put into designing and implementing this new cluster. Without their technical experience and time this project would never have been successful.

Learn more

Tool creation added to toolsadmin.wikimedia.org

Introducing the Cloud Services Team: What we do, and how we can help you

Written by bd808 on Sep 25 2017, 11:43 PM.

Principal Software Engineer

Projects

Subscribers

Superzerocool, Base, jcrespo, • Marostegui

Event Timeline

The current versions of MariaDB seem to be 10.2.*. What is the reason why 10.1 is selected as opposed to the current versions?

In J70#762, @Base wrote:

The current versions of MariaDB seem to be 10.2.*. What is the reason why 10.1 is selected as opposed to the current versions?

MariaDB 10.2 GA was released at the end of May 2017, and this project was started around Oct 2016, so it wasn't available :-)

MariaDB 10.1 GA was released Oct 2015, so that means it has been used, tested and bugs have been reported for over a year when we started using it. That means it is a well polished version.

Normally we do not like the idea of going with the latest version available, as normally new versions might contain unexpected bugs and needs some more minor versions to be totally stable.

When a new version is released, you can always have regressions, unexpected bugs, specially on big environments with lots of concurrency and lots of different kind of queries. To be on the safe side, we prefer to go for versions that have been out there for a while, so people could have done regression tests, bugs have been reported and fixed and there is a general consensus that the version is stable.

It could be interesting to add that RAID10 is used, increasing the performance and reliability of the setup (instead of the comples, highly unreliable setup before).

In J70#764, @jcrespo wrote:

It could be interesting to add that RAID10 is used

I knew I was forgetting something! RAID 10 (sometimes also written as RAID 1+0) means that each drive is "mirrored" to an identical drive (RAID 1) and each file that is stored is "striped" across multiple drives (RAID 0). This gives a couple of benefits. The striping helps with Input/output operations per second (IOPS) performance, functionally making the group of disks look like one faster disk. Mirroring helps with availability. The collective system can sustain multiple drive failures and continue to store and retrieve data as long as no single mirrored pair loses both drives.

I have updated this blog post to include a mention of the support that is now in the sql helper command on Toolforge for selecting a database cluster to connect to.

$ sql --help
usage: sql [-h] [-v] [--cluster {analytics,web,labsdb}] DATABASE ...

Connect to or run SQL query on replica or Tools database server

positional arguments:
  DATABASE              for example commonswiki_p or enwiki
  ...                   SQL query; multiple arguments will be joined by spaces

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         increase logging verbosity
  --cluster {analytics,web,labsdb}
                        cluster to connect to

Report bugs to Phabricator: https://phabricator.wikimedia.org