T122375 calls out the need to segment and minimize access to sensitive data within the WMF cluster. One important subset of this is user-specific information:
- Password hashes: {T120484}
- Sessions: {T134811}, {T137272}
- Checkuser information, especially IP addresses
Since all of these need similar protection, it seems to make sense to handle them in a single, firewalled "UserInfo" service.
## Requirements
- **Protect sensitive data**
- Firewall protection for both the service & its backend storage.
- Even with remote code execution on a client, there must be no way to run arbitrary queries (e.g. listing all sessions) against the underlying storage system.
- Narrow API exposing only what is absolutely needed (see the API sketch after this list).
- Clients connecting & authenticating via TLS.
- **Reliability**: No SPOFs, operationally as simple as possible.
- **Multi-DC support**: Solid multi-master & fail-over support. Availability is more important than transactional cross-DC consistency. Should gracefully handle network partitions.
- **Horizontally scalable**, especially for session storage:
- 9k reads/s
- ~100 writes/s
- total session size: ~1.4G uncompressed, max 9G
- Ideally, support for TTLs & garbage collection in the storage backend. If not supported natively, this can be implemented manually in a background task.
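To make the "narrow API" requirement concrete, here is a minimal sketch of what the exposed surface could look like (a hypothetical express app; endpoint names are illustrative, not a settled design):

```js
'use strict';

// Hypothetical sketch of the narrow UserInfo API surface. Note what is
// *absent*: there is no way to enumerate sessions or fetch password
// hashes, so a compromised client can only look up individual, known keys.
const express = require('express');
const app = express();

// Opaque session blobs, keyed by session id, with a server-enforced TTL.
app.get('/session/:id', (req, res) => res.sendStatus(501));     // fetch one session
app.put('/session/:id', (req, res) => res.sendStatus(501));     // store / refresh TTL
app.delete('/session/:id', (req, res) => res.sendStatus(501));  // explicit logout

// Password checks return only a boolean; the hash never leaves the service.
app.post('/password/check', (req, res) => res.sendStatus(501));

app.listen(8080);
```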
## Options and trade-offs
### Service
Our default choice for a new service with fairly conventional requirements like these is to leverage our node.js ecosystem. Performance is unlikely to be an issue, and crypto / TLS support is available via node's native OpenSSL integration.
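For illustration, client authentication via TLS needs no custom crypto in node: the built-in `https` module can require and verify client certificates. A minimal sketch (certificate paths are hypothetical):

```js
'use strict';

const https = require('https');
const fs = require('fs');

// Mutual TLS: only clients presenting a certificate signed by our internal
// CA can connect at all.
const server = https.createServer({
    key: fs.readFileSync('/etc/userinfo/server.key'),
    cert: fs.readFileSync('/etc/userinfo/server.crt'),
    ca: fs.readFileSync('/etc/userinfo/internal-ca.pem'),
    requestCert: true,          // ask each client for a certificate
    rejectUnauthorized: true    // drop connections without a valid one
}, (req, res) => {
    // The verified certificate identifies the calling service.
    const peer = req.socket.getPeerCertificate();
    res.end('hello ' + peer.subject.CN + '\n');
});

server.listen(4443);
```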
### Storage
The two default candidates for backend storage are MySQL and Cassandra. The query needs are primarily simple key/value storage, ideally with TTL support / garbage collection.
Replication lag should be as short as possible.
#### MySQL
- + 13+ years of experience, solid performer.
- - No automatic horizontal sharding.
- +- Multi-DC / multi-master design trade-offs are not a good fit for this use case:
  - Galera offers synchronous master/master replication, which can't provide both high availability and low latency across the less reliable WAN links between DCs. Temporary partitions [lead to outages on one side](https://mariadb.com/kb/en/mariadb/mariadb-galera-cluster-known-limitations/) of the partition.
  - Master-slave replication is complex to manage across DCs, and does not support session timestamp updates in both DCs.
- (-) No built-in garbage collection (but relatively easy to automate; see the sketch below).
- +- Replication lag on the order of seconds (master-slave).
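For reference, the manual garbage collection mentioned above could be a small background task along these lines (table and column names are hypothetical):

```js
'use strict';

const mysql = require('mysql');

// Background session garbage collection for MySQL.
const pool = mysql.createPool({ host: 'localhost', user: 'userinfo', database: 'userinfo' });

function purgeExpiredSessions() {
    // Delete in small batches to keep lock times & replication lag low.
    pool.query('DELETE FROM session WHERE expires_at < NOW() LIMIT 1000', (err, result) => {
        if (err) {
            console.error(err);
            return setTimeout(purgeExpiredSessions, 60 * 1000);
        }
        // Run again immediately while there is a backlog; otherwise wait
        // for the next periodic pass.
        const delay = result.affectedRows === 1000 ? 0 : 60 * 1000;
        setTimeout(purgeExpiredSessions, delay);
    });
}

purgeExpiredSessions();
```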
#### Cassandra
- +- ~2 years of experience.
- + Automatic horizontal sharding.
- + Mature multi-master support, last write wins reconciliation after partitions.
- (+) Built-in TTL / garbage collection support (see the sketch below).
- (-) Does not scale well beyond ~1T per instance, but our data set is much smaller (under 10G).
- + Very low replication lag (parallel writes).
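To illustrate how the key/value needs map onto Cassandra's native TTL support, here is a hypothetical session table and write path using the DataStax node.js driver (keyspace, table and host names are assumptions, not a settled design):

```js
'use strict';

const cassandra = require('cassandra-driver');

const client = new cassandra.Client({
    contactPoints: ['cassandra1001.example.org'],
    localDataCenter: 'eqiad',
    keyspace: 'userinfo'
});

// Assumed schema:
//   CREATE TABLE session (id text PRIMARY KEY, value blob)
//   WITH default_time_to_live = 3600;

// Each write (re)sets the row's TTL; expired rows are dropped during
// compaction, so no application-level cleanup task is needed.
function putSession(id, value, ttlSeconds) {
    return client.execute(
        'INSERT INTO session (id, value) VALUES (?, ?) USING TTL ?',
        [id, value, ttlSeconds],
        { prepare: true }
    );
}

function getSession(id) {
    return client.execute(
        'SELECT value FROM session WHERE id = ?',
        [id],
        { prepare: true }
    );
}
```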
Overall, we are leaning towards using Cassandra. The primary reason for this is better support for multi-DC operation and sharding, combined with very moderate data sizes.
## Hardware needs
We prototyped a [simple session storage backend](https://github.com/gwicke/authoid) backed by Cassandra, which sustains about 3k reads/s on a dual-core laptop. Extrapolating from this, we roughly estimate that three nodes per DC should comfortably handle the expected load of 9k reads/s.
## Timeline and division of labor
### Session storage
Session storage is technically simpler than authentication. The timeline depends primarily on deciding on the general approach, determining hardware needs, and procuring / installing that hardware. If we decide by early August to continue with the prototype service / MediaWiki integration & Cassandra storage, it might be possible to be ready for a gradual roll-out by the end of Q1. That said, we consciously did not set a hard deadline, and do not intend to rush it.
### Authentication
In cooperation with #security, we are planning to prototype the auth service in Q1. Division of labor is planned as follows:
- #services will own the service, storage and API.
- #security will implement a crypto library for handling MediaWiki password hash schemes to be used by the service.
In Q2, #security and #services intend to work with #reading-infrastructure-team on integrating the auth service as a CentralAuth backend, and gradually rolling it out to production. At this point, we'll need production hardware.
## See also
- [Design notes](https://docs.google.com/document/d/1-sNsDhJl1KCqOX9De5uTwFypnBaNOB_XzKo9Al8l4bI/edit) from discussions between #security & #services, focused on the authentication service.