
Plan/design a session storage service
Closed, Resolved · Public

Description

Once questions about the external interface have been answered, we'll need to plan the remainder of the implementation, including the choice of technologies and the operational semantics.

MediaWiki Integration

MediaWiki supports pluggable session persistence through a BagOStuff implementation, and a RESTBagOStuff already exists (apparently created for this purpose). The only issue with RESTBagOStuff is that it uses PHP's serialize() and unserialize() for the body of PUT and the response of GET respectively, whereas we have specified JSON. Options include: updating the existing implementation as a breaking change, updating the existing implementation to support optional (configurable) JSON encoding, or creating another implementation based on RESTBagOStuff.
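
To make the encoding mismatch concrete, the sketch below compares the two wire formats for one small value. The session value is hypothetical, and the PHP string is written out by hand to show what serialize() produces for the equivalent PHP array; neither is taken from RESTBagOStuff itself.

```python
import json

# Hypothetical session value, for illustration only.
session = {"user": "alice", "id": 42}

# What PHP's serialize() produces for the equivalent PHP array
# (written out by hand here; Python is not calling PHP).
php_serialized = 'a:2:{s:4:"user";s:5:"alice";s:2:"id";i:42;}'

# The JSON body that the service interface specifies instead.
json_body = json.dumps(session, separators=(",", ":"))

# Any JSON-capable client can read the value back without PHP.
decoded = json.loads(json_body)
```

The PHP format is only practical to parse from PHP, which is why a JSON body suits a service that may later be used by non-PHP clients.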

Replication semantics

Based on the requirements for session storage, we should be able to assume in all cases that GET and PUT use Cassandra's ConsistencyLevel.LOCAL_QUORUM and that DELETE uses ConsistencyLevel.EACH_QUORUM.

NOTE: The software created to implement this will be a very straightforward key-value implementation, likely applicable to other use cases in the future, not all of which will necessarily be satisfied by these semantics. However, rather than generalize this now (either through configuration or per-request parameters), we will define these as constants, and revisit if and when future use cases arise.
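
As a minimal sketch of "define these as constants", the fixed mapping from HTTP method to consistency level could look like the following. The enum is a stand-in for whatever the chosen Cassandra driver provides, and the names CONSISTENCY_BY_METHOD and consistency_for are hypothetical.

```python
from enum import Enum


class ConsistencyLevel(Enum):
    """Stand-in for the Cassandra driver's consistency levels (names only)."""
    LOCAL_QUORUM = "LOCAL_QUORUM"
    EACH_QUORUM = "EACH_QUORUM"


# Fixed per-method consistency: no configuration and no per-request override.
CONSISTENCY_BY_METHOD = {
    "GET": ConsistencyLevel.LOCAL_QUORUM,
    "PUT": ConsistencyLevel.LOCAL_QUORUM,
    "DELETE": ConsistencyLevel.EACH_QUORUM,
}


def consistency_for(method: str) -> ConsistencyLevel:
    """Return the constant consistency level for an HTTP method."""
    try:
        return CONSISTENCY_BY_METHOD[method.upper()]
    except KeyError:
        raise ValueError(f"unsupported method: {method}") from None
```

Keeping the mapping in one place means that generalizing it later (through configuration or request parameters) would touch a single lookup.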

Technology

The proposed system is simple enough that it could easily be created using any number of languages and/or frameworks. Given the common choices for similar or related projects at the WMF, we can probably narrow the choices to Javascript/NodeJS, Python, and PHP.

Javascript/NodeJS

The raison d'être for this service is session storage, so security is paramount. However, with NodeJS, the only practical source of dependencies is http://npmjs.org. Dependencies, both explicitly declared and transitive, are fetched whenever npm install is invoked, and there is no chain of trust. These dependencies -- the entire contents of node_modules/ -- are as much a part of our production applications as any that we write, yet despite the time, care, and effort we put into reviewing even the smallest of changes to our code bases, the contents of node_modules/ remain opaque to us.

A saner approach for something so security-critical would be to prioritize a manageable list of dependencies that can be sourced entirely from within our current version of Debian (Stretch).

PHP

If written in PHP, we would need to (at a minimum) come up with a solution for the Cassandra driver. There is a driver in Debian unstable (unstable only), but it is broken. We would need to fix the build/packaging, allow it to transition to Debian testing, and then upload a backport (assuming the maintainer is amenable).

Python

Here at the WMF, there seems to be more of a precedent for building software like this in Python than in PHP. Additionally, a Cassandra driver is packaged for Debian Stretch, as are most (all?) of the common frameworks, a Prometheus client, and several high-performance, production-ready WSGI containers.

Of the popular frameworks in Debian: Django seems a bit heavy/excessive for a service this simple. Flask is much simpler/lighter, yet seems to have the abstractions that would matter to us (logging, configuration, JSON encoding/decoding, etc), and is quite popular. It also helps that this would not be the first use at the WMF.

Others

One concern here is performance, particularly request latency. Session storage latency makes up a part of the overall latency of every authenticated request. Given that we currently use Redis (highly optimized, in-memory), adding persistence and replication will only increase this latency. This seems to be understood, as does the view that the benefits are worth some additional latency, but reasonable care should be taken to minimize it.

As mentioned before, precedent and a desire to avoid proliferation mean that PHP, Python, and Javascript are the uncontroversial choices. Depending on use case, there are compelling reasons for using any of these, but performance isn't typically among them. Some alternatives that fall outside our typical choices are:

C/C++

C/C++ are excellent choices when performance is a concern. However, they can be difficult to work with (particularly with respect to concurrency and memory management), and developer expertise is somewhat scarce (both inside and outside the foundation).

Short of throwing the doors wide to external dependencies, I imagine us implementing our own HTTP server from boost libraries, and externally sourcing the Cassandra driver (a C++ driver does not ship with Debian). Implementing session storage in C/C++ would likely increase development time and maintenance overhead significantly.

Rust

Rust is another excellent choice from a performance point of view. Unlike C++, Rust has built memory and concurrency safety into the language/compiler, all but eliminating the biggest source of bugs in C/C++ applications. However, language expertise is at least as scarce, and despite a promising start it is too new to say its future is certain.

Were we to implement session storage in Rust, we'd have to openly embrace the use of vendored (unreleased) external dependencies. We'd also need to do due diligence to ensure that a suitable Cassandra driver exists. Implementing session storage in Rust would likely increase development time and maintenance overhead significantly.

Go

Unlike C/C++ and Rust, Go obtains memory safety through the use of a garbage collector. Its performance isn't on par with what is possible from C++ and Rust, but it is quite good (when compared to our current stable of languages). Developer expertise is on the rise (inside and outside of the foundation), and enthusiasm to learn it seems to run high.

Were we to implement session storage in Go, it would be possible to source dependencies entirely from within Debian, as was proposed with Python above (though these would become build dependencies, not runtime). Implementing session storage in Go would likely not significantly increase development time and maintenance overhead.

Java

Like Go, Java provides memory safety through the use of a garbage collector. Performance isn't on par with that of C++ or Rust, but the JVM is state-of-the-art and highly optimized. Historically there has been reluctance to using Java within the WMF, but expertise does exist.

Were we to implement session storage in Java, we would need to source the driver externally (all other dependencies can be satisfied in Debian). Implementing session storage in Java would likely not significantly increase development time and maintenance overhead.

Unless we can confidently determine that we're OK with the performance penalty of Python, my (@Eevans) preference would be to implement this service in Go using dependencies sourced from Debian Stretch. Input from SRE on this would, however, be appreciated (/cc @faidon, @Joe, @MoritzMuehlenhoff).

Authentication, authorization, encryption

See: T209109: Security model for session storage service

Logging

Log messages will be JSON encoded and delivered to syslog.
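
A minimal sketch of JSON-encoded log messages, using only the Python standard library. The field names (time, level, logger, message) and the JsonFormatter class are assumptions for illustration; the task does not specify a log schema.

```python
import json
import logging
import logging.handlers


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (hypothetical schema)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(name: str, handler: logging.Handler) -> logging.Logger:
    """Attach a JSON-formatting handler to the named logger."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger

# For delivery to syslog, the handler could be, for example,
# logging.handlers.SysLogHandler(address="/dev/log") on a Linux host.
```

Passing the handler in keeps the formatter testable with a plain stream handler, while production would plausibly supply the syslog handler.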

Metrics

Prometheus metrics

metric name    type    description
                       Read misses (invalid or expired keys)
                       Reads (successful)
                       Sets
                       Deletes
                       Errors
                       Read latency
                       Set latency
                       Delete latency
All of the above are covered by the standard Prometheus instrumentation, which records counts and latency for each endpoint by HTTP status.
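
To illustrate why counts and latency keyed by method and HTTP status cover every metric listed, here is a standard-library stand-in for that instrumentation. It is not the Prometheus client API; the RequestMetrics class and the specific status codes (200, 404, 201, 204) are assumptions made for the example.

```python
import time
from collections import defaultdict


class RequestMetrics:
    """In-memory stand-in for per-endpoint request instrumentation."""

    def __init__(self):
        self.count = defaultdict(int)          # (method, status) -> requests
        self.latency_sum = defaultdict(float)  # (method, status) -> seconds

    def observe(self, method, status, seconds):
        key = (method, status)
        self.count[key] += 1
        self.latency_sum[key] += seconds

    def timed(self, method, handler):
        """Run handler(), which returns an HTTP status, and record it."""
        start = time.monotonic()
        status = handler()
        self.observe(method, status, time.monotonic() - start)
        return status


metrics = RequestMetrics()
metrics.timed("GET", lambda: 200)     # successful read
metrics.timed("GET", lambda: 404)     # read miss (invalid or expired key)
metrics.timed("PUT", lambda: 201)     # set
metrics.timed("DELETE", lambda: 204)  # delete
```

Under these assumptions, a read miss is a GET with a not-found status, an error is any request with a server-error status, and each latency figure is the latency recorded for the corresponding method.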

Health check

See: T209108: Monitoring and data collection for session storage service

Event Timeline

Eevans created this task. Oct 2 2018, 7:17 PM
Eevans triaged this task as Medium priority. Oct 2 2018, 7:34 PM
Eevans renamed this task from Session storage service planning/design to Session storage service: Planning and design. Nov 7 2018, 3:19 PM
Eevans renamed this task from Session storage service: Planning and design to Plan/design a session storage service. Dec 13 2018, 5:18 PM
Eevans closed this task as Resolved. Apr 1 2019, 2:25 PM

We are well past the design at this point (targeting deployment for Q4 2019); closing as resolved.