This task lists the concrete steps and functionalities we need / expect from the (future) RESTBase deployment system.
= Deployment Process =
All of the actions are to be carried out from a deployment host (currently `tin`). Unlike the current Trebuchet-powered deploy, we do not need the deployment host to act as a proxy serving the hosts; they may get their code directly from git/gerrit.
The deployment process can follow two different (but similar) paths:
- regular code deploy
- deployment involving schema changes
== Regular Code Deploy ==
Most of the time, the code to be deployed represents logic improvements and feature additions and as such does not have any impact on the underlying storage (Cassandra). Here are the steps needed to complete a successful deploy, to be executed in sequence on each host:
# depool host
# stop RESTBase
# send / fetch the code
# (re)start RESTBase
# wait for it to bind to its port
# checks / tests
## `curl` some known endpoints
## check the logs and graphite for anomalies after restart
### error- and fatal-level log entries
### Cassandra connection issues
### 5xx request response rates for the given host
# repeat all steps for the next host
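The sequence above could be sketched as a small driver loop. This is only an illustration under stated assumptions: the step names and the `actions` mapping of per-host callables are hypothetical hooks, not part of any existing tool, and the sketch covers only the steps listed above.

```python
from typing import Callable, Dict, Iterable, List

# Step names mirror the numbered list above. The callables behind them are
# hypothetical hooks that a real system would back with depooling, service
# control, code fetching and health checks.
STEPS = ["depool", "stop", "fetch_code", "start", "wait_for_port", "check"]


class DeployError(Exception):
    """Raised when a step fails; records where the deploy stopped."""

    def __init__(self, host: str, step: str, cause: Exception):
        super().__init__(f"{step} failed on {host}: {cause}")
        self.host = host
        self.step = step


def rolling_deploy(hosts: Iterable[str],
                   actions: Dict[str, Callable[[str], None]]) -> List[str]:
    """Run every step on one host before moving on to the next host.

    Returns the hosts that completed all steps. Stops at the first failure
    so that no further host is touched.
    """
    done: List[str] = []
    for host in hosts:
        for step in STEPS:
            try:
                actions[step](host)
            except Exception as exc:
                raise DeployError(host, step, exc) from exc
        done.append(host)
    return done
```

Stopping at the first failing step keeps the remaining hosts on the old code, which is what makes an abort possible.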
=== Abort Mechanism ===
For regular code deploys, aborting is pretty straightforward. The deployment system should keep track of the hash/tag deployed before the current deploy started (without in any way interfering with the repository itself). Should a deploy fail, it simply enforces the previously-known-to-work code tag/branch/hash on all of the hosts sequentially.
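A minimal sketch of this abort behaviour, assuming a hypothetical `deploy_host(host, ref)` callable that puts a given tag/branch/hash on one host and raises on failure:

```python
from typing import Callable, Sequence


def deploy_with_abort(hosts: Sequence[str],
                      new_ref: str,
                      previous_ref: str,
                      deploy_host: Callable[[str, str], None]) -> str:
    """Deploy new_ref host by host and return the ref left in effect.

    previous_ref is the known-to-work ref recorded before the deploy
    started. If any host fails, previous_ref is enforced on all hosts
    sequentially. deploy_host is a hypothetical hook, not an existing API.
    """
    try:
        for host in hosts:
            deploy_host(host, new_ref)
    except Exception:
        # Abort: enforce the previous ref everywhere, including hosts
        # that had not been reached yet (a no-op for them in practice).
        for host in hosts:
            deploy_host(host, previous_ref)
        return previous_ref
    return new_ref
```

Errors raised while rolling back are deliberately not caught here; a failed abort needs a human.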
== Deploying Schema Changes ==
Some new features and/or betterment of existing functionalities require a new storage schema to be applied. The schemas are versioned and are applied by RESTBase on start-up. As this is a rather sensitive Cassandra operation, extra measures and precautions need to be taken.
The general workflow follows the one described above, but two steps need to be executed on all hosts beforehand:
- disable Puppet agent runs
- disable RESTBase worker restarts
=== Abort Mechanism ===
Because schemas are versioned, the back-end storage will refuse to apply a schema with a version lower than the currently-present one. This means that in this instance the abort mechanism involves an additional step: a manual commit by the deployer that bumps the schema version number of the last stable schema.
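To illustrate why the version bump is needed, here is a sketch that assumes schema versions are plain integers and models a schema as a hypothetical `Schema` record. It does not reflect how RESTBase actually stores schemas.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Schema:
    version: int      # assumption: versions are monotonically increasing integers
    definition: str   # stand-in for the actual schema contents


def is_accepted(current: Schema, candidate: Schema) -> bool:
    """The storage refuses any schema older than the one already applied."""
    return candidate.version >= current.version


def rollback_schema(last_stable: Schema, failed: Schema) -> Schema:
    """Re-issue the last stable definition under a version above the failed one."""
    return replace(last_stable, version=failed.version + 1)


stable = Schema(version=3, definition="old layout")
failed = Schema(version=4, definition="new layout")

# Re-deploying the stable schema as-is would be refused ...
assert not is_accepted(current=failed, candidate=stable)
# ... so the abort commits the old definition with a bumped version.
rollback = rollback_schema(stable, failed)
assert is_accepted(current=failed, candidate=rollback)
```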
= Configuration Management =
RESTBase's configuration is made up of several parts: ops, code-specific and schema. //Ops// refers to the part of the configuration controlled by TechOps. This includes mainly host-specific configuration directives such as the list of Cassandra nodes, Parsoid and other *oids' host names. //Code-specific// configuration includes simple, RESTBase-specific details, such as the pagination size or the list of active modules. Finally, //schema// configuration directives are those whose changes are reflected in the underlying storage, most notably the addition of new back-ends which need storage and new domains.
Currently, the configuration file is managed as a whole in `ops/puppet`. While this works well for //ops// and //code-specific// configuration bits, it is highly inadequate for //schema// ones.
Instead, whenever one such configuration directive changes (or is added), the deployment process described earlier for schema changes **must be used**. Hence, we require configuration changes to be regarded as code deploys, with rolling deploys and rollback possibilities.
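As a sketch of how a deployment system might pick the process from a configuration change, the snippet below classifies changed keys into the three parts described above. The key names are hypothetical and only mirror the examples in the text; treating unknown keys as //schema// is an assumption made here to err on the cautious side.

```python
from typing import Iterable

# Hypothetical key names, chosen only to mirror the examples in the text.
CONFIG_KINDS = {
    "cassandra_hosts": "ops",
    "parsoid_host": "ops",
    "page_size": "code",
    "modules": "code",
    "storage_backends": "schema",
    "domains": "schema",
}


def deployment_path(changed_keys: Iterable[str]) -> str:
    """Return "schema" if any changed key affects storage, else "regular".

    Keys missing from CONFIG_KINDS are treated as schema-affecting, so an
    unclassified directive never skips the more careful process.
    """
    kinds = {CONFIG_KINDS.get(key, "schema") for key in changed_keys}
    return "schema" if "schema" in kinds else "regular"
```

For example, `deployment_path(["page_size", "domains"])` selects the schema-change process because `domains` is classified as a //schema// directive.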
= Beta / Staging =
Ideally, the instances in the Beta Cluster would always be up-to-date with the newest development changes. Since these can include schema changes as well, the upgrade process should resemble the schema changes deployment process.
The staging environment should contain a tagged/branched version which is to be tested. The deployment process is to follow the ones described above. This could also be the time/place for the system to record the built node module dependencies, so that we can get rid of the superfluous //deploy// repository (or, at least, hide it from the user). Currently, we use [a script](https://github.com/wikimedia/service-template-node/blob/master/doc/deployment.md) to bring it up to date. Furthermore, while not strictly needed, the deployment system could record which type of deployment was chosen by the deployer so that it can later be used when deploying in production.
What we would explicitly need is an automated way of creating / updating the dependencies on each commit, so that the //deploy// repository can be put out of use. Concretely, each commit can be associated with a //dependency artifact// which would be deployed together with the source RESTBase code for a given hash/tag.
NOTE: Each environment will need to have its own configuration file
NOTE: We already have a small staging cluster where we test changes to RESTBase and Cassandra. However, the staging environment mentioned here refers to the new overall staging infrastructure.
= See also =
- [Service deployment workflow](https://wikitech.wikimedia.org/wiki/User:Mobrovac/Service_Deployment)
- [Current solution for RESTBase deployment](https://wikitech.wikimedia.org/wiki/RESTBase)
- General service deploy system requirements: T93428