Given the scalability issues we've been seeing on php-fpm when many higher-latency HTTP calls are involved, the need for a proxy that handles connections between services has become apparent.
More generally, we want a middleware that gives us the following capabilities when making RPC calls to other services:
* Allow connection pooling
* Work well with our DNS discovery mechanism
* Enable TLS e2e without the need for relying on every single service doing encryption the "right" way
* Allow configuring per-endpoint timeouts
* Global and local-only rate limiting
* Allow monitoring RPC calls (telemetry)
* Allow tracing of RPC calls
We've evaluated nginx in the past, and the non-commercial version lacks even the most important of these features: it can support either DNS discovery or connection pooling, but not both. We already use Envoy as a TLS terminator on most servers, so we can probably use it to implement such a middleware, which is also what Envoy was designed for.
For each service, we need to do the following:
- Add TLS termination
- Add service proxy support
Once that's done across all services, we can move each of them through the following steps:
- Add a TLS LVS endpoint
- Switch the service proxy to use the TLS endpoint
- Remove the HTTP LVS endpoint
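The ordering above can be sketched as a small helper that reports the next outstanding step for a service, and whether the first phase is complete everywhere. This is only an illustrative sketch: the step identifiers, the `next_step` and `phase_one_complete` functions, and the `svc-a`/`svc-b` names are hypothetical and do not correspond to any existing tooling.

```python
from typing import Dict, Optional

# Rollout steps in the order given in this task (identifiers are hypothetical).
STEPS = [
    "tls_termination",
    "service_proxy",
    "tls_lvs_endpoint",
    "proxy_uses_tls_endpoint",
    "http_lvs_removed",
]

# The first two steps must be done for every service before moving on.
PHASE_ONE = STEPS[:2]


def next_step(status: Dict[str, bool]) -> Optional[str]:
    """Return the first step not yet marked done, or None if all are done."""
    for step in STEPS:
        if not status.get(step, False):
            return step
    return None


def phase_one_complete(services: Dict[str, Dict[str, bool]]) -> bool:
    """True only if every service has finished every phase-one step."""
    return all(
        status.get(step, False)
        for status in services.values()
        for step in PHASE_ONE
    )


if __name__ == "__main__":
    # Made-up services, not taken from the status table.
    services = {
        "svc-a": {"tls_termination": True, "service_proxy": True},
        "svc-b": {"tls_termination": True},
    }
    for name, status in services.items():
        print(name, "->", next_step(status))
    print("phase one complete:", phase_one_complete(services))
```

With the made-up data above, `svc-b` still needs `service_proxy`, so the second phase would not start for any service yet.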
Here is the current situation across the board:
| service | TLS termination | service proxy | TLS LVS | cleanup HTTP LVS (optional) |
| mediawiki | x | x | x | |
| restbase | x | x | x | |
| ores | x | x | | |
| [[ https://phabricator.wikimedia.org/T236017 | blubberoid ]] | x | - | x | |
| [[ https://phabricator.wikimedia.org/T255868 | citoid ]] | x | | | |
| echostore | x | - | x | x |
| sessionstore | x | - | x | x |
| [[ https://phabricator.wikimedia.org/T254581 | termbox ]] | x | x | x | |
| [[ https://phabricator.wikimedia.org/T256973 | push-notifications ]] | x | | x | - |
| [[ https://phabricator.wikimedia.org/T255876 | mobileapps ]] | x | | | |
| [[ https://phabricator.wikimedia.org/T255879 | cxserver ]] | x | | x | |
| [[ https://phabricator.wikimedia.org/T255870 | eventgate-analytics ]] | x | - | x | |
| [[ https://phabricator.wikimedia.org/T255871 | eventgate-analytics-external ]] | x | - | x | x |
| [[ https://phabricator.wikimedia.org/T255872 | eventgate-logging-external ]] | x | - | x | x |
| [[ https://phabricator.wikimedia.org/T255873 | eventgate-main ]] | x | - | x | |
| [[ https://phabricator.wikimedia.org/T255874 | eventstreams ]] | x | - | x | x |
| [[ https://phabricator.wikimedia.org/T255875 | mathoid ]] | x | | | |
| [[ https://phabricator.wikimedia.org/T255877 | proton ]] | x | | x | |
| [[ https://phabricator.wikimedia.org/T255878 | wikifeeds ]] | x | | | |
| [[ https://phabricator.wikimedia.org/T255869 | zotero ]] | x | - | | |