Deploy the RESTBase front-end service (RESTRouter) to Kubernetes
Closed, Declined | Public | 0 Estimated Story Points

Description

We are splitting RESTBase into two components: the (public) REST API router and the storage service (cf. T220449: Split RESTBase in two services: storage service and API router/proxy). This task is about deploying the front-end REST router to Kubernetes.

Service Info

Service name: RESTRouter (name still under discussion, cf. T220761)
Owners: @Pchelolo and @mobrovac (Platform Engineering)
Repository: mediawiki/services/restbase
ETA: by the end of Q4 FY18/19
Description: RESTRouter is the routing part of (the current) RESTBase. It accepts external requests, validates them (performing access checks if needed) and carries out all the business logic related to the request: it checks storage for an existing copy of the requested data and, if needed, issues requests to back-end services to complete the request, sending the response to storage before returning it to the client.
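
The request flow described above can be sketched roughly as follows. This is a toy model under stated assumptions, not RESTRouter's actual code: the class name, the dictionary standing in for the storage service, and the `backend` callable are all hypothetical.

```python
from typing import Callable, Dict, Optional


class RouterSketch:
    """Toy model of the flow: validate, check storage, fall back to a back-end."""

    def __init__(self, storage: Dict[str, str], backend: Callable[[str], str]) -> None:
        self.storage = storage    # hypothetical stand-in for the storage service
        self.backend = backend    # hypothetical stand-in for a back-end service call
        self.backend_calls = 0

    def handle(self, path: str, allowed: bool = True) -> Optional[str]:
        # Validation and access check come first; None models a rejected request.
        if not path.startswith("/") or not allowed:
            return None
        # Storage hit: return the stored response without touching the back-end.
        if path in self.storage:
            return self.storage[path]
        # Storage miss: ask the back-end, store the response, then return it.
        self.backend_calls += 1
        response = self.backend(path)
        self.storage[path] = response
        return response


if __name__ == "__main__":
    router = RouterSketch({}, lambda path: "rendered:" + path)
    print(router.handle("/page/html/Foo"))   # first request goes to the back-end
    print(router.handle("/page/html/Foo"))   # second request is served from storage
    print(router.backend_calls)
```

The point of the sketch is only the ordering: the response is written to storage before it is handed back to the client, so a repeated request does not reach the back-end again.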

Deployment Plans

RESTRouter migration plans. The following steps are common to all plans:

First, we deploy RESTRouter to k8s.
Then, we expose the storage routes in RESTBase (cf. PR #1103).
Finally, we test RESTRouter under load (options include synthetic traffic, mirroring, or using only background updates/internal requests).

Plan 1

Have restbase listen on both 7231 and 7233 and configure LVS restbase.svc.$::site.wmnet to also use 7233
Instantiate restrouter on a new LVS IP and DNS (restrouter.svc.$::site.wmnet) and have it talk to restbase.svc.$::site.wmnet:7233
Move services 1 by 1 to restrouter.discovery.wmnet (the site aware discovery records for restrouter.svc.$::site.wmnet)

Pros

The move is gradual at the service level. Services are migrated one by one based on their configuration, so potential problems surface one at a time
The currently stable and battle-tested restbase installation is kept around even while more and more services are moved over
It's rather easy configuration-wise and rather easy to do in steps
No downtime for services.

Cons

The migration might take time when issues arise, but at least blockers will be service-specific
There is no gradual traffic switchover. For every service it's a "canary host first", then all-or-nothing approach. Even the canary host receives, depending on the DC, between 13% and 25% of traffic

Plan 2

Have restbase listen on both 7231 and 7233
Add a new LVS IP on the restbase hosts and name it restbase-backend.svc.$::site.wmnet
Configure restrouter to connect to restbase-backend.svc.$::site.wmnet:7233
Add the LVS IP for restbase.svc.$::site.wmnet to kubernetes hosts
Add the kubernetes hosts to LVS for restbase.svc.$::site.wmnet
Slowly migrate the traffic from the current restbase hosts to kubernetes hosts
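
The gradual switchover in the last step can be reasoned about with a small calculation. This is a sketch that assumes the load balancer spreads new connections in proportion to per-pool weights; the task does not state which scheduler or weights would be used, and the pool names and weight steps below are illustrative only.

```python
from typing import Dict, List, Tuple


def traffic_share(weights: Dict[str, int]) -> Dict[str, float]:
    """Expected fraction of new connections per pool under weighted balancing."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one pool needs a positive weight")
    return {pool: weight / total for pool, weight in weights.items()}


# Hypothetical migration steps:
# (weight of the current restbase hosts, weight of the kubernetes hosts).
STEPS: List[Tuple[int, int]] = [(100, 0), (90, 10), (50, 50), (0, 100)]

if __name__ == "__main__":
    for old, new in STEPS:
        share = traffic_share({"restbase-hosts": old, "kubernetes-hosts": new})
        print(f"{old:>3}/{new:<3} -> kubernetes share {share['kubernetes-hosts']:.0%}")
```

Pausing the migration corresponds to staying on one step, and a rollback to moving back to an earlier step.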

Pros

The services see 0 changes. Everything happens transparently to them.
The move of traffic is gradual, which makes it quick and easy to roll back as well as to pause the migration
No downtime for services

Cons

Rather convoluted configuration wise, with some margin for mistakes
All or nothing approach as far as services go. No way to distinguish between them
The migration might take a long time, since any issues that arise will probably be global blockers for all services
Rollbacks are possible, but if issues arise, it's probably going to be a full rollback to the old installation
The resulting restbase.svc.$::site.wmnet DNS name does not reflect the software actually powering the frontend (restrouter), possibly leading to future misunderstandings/confusion

Post migration

In the post-deploy clean-up step, we remove public route handling from RESTBase, effectively turning it into the back-end storage service.

Comment from Giuseppe:

I think plan 1 is much simpler. It requires more patches and more attention to not leave anything behind, but it's probably the better plan. Please be mindful that restrouter will need to be terminating SSL as well, like restbase does.
I vote plan 1.

Marko:

My vote goes to plan 1 as well. Even though it will probably take longer, it makes it clear to all parties involved that changes are happening; service owners will therefore be more aware of potential problems, which makes them easier to detect.
I agree that the end result is better with restrouter.svc than restbase.svc.

RESTRouter will effectively take over request handling from RESTBase, so we will need to divert traffic to it without interruption.

Benchmarking:

Details

Subject	Repo	Branch	Lines +/-
restrouter: Allow the kademlia port in ingress	operations/deployment-charts	master	+119 -94
restrouter: Kademlia should listen on all IPs	operations/deployment-charts	master	+115 -91
RESTRouter: Bump image tag to v1.1.2 and release v0.0.7	operations/deployment-charts	master	+116 -93
restrouter: Revert the initialDelay seconds	operations/deployment-charts	master	+0 -6
restrouter: Add ratelimiting support to chart	operations/deployment-charts	master	+27 -8
RESTRouter: Skip resources on start-up and add nqo.wp.org	operations/deployment-charts	master	+2 -0
Activate restrouter discovery records	operations/dns	master	+2 -2
calico: Add port 8000 (parsoid) to restrouter	operations/deployment-charts	master	+6 -0
restrouter: Fix the parsoid port in the configuration	operations/deployment-charts	master	+3 -3
LVS for RESTRouter.	operations/puppet	production	+72 -12
Assign restrouter LVS IPs	operations/dns	master	+6 -0
restrouter: Skip using https for mwapi_uri	operations/deployment-charts	master	+3 -3
restrouter: Skip probes for the first 60 seconds	operations/deployment-charts	master	+6 -0
RESTRouter: Add missing back-end svc URIs	operations/deployment-charts	master	+27 -26
Expose the key_value buckets to production IPs	mediawiki/services/restbase/deploy	master	+10 -0
Release restrouter chart version 0.0.3	operations/deployment-charts	master	+107 -84
RESTRouter: Clean up the config && add the wikifeeds URI	operations/deployment-charts	master	+33 -32
LVS: Setup port 7233 for restbase-backend	operations/puppet	production	+52 -24
RESTBase: Temporarily allow access to port 7233 as well	operations/puppet	production	+5 -1
Expose both ports 7231 and 7233.	mediawiki/services/restbase/deploy	master	+10 -1
restrouter: Add kubernetes stanzas	operations/puppet	production	+49 -0
restrouter: Switch to event_service_uri	operations/deployment-charts	master	+98 -75
restrouter: Add helmfile stanzas	operations/deployment-charts	master	+390 -0
Publish restrouter 0.0.1	operations/deployment-charts	master	+66 -66
RESTRouter: Add initial Helm chart	operations/deployment-charts	master	+1 K -64

Related Objects

Status	Assigned	Task
Resolved	WDoranWMF	T220449 Split RESTBase in two services: storage service and API router/proxy
Open	None	T198901 Migrate production services to kubernetes using the pipeline
Resolved	akosiaris	T228676 Self-service Deployment Pipeline
Declined	akosiaris	T223953 Deploy the RESTBase front-end service (RESTRouter) to Kubernetes
Resolved	• Pchelolo	T226538 Conduct basic load-test experiments for RESTRouter in k8s
Resolved	• Pchelolo	T226536 Trigger RESTRouter image builds on push/tag
Declined	None	T235437 RESTBase/RESTRouter/service-runner rate limiting plans
Declined	• Pchelolo	T249919 Move service-runner legacy rate limiter into hyperswitch

Event Timeline

There are a very large number of changes, so older changes are hidden.

• Pchelolo removed a project: Platform Team Workboards (Clinic Duty Team).Jul 18 2019, 8:23 PM

WDoranWMF moved this task from RESTBase Split (CDP2) to mop on the Platform Engineering board.Jul 26 2019, 6:46 PM

WDoranWMF edited projects, added Platform Team Initiatives (RESTBase Split (CDP2)); removed Platform Engineering (RESTBase Split (CDP2)).

Change 526448 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/dns@master] Assign restrouter LVS IPs

https://gerrit.wikimedia.org/r/526448

Change 526449 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/dns@master] Activate restrouter discovery records

https://gerrit.wikimedia.org/r/526449

Change 526632 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] restrouter: Add kubernetes stanzas

https://gerrit.wikimedia.org/r/526632

Change 526719 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] restrouter: Add helmfile stanzas

https://gerrit.wikimedia.org/r/526719

Change 527130 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] restrouter: Switch to event_service_uri

https://gerrit.wikimedia.org/r/527130

Change 526719 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] restrouter: Add helmfile stanzas

https://gerrit.wikimedia.org/r/526719

Change 527130 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] restrouter: Switch to event_service_uri

https://gerrit.wikimedia.org/r/527130

akosiaris mentioned this in rDEPLOYCHARTS590f152b70d6: restrouter: Add helmfile stanzas.Aug 2 2019, 8:53 AM

akosiaris mentioned this in rDEPLOYCHARTSbc22511d75ba: restrouter: Switch to event_service_uri.

Change 526632 merged by Alexandros Kosiaris:
[operations/puppet@production] restrouter: Add kubernetes stanzas

https://gerrit.wikimedia.org/r/526632

akosiaris added a parent task: T228676: Self-service Deployment Pipeline.Aug 5 2019, 12:53 PM

restrouter was temporarily deployed in the staging cluster today. The deployment was rolled back because it was failing: it tries to reach restbase on port 7233, where restbase does not listen yet. As soon as we figure out the exact details of the migration plan this should be ready to go. The open items are:

Restbase listening on port 7233 as well
Deciding the best plan on how to switchover the traffic (percentage based, per service based)
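
A quick way to confirm the first item before retrying a deployment is a plain TCP connection test. The helper below is a minimal sketch using only the Python standard library; the function name is made up for illustration and the host name in the usage comment merely follows the restbase.svc.$::site.wmnet pattern used in this task.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical usage: check both ports RESTBase is expected to expose.
    for port in (7231, 7233):
        print(port, port_is_open("restbase.svc.eqiad.wmnet", port))
```

This only shows that something accepts connections on the port; it does not verify that the storage routes are actually served there.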

Yes, we first need to deploy https://gerrit.wikimedia.org/r/c/mediawiki/services/restbase/deploy/+/521572

thcipriani moved this task from Backlog to Migration on the Release Pipeline board.Aug 20 2019, 1:28 PM

Change 521572 merged by Mobrovac:
[mediawiki/services/restbase/deploy@master] Expose both ports 7231 and 7233.

https://gerrit.wikimedia.org/r/521572

Mentioned in SAL (#wikimedia-operations) [2019-08-26T13:06:05Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@38c313d]: Bring the dev cluster up to date and expose RB on both 7231 and 7233 in it - T223953

Mentioned in SAL (#wikimedia-operations) [2019-08-26T13:06:10Z] <mobrovac@deploy1001> deploy aborted: Bring the dev cluster up to date and expose RB on both 7231 and 7233 in it - T223953 (duration: 00m 04s)

Mentioned in SAL (#wikimedia-operations) [2019-08-26T13:06:26Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@38c313d] (dev-cluster): Bring the dev cluster up to date and expose RB on both 7231 and 7233 in it - T223953

Mentioned in SAL (#wikimedia-operations) [2019-08-26T13:09:48Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@38c313d] (dev-cluster): Bring the dev cluster up to date and expose RB on both 7231 and 7233 in it - T223953 (duration: 03m 22s)

Mentioned in SAL (#wikimedia-operations) [2019-08-26T13:15:58Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@38c313d]: Expose RB on both 7231 and 7233 - T223953

• mobrovac mentioned this in rGRBD38c313d7c1ac: Expose both ports 7231 and 7233..Aug 26 2019, 1:36 PM

Mentioned in SAL (#wikimedia-operations) [2019-08-26T13:38:57Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@38c313d]: Expose RB on both 7231 and 7233 - T223953 (duration: 23m 00s)

Change 532382 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[operations/puppet@production] RESTBase: Temporarily allow access to port 7233 as well

https://gerrit.wikimedia.org/r/532382

Change 534430 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] WIP: LVS: Setup port 7233 for restbase-backend

https://gerrit.wikimedia.org/r/534430

Change 532382 merged by Alexandros Kosiaris:
[operations/puppet@production] RESTBase: Temporarily allow access to port 7233 as well

https://gerrit.wikimedia.org/r/532382

ayounsi mentioned this in T232007: Restbase: significant increase of outbound dropped packets.Sep 4 2019, 4:34 PM

akosiaris updated the task description.Sep 17 2019, 9:36 AM

• mobrovac added a project: Platform Team Workboards (Clinic Duty Team).Sep 17 2019, 12:15 PM

• mobrovac moved this task from Later to Discussing on the Platform Team Workboards (Clinic Duty Team) board.

Going forward with Plan #1 (which I also find better)

Change 534430 merged by Alexandros Kosiaris:
[operations/puppet@production] LVS: Setup port 7233 for restbase-backend

https://gerrit.wikimedia.org/r/534430

• mobrovac moved this task from Discussing to Doing(WIP:5) on the Platform Team Workboards (Clinic Duty Team) board.Sep 20 2019, 10:55 AM

Change 538238 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[operations/deployment-charts@master] RESTRouter: Clean up the config && add the wikifeeds URI

https://gerrit.wikimedia.org/r/538238

Change 538238 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] RESTRouter: Clean up the config && add the wikifeeds URI

https://gerrit.wikimedia.org/r/538238

• mobrovac mentioned this in rDEPLOYCHARTSba4388959eb7: RESTRouter: Clean up the config && add the wikifeeds URI.Sep 20 2019, 11:04 AM

Change 538242 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] Release restrouter chart version 0.0.3

https://gerrit.wikimedia.org/r/538242

Change 538242 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] Release restrouter chart version 0.0.3

https://gerrit.wikimedia.org/r/538242

akosiaris mentioned this in rDEPLOYCHARTSc485c5ba3078: Release restrouter chart version 0.0.3.Sep 20 2019, 11:22 AM

Change 538288 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/restbase/deploy@master] Expose the key_value buckets to production IPs

https://gerrit.wikimedia.org/r/538288

Change 538288 merged by Mobrovac:
[mediawiki/services/restbase/deploy@master] Expose the key_value buckets to production IPs

https://gerrit.wikimedia.org/r/538288

Mentioned in SAL (#wikimedia-operations) [2019-09-24T10:29:22Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@19d0f44]: Expose the key_value buckets to production IPs - T223953

• mobrovac mentioned this in rGRBD19d0f4463a98: Expose the key_value buckets to production IPs.Sep 24 2019, 10:30 AM

Mentioned in SAL (#wikimedia-operations) [2019-09-24T10:51:41Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@19d0f44]: Expose the key_value buckets to production IPs - T223953 (duration: 22m 20s)

Change 538882 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[operations/deployment-charts@master] RESTRouter: Add missing back-end svc URIs

https://gerrit.wikimedia.org/r/538882

Change 538882 merged by jenkins-bot:
[operations/deployment-charts@master] RESTRouter: Add missing back-end svc URIs

https://gerrit.wikimedia.org/r/538882

• mobrovac mentioned this in rDEPLOYCHARTSaf88b7f9d2b0: RESTRouter: Add missing back-end svc URIs.Sep 24 2019, 2:21 PM

Change 538894 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] restrouter: Skip probes for the first 60 seconds

https://gerrit.wikimedia.org/r/538894

Change 538894 merged by jenkins-bot:
[operations/deployment-charts@master] restrouter: Skip probes for the first 60 seconds

https://gerrit.wikimedia.org/r/538894

akosiaris mentioned this in rDEPLOYCHARTS294664cdfdf6: restrouter: Skip probes for the first 60 seconds.Sep 24 2019, 2:25 PM

Change 538899 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] restrouter: Skip using https for mwapi_uri

https://gerrit.wikimedia.org/r/538899

Change 538899 merged by jenkins-bot:
[operations/deployment-charts@master] restrouter: Skip using https for mwapi_uri

https://gerrit.wikimedia.org/r/538899

akosiaris mentioned this in rDEPLOYCHARTSf5c3a0b84a0a: restrouter: Skip using https for mwapi_uri.Sep 24 2019, 2:38 PM

Change 526448 merged by Alexandros Kosiaris:
[operations/dns@master] Assign restrouter LVS IPs

https://gerrit.wikimedia.org/r/526448

Change 521584 merged by Alexandros Kosiaris:
[operations/puppet@production] LVS for RESTRouter.

https://gerrit.wikimedia.org/r/521584

Change 539109 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] restrouter: Fix the parsoid port in the configuration

https://gerrit.wikimedia.org/r/539109

Change 539109 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] restrouter: Fix the parsoid port in the configuration

https://gerrit.wikimedia.org/r/539109

akosiaris mentioned this in rDEPLOYCHARTSf99f9f1323a6: restrouter: Fix the parsoid port in the configuration.Sep 25 2019, 12:28 PM

Change 539115 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] calico: Add port 8000 (parsoid) to restrouter

https://gerrit.wikimedia.org/r/539115

Change 539115 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] calico: Add port 8000 (parsoid) to restrouter

https://gerrit.wikimedia.org/r/539115

akosiaris mentioned this in rDEPLOYCHARTS1b299988657a: calico: Add port 8000 (parsoid) to restrouter.Sep 25 2019, 12:42 PM

Change 526449 merged by Alexandros Kosiaris:
[operations/dns@master] Activate restrouter discovery records

https://gerrit.wikimedia.org/r/526449

restrouter is up and running, LVS is set up and the discovery records have been merged. I think the migration can start. A draft dashboard is available at https://grafana.wikimedia.org/d/ZA_JiypZk/restrouter. However, the statsd metrics emitted by restrouter differ enough from those of the other service-runner based services that I don't feel qualified to delve further into this. Feel free to amend it to your needs.

I'll resolve this for now; we should track the migration in a different task.

Reopening as there are two more things we have to do before RESTRouter can be used:

decrease the service start-up time (currently at ~55s, which is too long for production use)
set up the rate-limiting DHT inside k8s for RESTRouter (this is currently disabled, and not having rate-limiting is not acceptable)

I am working on the former. For the latter, @akosiaris we'll have to get creative.

Change 539280 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[operations/deployment-charts@master] RESTRouter: Skip resources on start-up and add nqo.wp.org

https://gerrit.wikimedia.org/r/539280

set up the rate-limiting DHT inside k8s for RESTRouter (this is currently disabled, and not having rate-limiting is not acceptable)

I think we are now in a position to actually do that, but I was wondering if we have numbers about how often we rate-limit clients in restbase.

Change 539280 merged by jenkins-bot:
[operations/deployment-charts@master] RESTRouter: Skip resources on start-up and add nqo.wp.org

https://gerrit.wikimedia.org/r/539280

• mobrovac mentioned this in rDEPLOYCHARTS55c66f5d6e81: RESTRouter: Skip resources on start-up and add nqo.wp.org.Sep 26 2019, 11:26 AM

@akosiaris regarding rate limiting, you mentioned a (semi-)permanent DNS entry. We can set that up, but the important bit is to have it always pointing to an active pod. That means that it has to be stable during transitions, i.e. deployments of new versions of RESTRouter. The way the rate-limiting DHT works is that a new process (node/pod) contacts an existing one and joins the network. There will be a bit of churn during deploy windows, but that is tolerable as long as new pods are contacting a pod that will stick around after the deploy. That obviously will not be the case for the first pod in a deployment, but that should be fine as long as the DNS can be switched easily during the deployment.

Having rate-limiting is really a crucial feature without which we cannot start using RESTRouter in production.

In T223953#5535332, @mobrovac wrote:

@akosiaris regarding rate limiting, you mentioned a (semi-)permanent DNS entry.

An automatically updated one that is local to the kubernetes cluster (and not really visible outside of it). We already have it for cxserver, e.g.

$ dig cxserver-production-kademlia.cxserver.svc.cluster.local
<snip>
cxserver-production-kademlia.cxserver.svc.cluster.local. 5 IN A	10.64.65.213
cxserver-production-kademlia.cxserver.svc.cluster.local. 5 IN A	10.64.65.149
cxserver-production-kademlia.cxserver.svc.cluster.local. 5 IN A	10.64.65.24
cxserver-production-kademlia.cxserver.svc.cluster.local. 5 IN A	10.64.65.151
cxserver-production-kademlia.cxserver.svc.cluster.local. 5 IN A	10.64.65.19
cxserver-production-kademlia.cxserver.svc.cluster.local. 5 IN A	10.64.65.129
cxserver-production-kademlia.cxserver.svc.cluster.local. 5 IN A	10.64.65.239
cxserver-production-kademlia.cxserver.svc.cluster.local. 5 IN A	10.64.65.231
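
To illustrate how a new pod could use such a record to find a peer to join, here is a minimal sketch. It is an assumption about how a client might behave, not the actual behaviour of the rate-limiting DHT library; `resolve_all` and `pick_seed` are hypothetical helper names.

```python
import random
import socket
from typing import Callable, List, Optional


def resolve_all(name: str) -> List[str]:
    """Return the distinct IPv4 addresses behind a DNS name, sorted."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def pick_seed(
    name: str,
    own_ip: str,
    resolver: Callable[[str], List[str]] = resolve_all,
) -> Optional[str]:
    """Pick a peer to contact when joining; None means no other pod is listed."""
    candidates = [ip for ip in resolver(name) if ip != own_ip]
    return random.choice(candidates) if candidates else None


if __name__ == "__main__":
    # Addresses copied from the dig output above; the resolver is faked so the
    # example runs outside the cluster.
    fake = lambda _name: ["10.64.65.19", "10.64.65.24", "10.64.65.129"]
    print(pick_seed("cxserver-production-kademlia.cxserver.svc.cluster.local",
                    "10.64.65.19", resolver=fake))
```

Because the record lists every active pod, any address other than the pod's own is a usable seed.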

We can set that up, but the important bit is to have it always pointing to an active pod. That means that it has to be stable during transitions, i.e. deployments of new versions of RESTRouter. The way the rate-limiting DHT works is that a new process (node/pod) contacts an existing one and joins the network. There will be a bit of churn during deploy windows, but that is tolerable as long as new pods are contacting a pod that will stick around after the deploy. That obviously will not be the case for the first pod in a deployment, but that should be fine as long as the DNS can be switched easily during the deployment.

It will always point to all active pods, and it's up to the client library to pick whichever one it wants. During deployments, the DNS record is updated as the deployment progresses in a rolling fashion, removing old pods and adding new ones. Given the default 25% rate for a rolling deployment, at least 75% of pods will be under that record at any time. There will, however, be no pod that "sticks" around after the deploy, but given the above I don't think that's necessary, right?
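
The 75% figure follows from the Kubernetes rolling-update defaults (maxUnavailable 25%, rounded down; maxSurge 25%, rounded up). A small sketch of the arithmetic, with the replica counts chosen purely as examples:

```python
import math
from typing import Tuple


def rolling_update_bounds(
    replicas: int,
    max_unavailable: float = 0.25,
    max_surge: float = 0.25,
) -> Tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a rolling update."""
    unavailable = math.floor(replicas * max_unavailable)  # percentage rounds down
    surge = math.ceil(replicas * max_surge)               # percentage rounds up
    return replicas - unavailable, replicas + surge


if __name__ == "__main__":
    for replicas in (3, 8, 10):
        low, high = rolling_update_bounds(replicas)
        print(f"{replicas} replicas: at least {low} available, at most {high} total")
```

Because the unavailable count is rounded down, small deployments keep more than 75% of their pods available, so a joining pod should always find an existing peer behind the record.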

Having rate-limiting is really a crucial feature without which we cannot start using RESTRouter in production.

As I've already said, we should have graphs in grafana for such a crucial feature. It's great we already have logs (it will help us immensely in the migration), but stats are essential as well.

Change 540131 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] restrouter: Add ratelimiting support to chart

https://gerrit.wikimedia.org/r/540131

Change 540131 merged by jenkins-bot:
[operations/deployment-charts@master] restrouter: Add ratelimiting support to chart

https://gerrit.wikimedia.org/r/540131

akosiaris mentioned this in rDEPLOYCHARTS57cbac754184: restrouter: Add ratelimiting support to chart.Oct 1 2019, 4:09 PM

Change 540365 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] restrouter: Revert the initialDelay seconds

https://gerrit.wikimedia.org/r/540365

Change 540365 merged by Mobrovac:
[operations/deployment-charts@master] restrouter: Revert the initialDelay seconds

https://gerrit.wikimedia.org/r/540365

• mobrovac mentioned this in rDEPLOYCHARTS15330b80fbc4: restrouter: Revert the initialDelay seconds.Oct 4 2019, 10:51 AM

Change 540841 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[operations/deployment-charts@master] RESTRouter: Bump image tag to v1.1.2 and release v0.0.7

https://gerrit.wikimedia.org/r/540841

Change 540841 merged by jenkins-bot:
[operations/deployment-charts@master] RESTRouter: Bump image tag to v1.1.2 and release v0.0.7

https://gerrit.wikimedia.org/r/540841

• mobrovac mentioned this in rDEPLOYCHARTSaaf9e97d6f52: RESTRouter: Bump image tag to v1.1.2 and release v0.0.7.Oct 4 2019, 1:36 PM

The start-up time is now pretty good: around 3-5s per worker.

However, it seems that rate limiting is not working. I issued requests for restrouter.svc.eqiad.wmnet:7231/wikimedia.org/v1/metrics/pageviews/aggregate/en.wikipedia/all-access/all-agents/hourly/1970010100/1970010100 - a route that is limited to 100 req/s - but after issuing thousands of requests with varying concurrency no rate-limiting logs were produced.
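
A test of this kind can be sketched as below. This is not the tooling that was actually used. It assumes that a working limiter would show up as a distinct HTTP status such as 429, whereas the comment above only mentions rate-limiting logs, so treat the status check as a hypothetical signal. The `probe` and `http_status` names are made up for illustration.

```python
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Issue one GET and return the HTTP status code, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    # Connection-level failures (urllib.error.URLError) are left to propagate.


def probe(
    url: str,
    total: int,
    concurrency: int,
    send: Callable[[str], int] = http_status,
) -> Counter:
    """Send `total` requests with `concurrency` workers and tally status codes."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(send, [url] * total))


if __name__ == "__main__":
    # Replace with the route under test, for example the pageviews URL quoted above.
    target = "http://localhost:7231/"
    for workers in (1, 10, 50):
        print(workers, dict(probe(target, total=1000, concurrency=workers)))
```

If every request comes back with the same success status at all concurrency levels, that is consistent with the observation above that the limiter never triggered.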

• mobrovac mentioned this in T234816: Make internal services use RESTRouter instead of RESTBase.Oct 7 2019, 12:22 PM

Change 541278 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] restrouter: Kadelmia should listen on all IPs

https://gerrit.wikimedia.org/r/541278

Change 541278 merged by jenkins-bot:
[operations/deployment-charts@master] restrouter: Kademlia should listen on all IPs

https://gerrit.wikimedia.org/r/541278

akosiaris mentioned this in rDEPLOYCHARTS29ce36f19ae7: restrouter: Kademlia should listen on all IPs.Oct 7 2019, 3:27 PM

Change 541771 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] restrouter: Allow the kademlia port in ingress

https://gerrit.wikimedia.org/r/541771

Change 541771 merged by jenkins-bot:
[operations/deployment-charts@master] restrouter: Allow the kademlia port in ingress

https://gerrit.wikimedia.org/r/541771

akosiaris mentioned this in rDEPLOYCHARTSb5fa99f84507: restrouter: Allow the kademlia port in ingress.Oct 9 2019, 10:30 PM

akosiaris added a subtask: T235437: RESTBase/RESTRouter/service-runner rate limiting plans.Oct 14 2019, 2:39 PM

To split off from this task what is probably going to be a lengthy discussion, I've created subtask T235437 for the rate limiting functionality of RESTBase/RESTRouter.

WDoranWMF moved this task from Doing(WIP:5) to Backlog on the Platform Team Workboards (Clinic Duty Team) board.Dec 4 2019, 7:28 PM

akosiaris changed the task status from Open to Stalled.Dec 16 2019, 4:19 PM

• AMooney edited projects, added Platform Engineering (Icebox); removed Platform Team Workboards (Clinic Duty Team).Mar 13 2020, 1:33 PM

WDoranWMF closed this task as Declined.Mar 24 2020, 9:14 PM

WDoranWMF closed subtask T235437: RESTBase/RESTRouter/service-runner rate limiting plans as Declined.
