
Strategy to slowly move Kartotherian's traffic from bare metal to k8s
Closed, Resolved · Public

Description

Kartotherian is currently being blubberized, and soon we should be able to start creating a new chart/deployment for it on Wikikube. Kartotherian will run on nodejs-20 and Bookworm; see T327396.

The current pain points that I see are related to load balancing, more specifically:

  • We have two LVS services, kartotherian (plaintext, port 6533) and kartotherian-ssl (TLS, port 443), but only one LVS/discovery endpoint, kartotherian.discovery.wmnet.
  • The maps.wikimedia.org domain points directly to the 443 port, using TLS.
  • On the mapsXXXX hosts, nginx serves traffic on port 443 and kartotherian (nodejs) serves port 6533. AFAICS the nginx config is mostly about TLS and performance; it just proxies to port 6533.

The first question that I have is why we need both, since ideally all clients should just use TLS. The second is more about what to do for the bare-metal -> k8s migration, since we won't be able to use port 443 on k8s. This is what we did with Thumbor when moving it from bare metal to Wikikube:

  • Deploy Thumbor on Wikikube, making it listen on the same port as its bare metal cousin.
  • Add Wikikube workers behind the Thumbor LVS endpoint (initially depooled, sitting side-by-side with the bare metal nodes).
  • Slowly enable some Wikikube workers to serve Thumbor prod traffic from k8s, and measure issues/performance/etc.
  • Eventually leave only Wikikube workers pooled, and remove all bare metal hosts.

Due to the 443 port we cannot easily do this, so this is my idea:

  1. We add another listen directive, on port 6543, to Kartotherian's nginx config, so that the bare metal hosts also serve TLS traffic from that port. It should be easy enough to do, but I need to verify that it works as expected.
  2. We create a new Puppet load-balanced service called kartotherian-k8s-ssl. We use the same IP addresses as the other kartotherian LVS services, just with the new port 6543 (port number picked at random, not used in puppet's service.yaml yet). In theory it shouldn't require any pybal config change/restart, just updated settings in puppet for monitoring etc.
  3. Since we haven't created a new LVS IP etc., the bare metal hosts should be pooled and ready to go.
  4. When we are comfortable, we move the ATS config (CDN) of maps.wikimedia.org to the new port.
  5. Then we deploy Kartotherian to k8s with nodePorts 6543 and 6533. When we are done, we should have happy pods running on Wikikube serving TLS traffic via the mesh (so nginx is not needed at this point) on 6543 and plaintext traffic on 6533 (assuming we'll still need it).
  6. At this point, we should be able to pool Wikikube workers in the Kartotherian LVS service. We should be able to add just a few of them, not the entire fleet, since kube-proxy should route the traffic for nodePorts 6543/6533 correctly to the Wikikube workers running the kartotherian pods.
  7. Once ready, we pool the first Wikikube worker and then we observe how well the k8s pods behave.
  8. Slowly over time, we pool all Wikikube workers and we depool gradually the bare metal nodes.
  9. Once the bare metal nodes no longer serve prod traffic, undeploy kartotherian from them.
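The NodePort shape in steps 5-6 could look roughly like the manifest below. This is only a sketch: the real chart templates this differently, and the name/labels here are illustrative, not actual chart values.

```yaml
# Hypothetical k8s Service for kartotherian on Wikikube: a NodePort
# service exposes both ports on every worker, so kube-proxy routes
# LVS traffic hitting any pooled worker to the kartotherian pods.
apiVersion: v1
kind: Service
metadata:
  name: kartotherian
spec:
  type: NodePort
  selector:
    app: kartotherian
  ports:
    - name: tls
      port: 6543
      targetPort: 6543  # TLS terminated in-pod by the mesh (envoy sidecar)
      nodePort: 6543    # must match the kartotherian-k8s-ssl LVS port
    - name: plaintext
      port: 6533
      targetPort: 6533
      nodePort: 6533    # only if plaintext turns out to still be needed
```

The key property for the migration is that nodePort matches the LVS service port, so bare metal hosts and Wikikube workers can sit behind the same LVS endpoint.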

Not sure if I have missed anything important, please lemme know your thoughts!

Event Timeline

Change #1087421 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] tlsproxy::localssl: allow multiple listens for tls ports

https://gerrit.wikimedia.org/r/1087421

Change #1087422 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Change port for kartotherian-ssl

https://gerrit.wikimedia.org/r/1087422

Change #1087423 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] profile::trafficserver::backend: move kartotherian to port 6543

https://gerrit.wikimedia.org/r/1087423

I had a chat with @Jgiannelos about the plan and it seems good; we just need to verify whether the old plaintext lvs:ip combination is still in use. It will be easy to check once the Kartotherian Docker image is deployed on k8s, since we'll see the required egress rules (the theory is that kartotherian may need to contact itself via plaintext).
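If the plaintext self-traffic theory holds, the plain-kubernetes equivalent of the egress rule we'd expect to see is sketched below. The deployment-charts express this via their own values, so treat names and labels here as illustrative only.

```yaml
# Hypothetical NetworkPolicy allowing kartotherian pods to open
# plaintext connections to port 6533 (e.g. to reach itself).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kartotherian-egress-plaintext
spec:
  podSelector:
    matchLabels:
      app: kartotherian   # illustrative label
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: TCP
          port: 6533
```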

Mentioned in SAL (#wikimedia-operations) [2024-11-06T10:43:07Z] <elukey> depool maps1005 to test an nginx config - T378944

jijiki triaged this task as Medium priority.Nov 7 2024, 11:48 AM
jijiki moved this task from Incoming 🐫 to serviceops-radar on the serviceops board.
jijiki edited projects, added serviceops-radar; removed serviceops.

Change #1087421 merged by Elukey:

[operations/puppet@production] tlsproxy::localssl: allow multiple listens for tls ports

https://gerrit.wikimedia.org/r/1087421

Change #1088319 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] profile::maps::tlsproxy: allow traffic to port 6543

https://gerrit.wikimedia.org/r/1088319

Change #1088319 merged by Elukey:

[operations/puppet@production] profile::maps::tlsproxy: allow traffic to port 6543

https://gerrit.wikimedia.org/r/1088319

All the maps nodes are now serving traffic from port 6543 too. The next step is to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087422 to create the new lvs endpoint kartotherian.discovery.wmnet:6543, and after that we'll be able to switch maps.wikimedia.org to it.

This will allow us to pool k8s workers in when ready, namely when kartotherian is deployed on Wikikube.
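For reference, the new entry in puppet's service catalog would look roughly like this. The key names follow the usual hieradata/common/service.yaml conventions, but the values below are illustrative guesses, not copied from the actual patch:

```yaml
# Hypothetical service catalog entry for the new LVS service.
kartotherian-k8s-ssl:
  description: Kartotherian TLS endpoint, shared by bare metal and Wikikube
  encryption: true
  port: 6543            # reuses the existing kartotherian LVS IPs
  sites:
    - eqiad
    - codfw
  state: service_setup  # later moved to lvs_setup, then production
  lvs:
    enabled: true
    class: low-traffic
    conftool:
      cluster: maps
      service: kartotherian-k8s-ssl
```

Since the IPs and conftool cluster are shared with the existing kartotherian services, pybal shouldn't need a restart, matching point 2 of the plan.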

Change #1087422 merged by Elukey:

[operations/puppet@production] Create new lvs service kartotherian-k8s-ssl

https://gerrit.wikimedia.org/r/1087422

Change #1089817 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Move kartotherian-k8s-ssl to lvs_setup

https://gerrit.wikimedia.org/r/1089817

Change #1089817 merged by Elukey:

[operations/puppet@production] Move kartotherian-k8s-ssl to lvs_setup

https://gerrit.wikimedia.org/r/1089817

The new kartotherian.discovery.wmnet:6543 endpoint is available.

Next steps:

  • https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087423 to move maps.wikimedia.org to kartotherian.discovery.wmnet:6543
  • Set kartotherian.discovery.wmnet:6543 to production in the LVS config
  • Change the profile::lvs::realserver::pools entry from kartotherian to kartotherian-k8s-ssl on the maps nodes (a no-op, but better for consistency).
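The realserver change in the last bullet is a small hiera tweak on the maps role; roughly (exact hiera file and service list are illustrative):

```yaml
# Hypothetical hiera on the maps nodes after the rename: the realserver
# profile binds the LVS service IP locally and ties pooling to these units.
profile::lvs::realserver::pools:
  kartotherian-k8s-ssl:
    services:
      - kartotherian
      - nginx
```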

We should be ready to test k8s workers as soon as the Docker image is ready!

Change #1090426 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Move kartotherian-k8s-ssl LVS endpoint to "production" state

https://gerrit.wikimedia.org/r/1090426

Change #1090426 merged by Elukey:

[operations/puppet@production] Move kartotherian-k8s-ssl LVS endpoint to "production" state

https://gerrit.wikimedia.org/r/1090426

Change #1087423 merged by Elukey:

[operations/puppet@production] profile::trafficserver::backend: move kartotherian to port 6543

https://gerrit.wikimedia.org/r/1087423

All action items done, now the next step is to wait for the k8s service to be deployed on Wikikube :)

Change #1097333 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] profile::service_proxy::envoy: add tegola

https://gerrit.wikimedia.org/r/1097333

Today I found out that Kartotherian seems to be contacting the local postgres read replica to fetch geoshapes/osmdb data:

elukey@maps1005:~$ sudo netstat -tunap | grep 5432 | grep node
tcp        0      0 127.0.0.1:44002         127.0.0.1:5432          ESTABLISHED 1788/node           
tcp        0      0 127.0.0.1:54882         127.0.0.1:5432          ESTABLISHED 3345/node           
tcp        0      0 127.0.0.1:56962         127.0.0.1:5432          ESTABLISHED 30541/node          
tcp        0      0 127.0.0.1:54938         127.0.0.1:5432          ESTABLISHED 1434/node           
tcp        0      0 127.0.0.1:54940         127.0.0.1:5432          ESTABLISHED 1147/node           
tcp        0      0 127.0.0.1:56998         127.0.0.1:5432          ESTABLISHED 1130/node           
tcp        0      0 127.0.0.1:57014         127.0.0.1:5432          ESTABLISHED 2074/node           
tcp        0      0 127.0.0.1:54974         127.0.0.1:5432          ESTABLISHED 763/node            
tcp        0      0 127.0.0.1:54866         127.0.0.1:5432          ESTABLISHED 1588/node           
tcp        0      0 127.0.0.1:43914         127.0.0.1:5432          ESTABLISHED 1452/node           
tcp        0      0 127.0.0.1:56974         127.0.0.1:5432          ESTABLISHED 895/node            
tcp        0      0 127.0.0.1:56920         127.0.0.1:5432          ESTABLISHED 1713/node           
tcp        0      0 127.0.0.1:54936         127.0.0.1:5432          ESTABLISHED 2056/node           
tcp        0      0 127.0.0.1:54894         127.0.0.1:5432          ESTABLISHED 738/node            
tcp        0      0 127.0.0.1:54878         127.0.0.1:5432          ESTABLISHED 10330/node          
tcp        0      0 127.0.0.1:43980         127.0.0.1:5432          ESTABLISHED 1067/node           
tcp        0      0 127.0.0.1:54926         127.0.0.1:5432          ESTABLISHED 1731/node           
tcp        0      0 127.0.0.1:54986         127.0.0.1:5432          ESTABLISHED 881/node            
tcp        0      0 127.0.0.1:54954         127.0.0.1:5432          ESTABLISHED 1750/node           
tcp        0      0 127.0.0.1:54910         127.0.0.1:5432          ESTABLISHED 2040/node           
tcp        0      0 127.0.0.1:54966         127.0.0.1:5432          ESTABLISHED 1972/node           
tcp        0      0 127.0.0.1:56946         127.0.0.1:5432          ESTABLISHED 1416/node           
tcp        0      0 127.0.0.1:54930         127.0.0.1:5432          ESTABLISHED 865/node            
tcp        0      0 127.0.0.1:57018         127.0.0.1:5432          ESTABLISHED 1805/node           
tcp        0      0 127.0.0.1:56872         127.0.0.1:5432          ESTABLISHED 1083/node           
tcp        0      0 127.0.0.1:56990         127.0.0.1:5432          ESTABLISHED 1001/node           
tcp        0      0 127.0.0.1:43970         127.0.0.1:5432          ESTABLISHED 785/node            
tcp        0      0 127.0.0.1:56866         127.0.0.1:5432          ESTABLISHED 835/node            
tcp        0      0 127.0.0.1:43954         127.0.0.1:5432          ESTABLISHED 1999/node           

elukey@maps1005:~$ ps aux | grep [1]999
kartoth+  1999 39.1  0.5 1503804 775396 ?      Sl   Jul01 82862:31 /usr/bin/node /srv/deployment/kartotherian/deploy-cache/revs/483e8c3722435327559da0328fd604e00381ff5b/node_modules/service-runner/service-runner.js -c /etc/kartotherian/config.yaml

The current setup is simple and self-contained on the maps nodes, but something needs to change if we want to migrate to k8s. We could think about an LVS service in front of the postgres read replicas in each DC, but I've never done that so I am not 100% sure it is feasible.

For tegola we do the following:

# Temporarily we will use envoy as a L4 tcp proxy until envoy's
# Postgres proxy filter is production ready
# https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/network_filters/postgres_proxy_filter#config-network-filters-postgres-proxy
tcp_services_proxy:
  maps_postgres:
    upstreams:
      # master node
      # - address: maps1009.eqiad.wmnet
      #   port: 5432
      # read replicas
      - address: maps1005.eqiad.wmnet
        port: 5432
      - address: maps1006.eqiad.wmnet
        port: 5432
      - address: maps1007.eqiad.wmnet
        port: 5432
      - address: maps1008.eqiad.wmnet
        port: 5432
      - address: maps1009.eqiad.wmnet
        port: 5432
      - address: maps1010.eqiad.wmnet
        port: 5432

Whatever solution we find, like adding an LVS endpoint, should also be applied to Tegola.
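Presumably the stanza above also needs a per-DC counterpart, so that pods in each datacenter talk to their local replicas. Sketching the codfw side, with hostnames that are illustrative guesses based on the eqiad naming, not a verified host list:

```yaml
# Hypothetical codfw counterpart of the maps_postgres TCP proxy.
tcp_services_proxy:
  maps_postgres:
    upstreams:
      # read replicas, codfw (hostnames illustrative)
      - address: maps2005.codfw.wmnet
        port: 5432
      - address: maps2006.codfw.wmnet
        port: 5432
```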

Change #1097333 merged by Elukey:

[operations/puppet@production] profile::service_proxy::envoy: add tegola

https://gerrit.wikimedia.org/r/1097333

To keep archives happy, we are going to use the envoy TCP proxy already implemented for Tegola with some tweaks. More info in T322647#10365816

I think we have made this migration slightly more complicated than it should be for the following two reasons:

Ingress
While kartotherian is a useful service, it is not a critical one, as its unavailability does not affect the stability of our websites. Thus, my suggestion is to simply use ingress for kartotherian-k8s, and flip the switch on ATS. For a gradual rollout we can go our usual path of stopping puppet on cp* hosts and enabling it in batches. To my knowledge, no other internal service talks to maps.

In the above scenario, our only concern would be whether the capacity estimated for the service on k8s is the one it needs in reality.

Make Tegola a ClusterIP service
Kartotherian is tegola-vector-tiles' only client, if I am not mistaken. Now with both services on k8s, it appears that the service mesh and tegola's LVS are adding a few extra hops which we can easily avoid. It is much simpler, in the end, to have a tegola-vector-tiles ClusterIP service.

In the above scenario, we can add a tegola-vector-tiles ClusterIP service and switch to it following the successful migration of kartotherian. After that, we may remove both the LVS and the NodePort service. We will lose the ability to pool/depool tegola per datacenter; in other words, if tegola in one datacenter is suffering, we will have to depool maps from that DC completely. Given how close to each other those two services are, I think it is ok (but that is my personal opinion).
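A minimal sketch of what the proposed ClusterIP service could look like (name, labels and port are illustrative, not the actual chart values):

```yaml
# Hypothetical ClusterIP service for tegola-vector-tiles: reachable
# only from inside the cluster, no nodePort and no LVS involvement.
apiVersion: v1
kind: Service
metadata:
  name: tegola-vector-tiles
spec:
  type: ClusterIP
  selector:
    app: tegola-vector-tiles
  ports:
    - port: 6543
      targetPort: 6543
```

Kartotherian would then reach tegola via the standard in-cluster DNS name (tegola-vector-tiles.<namespace>.svc.cluster.local), skipping the LVS and nodePort hops entirely.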

Replying to the Ingress suggestion above:

While I completely get your point, I totally disagree with the ingress strategy for the following reason:

  1. Short term: even if we test the Kartotherian k8s endpoint, we'll only find out with live traffic whether something is missing. This may end up flipping the ATS configuration back and forth between bare metal and k8s until we find the correct config. Slowly pooling in k8s workers means that we can easily depool them if something is wrong, without any change at the CDN. And even if it is not marked as a critical service, it may impact users, and if we can avoid that we should :)
  2. Medium/long term (say a couple of weeks or more): it is difficult to correctly tune memory/CPU requirements for pods, and it is especially difficult to get them right on the first try. We are also upgrading the service to a new mapnik version, new nodejs version, etc., so the risk of having to tune/fix configuration issues days after the rollout is real. I'd prefer to avoid touching the CDN's config for something like that, and just use confctl.
  3. We can flip to ingress as the last step of the migration anyway, when we feel confident that the new setup works as expected.

Replying to the Tegola ClusterIP suggestion above:

This is a good idea and it can be evaluated after the migration, but since we are not 100% sure about a few details I'd prefer to keep things as simple as possible for the move to k8s. Basically the same idea as for Ingress: totally agree, but I'd do it step by step to avoid rushing for a fix if needed.

Re: the Ingress discussion above:

ack!

Re: making Tegola a ClusterIP service:

ack!

elukey claimed this task.

The strategy has been created, we can keep going in the main task! Thanks all for the feedback!