
New Service Request geoshapes
Open, Medium, Public

Description

Description: Service to generate geometric shapes from OSM PostgreSQL data and WDQS queries
Timeline: Q4
Diagram: Data flow diagram, Maps backend diagram, and Maps deployment diagram
Technologies: nodejs
Point person: @MSantos

Background

As part of T263854: [Maps] Modernize Vector Tile Infrastructure, the geoshapes service will be extracted as a standalone service.

The service currently runs on bare metal on maps20xx.codfw.wmnet and maps10xx.eqiad.wmnet, and geoshapes accesses the PostgreSQL DB available on those hosts.

Acceptance Criteria
  • Extract geoshapes into its own service/repo
  • Enable PG connections from k8s cluster to maps clusters
  • Enabling the deployment-pipeline to generate the OCI (docker) container T302967
    • Creating the helm chart itself in deployment-charts
      • Benchmark in a local env (if possible; we want coarse data, since we'll have to fine-tune under real traffic anyway)
  • Submit the helmfile.d/services stanzas for review and get them merged
  • Creation of k8s namespaces/token (SRE side, open up a task and we will get it done)
    • Do the actual deployment
    • Set up LVS, DNS and discovery (that's strictly on SRE side)
    • Set up the traffic layer to send traffic to the service
    • Acceptance tests
  • Set up grafana dashboards

Event Timeline

Thanks for this task!

So I've studied the diagrams a bit; they are helpful.

The deployment pipeline definitely supports nodejs (service-runner, in fact) apps. As soon as the code is split into its own repo, we can enable the pipeline on it and get OCI (docker) images. After that we can cooperate on the helm chart creation (I don't expect surprises there, we've done this before).
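For reference, enabling the pipeline usually means adding a `.pipeline/` config to the new repo. A rough sketch of what the Blubber config could look like (the base image name, variant names, and entrypoint below are illustrative, not the actual geoshapes values; exact keys should be checked against the current Blubber docs):

```yaml
# .pipeline/blubber.yaml (sketch; all values illustrative)
version: v4
base: docker-registry.wikimedia.org/nodejs-slim
lives:
  in: /srv/service
variants:
  build:
    # Install dependencies from the lockfile in the build variant.
    node:
      requirements: [package.json, package-lock.json]
  production:
    # Copy the built tree into a lean production image.
    copies: [build]
    node:
      env: production
    entrypoint: [node, server.js]
```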

Regarding the question:

How can this new service be moved to the k8s and still connect to the PostgreSQL DBs in maps clusters?

As long as the PostgreSQL DB is exposed on a TCP port on one of the nodes of the cluster, we can just connect to it; we do the same thing from MediaWiki to the MySQL DBs. However, if we want high availability, things quickly become more complicated, as we will have to abstract away the current status quo. IIRC (correct me please if I am out of sync), there are N PostgreSQL DBs: one read-write main and N-1 read-only replicas, with each node talking to its local DB. I don't think we currently have much expertise in this (Postgres isn't really well supported in WMF), so we will have to figure out what to do here (MediaWiki handles this internally). Connection parameters (endpoint, port, user, db, password) can be supplied to the software via environment variables or specified in a config file; both are OK.

Note that there is one interesting connection in the diagrams that we will need to support specifically, and that is talking to WDQS. We want to make sure we use the internal endpoint of the service (that is, wdqs-internal.discovery.wmnet) for maintainability purposes (e.g. easy depooling of a DC) and separation of concerns.
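A small sketch of what pointing SPARQL queries at the internal endpoint could look like; the helper name, header values, and query are illustrative, only the wdqs-internal.discovery.wmnet hostname comes from the discussion above:

```javascript
// Sketch: build an HTTP request against the internal WDQS endpoint
// rather than the public query.wikidata.org. Helper and headers are
// illustrative, not the service's actual code.
const WDQS_INTERNAL = 'https://wdqs-internal.discovery.wmnet/sparql';

function buildSparqlRequest(sparql, endpoint = WDQS_INTERNAL) {
  const url = new URL(endpoint);
  url.searchParams.set('format', 'json');
  return {
    url: url.toString(),
    method: 'POST',
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      // Identify the calling service; WMF endpoints expect a UA.
      'User-Agent': 'geoshapes (example)',
    },
    body: 'query=' + encodeURIComponent(sparql),
  };
}
```

Keeping the endpoint in one constant (or, better, in config) also makes the depooling scenario above a one-line change.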

Thanks for the Q4 timeline, it's pretty useful.

@akosiaris and @jijiki how can we move forward with this?

For context:


Hi @MSantos. @jijiki isn't available currently, but I'll do my best to help.

Nice work on extracting geoshapes into its own repo and service; that well-structured README is a breath of fresh air.

Next steps would indeed be (I am using indentation for dependencies here, some things can happen in parallel):

  • Enabling the deployment-pipeline to generate the OCI (docker) container
    • Creating the helm chart itself in deployment-charts
      • Benchmark in a local env (if possible; we want coarse data, since we'll have to fine-tune under real traffic anyway)
  • Submit the helmfile.d/services stanzas for review and get them merged
  • Creation of k8s namespaces/token (SRE side, open up a task and we will get it done)
    • Do the actual deployment
      • Set up LVS, DNS and discovery (that's strictly on SRE side)
        • Set up the traffic layer to send traffic to the service (if needed). This is a bit unclear to me currently. I am not sure from the diagrams whether the user's browser will need to talk to geoshapes (via the edge traffic layers) or kartotherian will talk to geoshapes, or both.
          • Acceptance tests
  • Set up grafana dashboards
  • Party?

> Set up the traffic layer to send traffic to the service (if needed). This is a bit unclear to me currently. I am not sure from the diagrams whether the user's browser will need to talk to geoshapes (via the edge traffic layers) or kartotherian will talk to geoshapes, or both.

With the extraction, kartotherian and geoshapes shouldn't be related anymore. So we should set up the traffic layer to send geoshapes' endpoint traffic to the new service.

> With the extraction, kartotherian and geoshapes shouldn't be related anymore. So we should set up the traffic layer to send geoshapes' endpoint traffic to the new service.

So maps.wikimedia.org/geoshape specifically should be routed to the new service? Or can we create a new DNS entry for this, e.g. geoshapes.wikimedia.org? The former isn't as easy as the latter and might not even be desirable on Traffic's side, hence my question.

> So maps.wikimedia.org/geoshape specifically should be routed to the new service? Or can we create a new DNS entry for this, e.g. geoshapes.wikimedia.org? The former isn't as easy as the latter and might not even be desirable on Traffic's side, hence my question.

We can go with the new DNS route for geoshapes; it may even be better on the application side, allowing a proper switch with the possibility of rollback.

> We can go with the new DNS route for geoshapes; it may even be better on the application side, allowing a proper switch with the possibility of rollback.

Perfect! Thanks for accommodating this! And yes, I have a similar opinion. I expect that being able to change a config in the app to switch the hostname of the geoshapes endpoint will make the transition (and rollback, if needed) easier, and put it squarely in the hands of your team instead of needing cross-team coordination to roll back edge cache configuration.
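The config-driven switch described above could look roughly like this; the flag and config key names are hypothetical, only the two hostnames come from the thread:

```javascript
// Sketch: select the geoshapes base URL from configuration so that
// switching (and rolling back) is a config change in the app, not an
// edge cache change. Flag and key names are illustrative.
function geoshapesBaseUrl(config) {
  if (config.useStandaloneGeoshapes) {
    // New standalone service behind its own DNS entry.
    const host = config.geoshapesHost || 'geoshapes.wikimedia.org';
    return `https://${host}/`;
  }
  // Old path served by the maps cluster; this branch is the rollback.
  return 'https://maps.wikimedia.org/geoshape';
}
```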

@akosiaris the initial geoshapes deployment-charts patch is created and ready to move forward: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/768678

I was able to benchmark it locally with Minikube, but the metrics seem to be bogus; I'll paste them here anyway so we can discuss how to move forward:

Results:
For some reason, the Minikube metrics-server was returning bogus metrics for CPU (usage equal to 0 even though the pod requests at least 1 core).

Screen Shot 2022-03-16 at 12.43.42 PM.png (560×2 px, 494 KB)

With that, I believe the memory benchmark isn't accurate either, but here is the data collected: 812Mi
Screen Shot 2022-03-16 at 12.44.14 PM.png (502×1 px, 259 KB)

The created service pod nonetheless stabilized at around ~74 req/s.

total_requests_per_second_1647595314.png (350×1 px, 47 KB)

This is a good sign, since the current production service handles ~20 req/s across both clusters.

Screen Shot 2022-03-18 at 10.23.34 AM.png (508×1 px, 212 KB)
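Even treating these numbers as coarse, they suggest starting values for the chart's container resources, to be tuned under real traffic. The figures below are illustrative only, loosely derived from the 1-core request and the ~812Mi observation above, not agreed-upon limits:

```yaml
# Sketch of per-container resources for the geoshapes chart values
# (illustrative starting point, not measured or reviewed limits):
resources:
  requests:
    cpu: 1
    memory: 850Mi
  limits:
    cpu: 2
    memory: 1200Mi
```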

Now, how can we proceed?

@MSantos, my current understanding is that we are pausing work on this. Should we set it to Stalled?

FWIW, there has been parallel work in T216826: Move Kartotherian to Kubernetes to containerize the whole kartotherian service, which currently includes the geoshapes code. A draft helm chart is ready for review, and it could serve just geoshapes, or just tiles, with minor tweaks.